Journal Article
C. Wang, J. Mattingly, and Y. M. Lu, “Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA,” Submitted. arXiv:1712.04332 [cs.LG]Abstract
We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target feature vector and its estimates provided by the algorithms will converge weakly to a deterministic measured-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE can be efficiently obtained. These solutions lead to precise predictions of the performance of the algorithms, as many practical performance metrics are linear functionals of the joint empirical measures. In addition to characterizing the dynamic performance of online learning algorithms, our asymptotic analysis also provides useful insights. In particular, in the high-dimensional limit, and due to exchangeability, the original coupled dynamics associated with the algorithms will be asymptotically ``decoupled'', with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight for nonconvex optimization problems may prove an interesting line of future research.
A. Agaskar and Y. M. Lu, “Optimal Detection of Random Walks on Graphs: Performance Analysis via Statistical Physics,” Submitted. arXiv:1504.06924Abstract

We study the problem of detecting a random walk on a graph from a sequence of noisy measurements at every node. There are two hypotheses: either every observation is just meaningless zero-mean Gaussian noise, or at each time step exactly one node has an elevated mean, with its location following a random walk on the graph over time. We want to exploit knowledge of the graph structure and random walk parameters (specified by a Markov chain transition matrix) to detect a possibly very weak signal. The optimal detector is easily derived, and we focus on the harder problem of characterizing its performance through the (type-II) error exponent: the decay rate of the miss probability under a false alarm constraint.
The expression for the error exponent resembles the free energy of a spin glass in statistical physics, and we borrow techniques from that field to develop a lower bound. Our fully rigorous analysis uses large deviations theory to show that the lower bound exhibits a phase transition: strong performance is only guaranteed when the signal-to-noise ratio exceeds twice the entropy rate of the random walk.
Monte Carlo simulations show that the lower bound fully captures the behavior of the true exponent.

Y. M. Lu and G. Li, “Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation,” Information and Inference, Forthcoming. arXiv:1702.06435 [cs.IT]Abstract

We study a spectral initialization method that serves as a key ingredient in recent work on using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous analysis in the literature, which is restricted to the phase retrieval setting and which provides only performance bounds, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the spectral method in the high-dimensional regime. Our analysis reveals a phase transition phenomenon that depends on the sampling ratio. When the ratio is below a minimum threshold, the estimates given by the spectral method are no better than a random guess drawn uniformly from the hypersphere; above a maximum threshold, however, the estimates become increasingly aligned with the target signal. The computational complexity of the spectral method is also markedly different in the two phases. Worked examples and numerical results are provided to illustrate and verify the analytical predictions. In particular, simulations show that our asymptotic formulas provide accurate predictions even at moderate signal dimensions.

Y. M. Lu, J. Oñativia, and P. L. Dragotti, “Sparse Representation in Fourier and Local Bases Using ProSparse: A Probabilistic Analysis,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2639-2647, 2018. arXiv:1611.07971 [cs.IT]Abstract

Finding the sparse representation of a signal in an overcomplete dictionary has attracted a lot of attention over the past years. This paper studies ProSparse, a new polynomial complexity algorithm that solves the sparse representation problem when the underlying dictionary is the union of a Vandermonde matrix and a banded matrix. Unlike our previous work which establishes deterministic (worst-case) sparsity bounds for ProSparse to succeed, this paper presents a probabilistic average-case analysis of the algorithm. Based on a generating-function approach, closed-form expressions for the exact success probabilities of ProSparse are given. The success probabilities are also analyzed in the high-dimensional regime. This asymptotic analysis characterizes a sharp phase transition phenomenon regarding the performance of the algorithm.

A. Minot, Y. M. Lu, and N. Li, “A Distributed Gauss-Newton Method for Power System State Estimation,” IEEE Transactions on Power Systems, vol. 31, no. 5, pp. 3804-3815, 2016.Abstract

We propose a fully distributed Gauss-Newton algorithm for state estimation of electric power systems. At each Gauss-Newton iteration, matrix-splitting techniques are utilized to carry out the matrix inversion needed for calculating the Gauss-Newton step in a distributed fashion. In order to reduce the communication burden as well as increase robustness of state estimation, the proposed distributed scheme relies only on local information and a limited amount of information from neighboring areas. The matrix-splitting scheme is designed to calculate the Gauss-Newton step with exponential convergence speed. The effectiveness of the method is demonstrated in various numerical experiments.

I. Dokmanic and Y. M. Lu, “Sampling Sparse Signals on the Sphere: Algorithms and Applications,” IEEE Transactions on Signal Processing, vol. 64, no. 1, pp. 189-202, 2016. arXiv:1502.07577Abstract

We propose a sampling scheme that can perfectly reconstruct a collection of
spikes on the sphere from samples of their lowpass-filtered observations.
Central to our algorithm is a generalization of the annihilating filter
method, a tool widely used in array signal processing and finite-rate-of-innovation
(FRI) sampling. The proposed algorithm can reconstruct $K$ spikes
from $(K+\sqrt{K})^2$ spatial samples---a sampling requirement that
improves over known sparse sampling schemes on the sphere by a factor of up
to four.

We showcase the versatility of the proposed algorithm by applying it to
three different problems: 1) sampling diffusion processes induced by
localized sources on the sphere, 2) shot-noise removal, and 3) sound source
localization (SSL) by a spherical microphone array. In particular, we show
how SSL can be reformulated as a spherical sparse sampling problem.

R. Yin, T. Gao, Y. M. Lu, and I. Daubechies, “A Tale of Two Bases: Local-Nonlocal Regularization on Image Patches with Convolution Framelets,” SIAM Journal on Imaging Sciences, vol. 10, no. 2, pp. 711-750, 2017.Abstract

We propose an image representation scheme combining the local and nonlocal characterization of patches in an image. Our representation scheme can be shown to be equivalent to a tight frame constructed from convolving local bases (e.g., wavelet frames, discrete cosine transforms, etc.) with nonlocal bases (e.g., spectral basis induced by nonlinear dimension reduction on patches), and we call the resulting frame elements convolution framelets. Insight gained from analyzing the proposed representation leads to a novel interpretation of a recent high-performance patch-based image pro- cessing algorithm using the point integral method (PIM) and the low dimensional manifold model (LDMM) [S. Osher, Z. Shi, and W. Zhu, Low Dimensional Manifold Model for Image Processing, Tech. Rep., CAM report 16-04, UCLA, Los Angeles, CA, 2016]. In particular, we show that LDMM is a weighted l2-regularization on the coefficients obtained by decomposing images into linear combinations of convolution framelets; based on this understanding, we extend the original LDMM to a reweighted version that yields further improved results. In addition, we establish the energy concentration property of convolution framelet coefficients for the setting where the local basis is constructed from a given nonlocal basis via a linear reconstruction framework; a generalization of this framework to unions of local embeddings can provide a natural setting for interpreting BM3D, one of the state-of-the-art image denoising algorithms. 

S. H. Chan, T. Zickler, and Y. M. Lu, “Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model Perspective,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5107-5121, 2017. arXiv:1601.00088Abstract

Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such performance gain is related to the Sinkhorn-Knopp balancing algorithm, an iterative procedure that symmetrizes a row-stochastic matrix to a doubly-stochastic matrix. However, a complete understanding of the performance gain phenomenon is still lacking.

In this paper, we study the performance gain phenomenon from a statistical learning perspective. We show that Sinkhorn-Knopp is equivalent to an Expectation-Maximization (EM) algorithm of learning a Product of Gaussians (PoG) prior of the image patches. By establishing the correspondence between the steps of Sinkhorn-Knopp and the EM algorithm, we provide a geometrical interpretation of the symmetrization process. The new PoG model also allows us to develop a new denoising algorithm called Product of Gaussian Non-Local-Means (PoG-NLM). PoG-NLM is an extension of the Sinkhorn-Knopp and is a generalization of the classical non-local means. Despite its simple formulation, PoG-NLM outperforms many existing smoothing filters and has a similar performance compared to BM3D.

Y. Chi and Y. M. Lu, “Kaczmarz Method for Solving Quadratic Equations,” IEEE Signal Processing Letters, vol. 23, no. 9, pp. 1183-1187, 2016.Abstract

Estimating low-rank positive-semidefinite (PSD) matrices from symmetric rank-one measurements is of great importance in many applications, such as high-dimensional data processing, quantum state tomography, and phase retrieval. When the rank is known a priori, this problem can be regarded as solving a system of quadratic equations of a low-dimensional subspace. The authors develop a fast iterative algorithm based on an adaptation of the Kaczmarz method, which is traditionally used for solving overdetermined linear systems. In particular, the authors characterize the dynamics of the algorithm when the measurement vectors are composed of standard Gaussian entries in the online setting. Numerical simulations demonstrate the compelling performance of the proposed algorithm.

F. Sroubek, J. Kamenicky, and Y. M. Lu, “Decomposition space-variant blur in image deconvolution,” IEEE Signal Processing Letters, vol. 23, no. 3, pp. 346-350, 2016.Abstract

Standard convolution as a model of radiometric degradation is in majority of cases inaccurate as the blur varies in space and we are thus required to work with a computationally demanding space-variant model. Space-variant degradation can be approximately decomposed to a set of standard convolutions. We explain in detail the properties of the space-variant degrada- tion operator and show two possible decomposition models and two approximation approaches. Our target application is space- variant image deconvolution, on which we illustrate theoretical differences between these models. We propose a computationally efficient restoration algorithm that belongs to a category of alternating direction methods of multipliers, which consists of four update steps with closed-form solutions. Depending on the used decomposition, two variations of the algorithm exist with distinct properties. We test the effectiveness of the decomposition models under different levels of approximation on synthetic and real examples, and conclude the letter by drawing several practical observations. 

D. M. Merfeld, T. K. Clark, Y. M. Lu, and F. Karmali, “Dynamics of Individual Perceptual Decisions,” Journal of Neurophysiology, vol. 115, no. 1, 2016. Publisher's VersionAbstract

Perceptual decision-making is fundamental to a broad range of fields including neurophysiology, economics, medicine, advertising, law, etc. While recent findings have yielded major advances in our understanding of perceptual decision-making, decision-making as a function of time and frequency (i.e., decision-making dynamics) is not well understood. To limit the review length, we focus most of this review on human findings. Animal findings, which are extensively reviewed elsewhere, are included when beneficial or necessary. We attempt to put these various findings and datasets - which can appear to be unrelated in the absence of a formal dynamic analysis - into context using published models. Specifically, by adding appropriate dynamic mechanisms (e.g., high-pass filters) to existing models, it appears that a number of otherwise seemingly disparate findings from the literature might be explained. One hypothesis that arises through this dynamic analysis is that decision-making includes phasic (high-pass) neural mechanisms, an evidence accumulator and/or some sort of mid-trial decision-making mechanism (e.g., peak detector and/or decision boundary).

C. Hu, J. Sepulcre, K. A. Johnson, G. E. Fakhri, Y. M. Lu, and Q. Li, “Matched Signal Detection on Graphs: Theory and Application to Brain Imaging Data Classification,” NeuroImage, vol. 125, pp. 587-600, 2016.Abstract

Motivated by recent progresses in signal processing on graphs, we have developed a matched signal detection (MSD) theory for signals with intrinsic structures described by weighted graphs. First, we regard graph Laplacian eigenvalues as frequencies of graph-signals and assume that the signal is in a subspace spanned by the first few graph Laplacian eigenvectors associated with lower eigenvalues. The conventional matched subspace detector can be applied to this case. Furthermore, we study signals that may not merely live in a subspace. Namely, we consider signals with bounded variation on graphs and more general signals that are randomly drawn from a prior distribution. For bounded variation signals, the test is a weighted energy detector. For the random signals, the test statistic is the difference of signal variations on associated graphs, if a degenerate Gaussian distribution specified by the graph Laplacian is adopted. We evaluate the effectiveness of the MSD on graphs both on simulated and real data sets. Specifically, we apply MSD to the brain imaging data classification problem of Alzheimer’s disease (AD) based on two independent data sets: 1) positron emission tomography data with Pittsburgh compound-B tracer of 30 AD and 40 normal control (NC) subjects, 2) resting-state functional magnetic resonance imaging (R-fMRI) data of 30 early mild cognitive impairment and 20 NC subjects. Our results demonstrate that the MSD approach is able to outperform the traditional methods and help detect AD at an early stage, probably due to the success of exploiting the manifold structure of the data. 

C. Hu, et al., “A Spectral Graph Regression Model for Learning Brain Connectivity of Alzheimer's Disease,” PLOS ONE, vol. 10, no. 5, 2015. Publisher's VersionAbstract

Understanding network features of brain pathology is essential to reveal underpinnings of neurodegenerative diseases. In this paper, we introduce a novel graph regression model (GRM) for learning structural brain connectivity of Alzheimer’s disease (AD) measured by amyloid-β  deposits. The proposed GRM regards 11 C-labeled Pittsburgh Compound-B (PiB) positron emission tomography (PET) imaging data as smooth signals defined on an unknown graph. This graph is then estimated through an optimization framework, which fits the graph to the data with an adjustable level of uniformity of the connection weights. Under the assumed data model, results based on simulated data illustrate that our approach can accurately reconstruct the underlying network, often with better reconstruction than those obtained by both sample correlation and ℓ1 -regularized partial correlation estimation. Evaluations performed upon PiB-PET imaging data of 30 AD and 40 elderly normal control (NC) subjects demonstrate that the connectivity patterns revealed by the GRM are easy to interpret and consistent with known pathology. Moreover, the hubs of the reconstructed networks match the cortical hubs given by functional MRI. The discriminative network features including both global connectivity measurements and degree statistics of specific nodes discovered from the AD and NC amyloid-beta networks provide new potential biomarkers for preclinical and clinical AD.

S. H. Chan, T. Zickler, and Y. M. Lu, “Monte Carlo Non-Local Means: Random Sampling for Large-Scale Image Filtering,” IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3711-3725, 2014.Abstract

We propose a randomized version of the non-local means (NLM) algorithm for large-scale image filtering. The new algorithm, called Monte Carlo non-local means (MCNLM), speeds up the classical NLM by computing a small subset of image patch distances, which are randomly selected according to a designed sampling pattern. We make two contributions. First, we analyze the performance of the MCNLM algorithm and show that, for large images or large external image databases, the random outcomes of MCNLM are tightly concentrated around the deterministic full NLM result. In particular, our error probability bounds show that, at any given sampling ratio, the probability for MCNLM to have a large deviation from the original NLM solution decays exponentially as the size of the image or database grows. Second, we derive explicit formulas for optimal sampling patterns that minimize the error probability bound by exploiting partial knowledge of the pairwise similarity weights. Numerical experiments show that MCNLM is competitive with other state-of-the-art fast NLM algorithms for single-image denoising. When applied to denoising images using an external database containing ten billion patches, MCNLM returns a randomized solution that is within 0.2 dB of the full NLM solution while reducing the runtime by three orders of magnitude.

P. L. Dragotti and Y. M. Lu, “On Sparse Representation in Fourier and Local Bases,” IEEE Transactions on Information Theory, vol. 60, no. 12, pp. 7888-7899, 2014.Abstract

We consider the classical problem of finding the sparse representation of a signal in a pair of bases. When both bases are orthogonal, it is known that the sparse representation is unique when the sparsity $K$ of the signal satisfies $K<1/\mu(\mD)$, where $\mu(\mD)$ is the mutual coherence of the dictionary. Furthermore, the sparse representation can be obtained in polynomial time by Basis Pursuit (BP), when $K<0.91/\mu(\mD)$. Therefore, there is a gap between the unicity condition and the one required to use the polynomial-complexity BP formulation. For the case of general dictionaries, it is also well known that finding the sparse representation under the only constraint of unicity is NP-hard. In this paper, we introduce, for the case of Fourier and canonical bases, a polynomial complexity algorithm that finds all the possible $K$-sparse representations of a signal under the weaker condition that $K<\sqrt{2} /\mu(\mD)$. Consequently, when $K<1/\mu(\mD)$, the proposed algorithm solves the unique sparse representation problem for this structured dictionary in polynomial time. We further show that the same method can be extended to many other pairs of bases, one of which must have local atoms. Examples include the union of Fourier and local Fourier bases, the union of discrete cosine transform and canonical bases, and the union of random Gaussian and canonical bases.

S. Maranò, D. Fäh, and Y. M. Lu, “Sensor Placement for the Analysis of Seismic Surface Waves: Source of Error, Design Criterion, and Array Design Algorithms,” Geophys. J. Int., vol. 197, no. 3, pp. 1566-1581, 2014. Publisher's VersionAbstract

Seismic surface waves can be measured by deploying an array of seismometers on the surface of the earth. The goal of such measurement surveys is, usually, to estimate the velocity of propagation and the direction of arrival of the seismic waves. In this paper, we address the issue of sensor placement for the analysis of seismic surface waves from ambient vibration wavefields. First, we explain in detail how the array geometry affects the mean-squared estimation error (MSEE) of parameters of interest, such as the velocity and direction of propagation, both at low and high signal-to-noise ratios (SNRs). Second, we propose a cost function suitable for the design of the array geometry with particular focus on the estimation of the wavenumber of both Love and Rayleigh waves. Third, we present and compare several computational approaches to minimize the proposed cost function. Numerical experiments verify the effectiveness of our cost function and resulting array geometry designs, leading to greatly improved estimation performance in comparison to arbitrary array geometries, both at low and high SNR levels.

I. Dokmanic, R. Parhizkar, A. Walther, Y. M. Lu, and M. Vetterli, “Acoustic Echoes Reveal Room Shape,” Proceedings of the National Academy of Sciences (PNAS), vol. 110, no. 30, pp. 12186-12191, 2013. Full Text (PDF + Supplementary Info)Abstract
Imagine that you are blindfolded inside an unknown room. You snap your fingers and listen to the room’s response. Can you hear the shape of the room? Some people can do it naturally, but can we design computer algorithms that hear rooms? We show how to compute the shape of a convex polyhedral room from its re- sponse to a known sound, recorded by a few microphones. Geo- metric relationships between the arrival times of echoes enable us to “blindfoldedly” estimate the room geometry. This is achieved by exploiting the properties of Euclidean distance matrices. Fur- thermore, we show that under mild conditions, first-order echoes provide a unique description of convex polyhedral rooms. Our algorithm starts from the recorded impulse responses and pro- ceeds by learning the correct assignment of echoes to walls. In contrast to earlier methods, the proposed algorithm reconstructs the full 3D geometry of the room from a single sound emission, and with an arbitrary geometry of the microphone array. As long as the microphones can hear the echoes, we can position them as we want. Besides answering a basic question about the inverse problem of room acoustics, our results find applications in areas such as architectural acoustics, indoor localization, virtual reality, and audio forensics.
A. Agaskar and Y. M. Lu, “A Spectral Graph Uncertainty Principle,” IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4338-4356, 2013.Abstract

The spectral theory of graphs provides a bridge between classical signal processing and the nascent field of graph signal processing. In this paper, a spectral graph analogy to Heisenberg's celebrated uncertainty principle is developed. Just as the classical result provides a tradeoff between signal localization in time and frequency, this result provides a fundamental tradeoff between a signal's localization on a graph and in its spectral domain. Using the eigenvectors of the graph Laplacian as a surrogate Fourier basis, quantitative definitions of graph and spectral ``spreads'' are given, and a complete characterization of the feasibility region of these two quantities is developed. In particular, the lower boundary of the region, referred to as the uncertainty curve, is shown to be achieved by eigenvectors associated with the smallest eigenvalues of an affine family of matrices. The convexity of the uncertainty curve allows it to be found to within $\varepsilon$ by a fast approximation algorithm requiring $\mathcal{O}(\varepsilon^{-1/2})$ typically sparse eigenvalue evaluations. Closed-form expressions for the uncertainty curves for some special classes of graphs are derived, and an accurate analytical approximation for the expected uncertainty curve of Erdos-Renyi random graphs is developed. These theoretical results are validated by numerical experiments, which also reveal an intriguing connection between diffusion processes on graphs and the uncertainty bounds.

M. N. Do and Y. M. Lu, “Multidimensional Filter Banks and Multiscale Geometric Representations,” Foundation and Trends in Signal Processing, vol. 5, no. 3, pp. 157-264, 2012.Abstract

Thanks to the explosive growth of sensing devices and capabilities, multidimensional (MD) signals — such as images, videos, multispectral images, light fields, and biomedical data volumes — have become ubiquitous.
Multidimensional filter banks and the associated constructions provide a unified framework and an efficient computational tool in the formation, representation, and processing of these multidimensional data sets. In this survey we aim to provide a systematic development of the theory and constructions of multidimensional filter banks. We thoroughly review several tools that have been shown to be particularly effective in the design and analysis of multidimensional filter banks, including sampling lattices, multidimensional bases and frames, polyphase representations, Gröbner bases, mapping methods, frequency domain constructions, ladder structures and lifting schemes. We then focus on the construction of filter banks and signal representations that can capture directional and geometric features, which are unique and key properties of many multidimensional signals. Next, we study the connection between iterated multidimensional filter banks in the discrete domain and the associated multiscale signal representations in the continuous domain through a directional multiresolution analysis framework. Finally, we show several examples to demonstrate the power of multidimensional filter banks and geometric signal representations in applications.

F. Yang, Y. M. Lu, L. Sbaiz, and M. Vetterli, “Bits from Photons: Oversampled Image Acquisition Using Binary Poisson Statistics,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1421-1436, 2012. Extended Version with Complete ProofAbstract

We study a new image sensor that is reminiscent of traditional photographic film. Each pixel in the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. To analyze its performance, we formulate the oversampled binary sensing scheme as a parameter estimation problem based on quantized Poisson statistics. We show that, with a single-photon quantization threshold and large oversampling factors, the Cramér-Rao lower bound (CRLB) of the estimation variance approaches that of an ideal unquantized sensor, that is, as if there were no quantization in the sensor measurements. Furthermore, the CRLB is shown to be asymptotically achievable by the maximum likelihood estimator (MLE). By showing that the log-likelihood function of our problem is concave, we guarantee the global optimality of iterative algorithms in finding the MLE. Numerical results on both synthetic data and images taken by a prototype sensor verify our theoretical analysis and demonstrate the effectiveness of our image reconstruction algorithm. They also suggest the potential application of the oversampled binary sensing scheme in high dynamic range photography.