C. Wang, Y. C. Eldar, and Y. M. Lu, “Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 6, 2018. arXiv:1805.06834 [cs.LG]Abstract
We present a high-dimensional analysis of three popular algorithms, namely, Oja's method, GROUSE and PETRELS, for subspace estimation from streaming and highly incomplete observations.  We show that, with proper time scaling, the time-varying principal angles between the true subspace and its estimates given by the algorithms converge weakly to deterministic processes when the ambient dimension \(n\) tends to infinity. Moreover, the limiting processes can be exactly characterized as the unique solutions of certain ordinary differential equations (ODEs). A finite sample bound is also given, showing that the rate of convergence towards such limits is \(\mathcal{O}(1/\sqrt{n})\). In addition to providing asymptotically exact predictions of the dynamic performance of the algorithms, our high-dimensional analysis yields several insights, including an asymptotic equivalence between Oja's method and GROUSE, and a precise scaling relationship linking the amount of missing data to the signal-to-noise ratio. By analyzing the solutions of the limiting ODEs, we also establish phase transition phenomena associated with the steady-state performance of these techniques.
L. Balzano, Y. Chi, and Y. M. Lu, “A Modern Perspective on Streaming PCA and Subspace Tracking: The Missing Data Case,” Proceedings of the IEEE, vol. 106, no. 8, pp. 1293-1310, 2018.Abstract
For many modern applications in science and engineering, data are collected in a streaming fashion carrying time-varying information, and practitioners need to process them with a limited amount of memory and computational resources in a timely manner for decision making. This often is coupled with the missing data problem, such that only a small fraction of data attributes are observed. These complications impose significant, and unconventional, constraints on the problem of streaming Principal Component Analysis (PCA) and subspace tracking, which is an essential building block for many inference  tasks in signal processing and machine learning. This survey article reviews a variety of classical and recent algorithms for solving this problem with low computational and memory complexities, particularly those applicable in the big data regime with missing data. We illustrate that streaming PCA and subspace tracking algorithms can be understood through algebraic and geometric perspectives and they need to be adjusted carefully to handle missing data. Both asymptotic and non-asymptotic convergence guarantees are reviewed. Finally, we benchmark the performance of several competitive algorithms in the presence of missing data for both well-conditioned and ill-conditioned systems.
Y. M. Lu, J. Oñativia, and P. L. Dragotti, “Sparse Representation in Fourier and Local Bases Using ProSparse: A Probabilistic Analysis,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2639-2647, 2018. arXiv:1611.07971 [cs.IT]Abstract

Finding the sparse representation of a signal in an overcomplete dictionary has attracted a lot of attention over the past years. This paper studies ProSparse, a new polynomial complexity algorithm that solves the sparse representation problem when the underlying dictionary is the union of a Vandermonde matrix and a banded matrix. Unlike our previous work which establishes deterministic (worst-case) sparsity bounds for ProSparse to succeed, this paper presents a probabilistic average-case analysis of the algorithm. Based on a generating-function approach, closed-form expressions for the exact success probabilities of ProSparse are given. The success probabilities are also analyzed in the high-dimensional regime. This asymptotic analysis characterizes a sharp phase transition phenomenon regarding the performance of the algorithm.

C. Wang, J. Mattingly, and Y. M. Lu, “Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA,” Technical Report, 2017. arXiv:1712.04332 [cs.LG]Abstract
We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target feature vector and its estimates provided by the algorithms will converge weakly to a deterministic measured-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE can be efficiently obtained. These solutions lead to precise predictions of the performance of the algorithms, as many practical performance metrics are linear functionals of the joint empirical measures. In addition to characterizing the dynamic performance of online learning algorithms, our asymptotic analysis also provides useful insights. In particular, in the high-dimensional limit, and due to exchangeability, the original coupled dynamics associated with the algorithms will be asymptotically ``decoupled'', with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight for nonconvex optimization problems may prove an interesting line of future research.
O. Dhifallah, C. Thrampoulidis, and Y. M. Lu, “Phase Retrieval via Linear Programming: Fundamental Limits and Algorithmic Improvements,” in 55th Annual Allerton Conference on Communication, Control, and Computing, 2017. arXiv:1710.05234 [cs.IT]Abstract
A recently proposed convex formulation of the phase retrieval problem estimates the unknown signal by solving a simple linear program. This new scheme, known as PhaseMax, is computationally efficient compared to standard convex relaxation methods based on lifting techniques. In this paper, we present an exact performance analysis of PhaseMax under Gaussian measurements in the large system limit. In contrast to previously known performance bounds in the literature, our results are asymptotically exact and they also reveal a sharp phase transition phenomenon. Furthermore, the geometrical insights gained from our analysis led us to a novel nonconvex formulation of the phase retrieval problem and an accompanying iterative algorithm based on successive linearization and maximization over a polytope. This new algorithm, which we call PhaseLamp, has provably superior recovery performance over the original PhaseMax method.
C. Wang and Y. M. Lu, “The Scaling Limit of High-Dimensional Online Independent Component Analysis,” in Conference on Neural Information Processing Systems (NIPS), 2017.Abstract

We analyze the dynamics of an online algorithm for independent component analysis in the high-dimensional scaling limit. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measure of the target feature vector and the estimates provided by the algorithm will converge weakly to a deterministic measured-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE, which involves two spatial variables and one time variable, can be efficiently obtained. These solutions provide detailed information about the performance of the ICA algorithm, as many practical performance metrics are functionals of the joint empirical measures. Numerical simulations show that our asymptotic analysis is accurate even for moderate dimensions. In addition to providing a tool for understanding the performance of the algorithm, our PDE analysis also provides useful insight. In particular, in the high-dimensional limit, the original coupled dynamics associated with the algorithm will be asymptotically “decoupled”, with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight to design new algorithms for achieving optimal trade-offs between computational and statistical efficiency may prove an interesting line of future research. 

Spotlight paper (acceptance rate: 112/3240 = 3.5%)
O. Dhifallah and Y. M. Lu, “Fundamental Limits of PhaseMax for Phase Retrieval: A Replica Analysis,” in the 7th IEEE Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2017. arXiv:1708.03355Abstract

We consider a recently proposed convex formulation, known as the PhaseMax method, for solving the phase retrieval problem. Using the replica method from statistical mechanics, we analyze the performance of PhaseMax in the high-dimensional limit. Our analysis predicts the exact asymptotic performance of PhaseMax. In particular, we show that a sharp phase transi- tion phenomenon takes place, with a simple analytical formula characterizing the phase transition boundary. This result shows that the oversampling ratio required by existing performance bounds in the literature can be significantly reduced. Numerical results confirm the validity of our replica analysis, showing that the theoretical predictions are in excellent agreement with the actual performance of the algorithm, even for moderate signal dimensions. 

This paper won the Best Student Paper Award (First Prize) at the 2017 IEEE CAMSAP Workshop.

The predictions made in this paper via the non-rigorous replica method has since been rigorously established in our latest work.

Y. M. Lu and G. Li, “Spectral Initialization for Nonconvex Estimation: High-Dimensional Limit and Phase Transitions,” in IEEE International Symposium on Information Theory (ISIT), 2017.Abstract

We study a simple spectral method that serves as a key ingredient in a growing line of work using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous work, which focuses on the phase retrieval setting and provides only bounds on the performance, we consider arbitrary generalized linear sensing models and provide an exact characterization of the performance of the spectral method in the high-dimensional regime. Our analysis reveals a phase transition phenomenon that depends on the sampling ratio. When the ratio is below a critical threshold, the estimates given by the spectral method are no better than random guesses drawn uniformly from the hypersphere; above the threshold, however, the estimates become increasingly aligned with the underlying signal. Worked examples and numerical simulations are provided to illustrate and verify the analytical predictions. 

C. Wang, Y. Eldar, and Y. M. Lu, “Subspace Estimation from Incomplete Observations: A Precise High-Dimensional Analysis,” in Signal Processing with Adaptive Structured Representatives (SPARS) Workshop, 2017.Abstract

The problem of estimating and tracking low-rank subspaces from incomplete observations has received a lot of attention recently in the signal processing and learning communities. Popular algorithms, such as GROUSE and PETRELS, are often very effective in practice, but their performance depends on the careful choice of algorithmic parameters. Important questions, such as the global convergence of these algorithms and how the noise level, subsampling ratio, and various other parameters affect the performance, are not fully understood. In this paper, we present a precise analysis of the performance of these algorithms in the asymptotic regime where the ambient dimension tends to infinity. Specifically, we show that the time-varying trajectories of estimation errors converge weakly to a deterministic function of time, which is characterized as the unique solution of a system of ordinary differential equations (ODEs.) Analyzing the limiting ODEs also reveals and characterizes sharp phase transition phenomena associated with these algorithms. Numerical simulations verify the accuracy of our asymptotic predictions, even for moderate signal dimensions. 

R. Yin, T. Gao, Y. M. Lu, and I. Daubechies, “A Tale of Two Bases: Local-Nonlocal Regularization on Image Patches with Convolution Framelets,” SIAM Journal on Imaging Sciences, vol. 10, no. 2, pp. 711-750, 2017.Abstract

We propose an image representation scheme combining the local and nonlocal characterization of patches in an image. Our representation scheme can be shown to be equivalent to a tight frame constructed from convolving local bases (e.g., wavelet frames, discrete cosine transforms, etc.) with nonlocal bases (e.g., spectral basis induced by nonlinear dimension reduction on patches), and we call the resulting frame elements convolution framelets. Insight gained from analyzing the proposed representation leads to a novel interpretation of a recent high-performance patch-based image pro- cessing algorithm using the point integral method (PIM) and the low dimensional manifold model (LDMM) [S. Osher, Z. Shi, and W. Zhu, Low Dimensional Manifold Model for Image Processing, Tech. Rep., CAM report 16-04, UCLA, Los Angeles, CA, 2016]. In particular, we show that LDMM is a weighted l2-regularization on the coefficients obtained by decomposing images into linear combinations of convolution framelets; based on this understanding, we extend the original LDMM to a reweighted version that yields further improved results. In addition, we establish the energy concentration property of convolution framelet coefficients for the setting where the local basis is constructed from a given nonlocal basis via a linear reconstruction framework; a generalization of this framework to unions of local embeddings can provide a natural setting for interpreting BM3D, one of the state-of-the-art image denoising algorithms. 

S. H. Chan, T. Zickler, and Y. M. Lu, “Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model Perspective,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5107-5121, 2017. arXiv:1601.00088Abstract

Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such performance gain is related to the Sinkhorn-Knopp balancing algorithm, an iterative procedure that symmetrizes a row-stochastic matrix to a doubly-stochastic matrix. However, a complete understanding of the performance gain phenomenon is still lacking.

In this paper, we study the performance gain phenomenon from a statistical learning perspective. We show that Sinkhorn-Knopp is equivalent to an Expectation-Maximization (EM) algorithm of learning a Product of Gaussians (PoG) prior of the image patches. By establishing the correspondence between the steps of Sinkhorn-Knopp and the EM algorithm, we provide a geometrical interpretation of the symmetrization process. The new PoG model also allows us to develop a new denoising algorithm called Product of Gaussian Non-Local-Means (PoG-NLM). PoG-NLM is an extension of the Sinkhorn-Knopp and is a generalization of the classical non-local means. Despite its simple formulation, PoG-NLM outperforms many existing smoothing filters and has a similar performance compared to BM3D.

C. Wang and Y. M. Lu, “Online Learning for Sparse PCA in High Dimensions: Exact Dynamics and Phase Transitions,” in IEEE Information Theory Workshop (ITW), 2016.Abstract

We study the dynamics of an online algorithm for learning a sparse leading eigenvector from samples generated from a spiked covariance model. This algorithm combines the classical Oja's method for online PCA with an element-wise nonlinearity at each iteration to promote sparsity. In the high-dimensional limit, the joint empirical measure of the underlying sparse eigenvector and its estimate provided by the algorithm is shown to converge weakly to a deterministic, measure-valued process. This scaling limit is characterized as the unique solution of a nonlinear PDE, and it provides exact information regarding the asymptotic performance of the algorithm. For example, performance metrics such as the cosine similarity and the misclassification rate in sparse support recovery can be obtained by examining the limiting dynamics. A steady-state analysis of the nonlinear PDE also reveals an interesting phase transition phenomenon. Although our analysis is asymptotic in nature, numerical simulations show that the theoretical predictions are accurate for moderate signal dimensions.

J. Onativia, P. L. Dragotti, and Y. M. Lu, “ProSparse denoise: Prony's based sparsity pattern recovery in the presence of noise,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
Y. Chi and Y. M. Lu, “Kaczmarz Method for Solving Quadratic Equations,” IEEE Signal Processing Letters, vol. 23, no. 9, pp. 1183-1187, 2016.Abstract

Estimating low-rank positive-semidefinite (PSD) matrices from symmetric rank-one measurements is of great importance in many applications, such as high-dimensional data processing, quantum state tomography, and phase retrieval. When the rank is known a priori, this problem can be regarded as solving a system of quadratic equations of a low-dimensional subspace. The authors develop a fast iterative algorithm based on an adaptation of the Kaczmarz method, which is traditionally used for solving overdetermined linear systems. In particular, the authors characterize the dynamics of the algorithm when the measurement vectors are composed of standard Gaussian entries in the online setting. Numerical simulations demonstrate the compelling performance of the proposed algorithm.

F. Sroubek, J. Kamenicky, and Y. M. Lu, “Decomposition space-variant blur in image deconvolution,” IEEE Signal Processing Letters, vol. 23, no. 3, pp. 346-350, 2016.Abstract

Standard convolution as a model of radiometric degradation is in majority of cases inaccurate as the blur varies in space and we are thus required to work with a computationally demanding space-variant model. Space-variant degradation can be approximately decomposed to a set of standard convolutions. We explain in detail the properties of the space-variant degrada- tion operator and show two possible decomposition models and two approximation approaches. Our target application is space- variant image deconvolution, on which we illustrate theoretical differences between these models. We propose a computationally efficient restoration algorithm that belongs to a category of alternating direction methods of multipliers, which consists of four update steps with closed-form solutions. Depending on the used decomposition, two variations of the algorithm exist with distinct properties. We test the effectiveness of the decomposition models under different levels of approximation on synthetic and real examples, and conclude the letter by drawing several practical observations. 

D. M. Merfeld, T. K. Clark, Y. M. Lu, and F. Karmali, “Dynamics of Individual Perceptual Decisions,” Journal of Neurophysiology, vol. 115, no. 1, 2016. Publisher's VersionAbstract

Perceptual decision-making is fundamental to a broad range of fields including neurophysiology, economics, medicine, advertising, law, etc. While recent findings have yielded major advances in our understanding of perceptual decision-making, decision-making as a function of time and frequency (i.e., decision-making dynamics) is not well understood. To limit the review length, we focus most of this review on human findings. Animal findings, which are extensively reviewed elsewhere, are included when beneficial or necessary. We attempt to put these various findings and datasets - which can appear to be unrelated in the absence of a formal dynamic analysis - into context using published models. Specifically, by adding appropriate dynamic mechanisms (e.g., high-pass filters) to existing models, it appears that a number of otherwise seemingly disparate findings from the literature might be explained. One hypothesis that arises through this dynamic analysis is that decision-making includes phasic (high-pass) neural mechanisms, an evidence accumulator and/or some sort of mid-trial decision-making mechanism (e.g., peak detector and/or decision boundary).

C. Hu, J. Sepulcre, K. A. Johnson, G. E. Fakhri, Y. M. Lu, and Q. Li, “Matched Signal Detection on Graphs: Theory and Application to Brain Imaging Data Classification,” NeuroImage, vol. 125, pp. 587-600, 2016.Abstract

Motivated by recent progresses in signal processing on graphs, we have developed a matched signal detection (MSD) theory for signals with intrinsic structures described by weighted graphs. First, we regard graph Laplacian eigenvalues as frequencies of graph-signals and assume that the signal is in a subspace spanned by the first few graph Laplacian eigenvectors associated with lower eigenvalues. The conventional matched subspace detector can be applied to this case. Furthermore, we study signals that may not merely live in a subspace. Namely, we consider signals with bounded variation on graphs and more general signals that are randomly drawn from a prior distribution. For bounded variation signals, the test is a weighted energy detector. For the random signals, the test statistic is the difference of signal variations on associated graphs, if a degenerate Gaussian distribution specified by the graph Laplacian is adopted. We evaluate the effectiveness of the MSD on graphs both on simulated and real data sets. Specifically, we apply MSD to the brain imaging data classification problem of Alzheimer’s disease (AD) based on two independent data sets: 1) positron emission tomography data with Pittsburgh compound-B tracer of 30 AD and 40 normal control (NC) subjects, 2) resting-state functional magnetic resonance imaging (R-fMRI) data of 30 early mild cognitive impairment and 20 NC subjects. Our results demonstrate that the MSD approach is able to outperform the traditional methods and help detect AD at an early stage, probably due to the success of exploiting the manifold structure of the data. 

A. Minot, Y. M. Lu, and N. Li, “A Distributed Gauss-Newton Method for Power System State Estimation,” IEEE Transactions on Power Systems, vol. 31, no. 5, pp. 3804-3815, 2016.Abstract

We propose a fully distributed Gauss-Newton algorithm for state estimation of electric power systems. At each Gauss-Newton iteration, matrix-splitting techniques are utilized to carry out the matrix inversion needed for calculating the Gauss-Newton step in a distributed fashion. In order to reduce the communication burden as well as increase robustness of state estimation, the proposed distributed scheme relies only on local information and a limited amount of information from neighboring areas. The matrix-splitting scheme is designed to calculate the Gauss-Newton step with exponential convergence speed. The effectiveness of the method is demonstrated in various numerical experiments.

I. Dokmanic and Y. M. Lu, “Sampling Sparse Signals on the Sphere: Algorithms and Applications,” IEEE Transactions on Signal Processing, vol. 64, no. 1, pp. 189-202, 2016. arXiv:1502.07577Abstract

We propose a sampling scheme that can perfectly reconstruct a collection of
spikes on the sphere from samples of their lowpass-filtered observations.
Central to our algorithm is a generalization of the annihilating filter
method, a tool widely used in array signal processing and finite-rate-of-innovation
(FRI) sampling. The proposed algorithm can reconstruct $K$ spikes
from $(K+\sqrt{K})^2$ spatial samples---a sampling requirement that
improves over known sparse sampling schemes on the sphere by a factor of up
to four.

We showcase the versatility of the proposed algorithm by applying it to
three different problems: 1) sampling diffusion processes induced by
localized sources on the sphere, 2) shot-noise removal, and 3) sound source
localization (SSL) by a spherical microphone array. In particular, we show
how SSL can be reformulated as a spherical sparse sampling problem.

G. Li, Y. Gu, and Y. M. Lu, “Phase Retrieval Using Iterative Projections: Dynamics in the Large Systems Limit,” in Allerton Conference on Communications, Control, and Computing, 2015.Abstract

We study phase retrieval by using a simple non-convex algorithm based on iterative projections. Our main contribution in this work is to establish an exact analysis of the dynamics of the algorithm in an online setting, with Gaussian measurement vectors. We show that the algorithm dynamics, measured by the squared distance between the current estimate and the true solution, can be fully characterized by a 2D Markov process, irrespective of the underlying signal dimension. Furthermore, in the large systems limit (i.e., as the signal dimension tends to infinity), the random sample paths of the 2D Markov process converge to the solutions of two deterministic and coupled ordinary differential equations (ODEs). Numerical simulations verify the accuracy of our analytical predictions, even for moderate system sizes. This suggests that the ODE approach presented in this work provides an effective tool for analyzing the performance and convergence of the algorithm.