Publications by Type: Journal Article

Y. M. Lu, “Householder Dice: A Matrix-Free Algorithm for Simulating Dynamics on Gaussian and Random Orthogonal Ensembles,” Submitted. arXiv:2101.07464 [cs.IT]
This paper proposes a new algorithm, named Householder Dice (HD), for simulating dynamics on dense random matrix ensembles with translation-invariant properties. Examples include the Gaussian ensemble, the Haar-distributed random orthogonal ensemble, and their complex-valued counterparts. A "direct" approach to the simulation, where one first generates a dense n×n matrix from the ensemble, requires at least O(n²) resource in space and time. The HD algorithm overcomes this O(n²) bottleneck by using the principle of deferred decisions: rather than fixing the entire random matrix in advance, it lets the randomness unfold with the dynamics. At the heart of this matrix-free algorithm is an adaptive and recursive construction of (random) Householder reflectors. These orthogonal transformations exploit the group symmetry of the matrix ensembles, while simultaneously maintaining the statistical correlations induced by the dynamics. The memory and computation costs of the HD algorithm are O(nT) and O(nT²), respectively, with T being the number of iterations. When T≪n, which is nearly always the case in practice, the new algorithm leads to significant reductions in runtime and memory footprint. Numerical results demonstrate the promise of the HD algorithm as a new computational tool in the study of high-dimensional random systems.
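As a rough illustration of the building block named in this abstract, the sketch below constructs a single Householder reflector and applies it in O(n) time without ever forming the n×n matrix. It is not the HD algorithm itself: the function names `householder_vector` and `reflect` are hypothetical, and the adaptive, randomized construction described in the paper is not reproduced here.

```python
import math

def householder_vector(x):
    """Return a unit vector v such that (I - 2 v v^T) x is a multiple of e_1.

    Assumes x is a nonzero list of floats (hypothetical helper, not from the paper).
    """
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    if norm_x == 0.0:
        raise ValueError("x must be nonzero")
    # Choose the sign that avoids cancellation in the first coordinate.
    sign = 1.0 if x[0] >= 0 else -1.0
    v = list(x)
    v[0] += sign * norm_x
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm_v for vi in v]

def reflect(v, z):
    """Apply H = I - 2 v v^T to z using one inner product, so the cost is O(n)."""
    dot = sum(vi * zi for vi, zi in zip(v, z))
    return [zi - 2.0 * dot * vi for vi, zi in zip(v, z)]

x = [3.0, 4.0, 0.0]
v = householder_vector(x)
hx = reflect(v, x)            # all the energy of x is moved to the first coordinate
back = reflect(v, hx)         # a reflector is its own inverse
```

Storing only the vector `v` for each reflector, rather than a dense matrix, is what makes memory that grows linearly in n per iteration plausible.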
H. Hu and Y. M. Lu, “Universality Laws for High-Dimensional Learning with Random Features,” Submitted. arXiv:2009.07669 [cs.IT]
We prove a universality theorem for learning with random features. Our result shows that, in terms of training and generalization errors, the random feature model with a nonlinear activation function is asymptotically equivalent to a surrogate Gaussian model with a matching covariance matrix. This settles a conjecture based on which several recent papers develop their results. Our method for proving the universality builds on the classical Lindeberg approach. Major ingredients of the proof include a leave-one-out analysis for the optimization problem associated with the training process and a central limit theorem, obtained via Stein's method, for weakly correlated random variables.
O. Dhifallah and Y. M. Lu, “A Precise Performance Analysis of Learning with Random Features,” Submitted. arXiv:2008.11904 [cs.IT]
We study the problem of learning an unknown function using random feature models. Our main contribution is an exact asymptotic analysis of such learning problems with Gaussian data. Under mild regularity conditions for the feature matrix, we provide an exact characterization of the asymptotic training and generalization errors, valid in both the under-parameterized and over-parameterized regimes. The analysis presented in this paper holds for general families of feature matrices, activation functions, and convex loss functions. Numerical results validate our theoretical predictions, showing that our asymptotic findings are in excellent agreement with the actual performance of the considered learning problem, even in moderate dimensions. Moreover, they reveal an important role played by the regularization, the loss function and the activation function in the mitigation of the "double descent phenomenon" in learning.
H. Hu and Y. M. Lu, “Asymptotics and Optimal Designs of SLOPE for Sparse Linear Regression,” Submitted. arXiv:1903.11582 [cs.IT]
In sparse linear regression, the SLOPE estimator generalizes LASSO by assigning magnitude-dependent regularizations to different coordinates of the estimate. In this paper, we present an asymptotically exact characterization of the performance of SLOPE in the high-dimensional regime where the number of unknown parameters grows in proportion to the number of observations. Our asymptotic characterization enables us to derive optimal regularization sequences to either minimize the MSE or to maximize the power in variable selection under any given level of Type-I error. In both cases, we show that the optimal design can be recast as certain infinite-dimensional convex optimization problems, which have efficient and accurate finite-dimensional approximations. Numerical simulations verify our asymptotic predictions. They also demonstrate the superiority of our optimal design over LASSO and a regularization sequence previously proposed in the literature.
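For readers unfamiliar with SLOPE, the sketch below evaluates the sorted-ℓ1 penalty that gives the estimator its magnitude-dependent regularization: the largest coefficient in magnitude is paired with the largest weight. The function name `slope_penalty` is hypothetical, and the weights shown are arbitrary examples, not the optimal sequences derived in the paper.

```python
def slope_penalty(x, lambdas):
    """Sorted-L1 penalty: sum_i lambdas[i] * |x|_(i), where |x|_(1) >= |x|_(2) >= ...

    Assumes lambdas is nonnegative and nonincreasing (hypothetical helper).
    """
    if len(x) != len(lambdas):
        raise ValueError("x and lambdas must have the same length")
    if any(l < 0 for l in lambdas):
        raise ValueError("lambdas must be nonnegative")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be nonincreasing")
    # Sort magnitudes in decreasing order so the largest meets the largest weight.
    magnitudes = sorted((abs(xi) for xi in x), reverse=True)
    return sum(l * m for l, m in zip(lambdas, magnitudes))

estimate = [3.0, -1.0, 2.0]
lasso_like = slope_penalty(estimate, [1.0, 1.0, 1.0])   # equal weights: plain L1 norm
slope_like = slope_penalty(estimate, [3.0, 2.0, 1.0])   # decreasing weights
```

With all weights equal the penalty reduces to the ℓ1 norm used by LASSO, which is the sense in which SLOPE generalizes it.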
O. Dhifallah, C. Thrampoulidis, and Y. M. Lu, “Phase Retrieval via Polytope Optimization: Geometry, Phase Transitions, and New Algorithms,” Submitted. arXiv:1805.09555 [cs.IT]
We study algorithms for solving quadratic systems of equations based on optimization methods over polytopes. Our work is inspired by a recently proposed convex formulation of the phase retrieval problem, which estimates the unknown signal by solving a simple linear program over a polytope constructed from the measurements. We present a sharp characterization of the high-dimensional geometry of the aforementioned polytope under Gaussian measurements. This characterization allows us to derive asymptotically exact performance guarantees for PhaseMax, which also reveal a phase transition phenomenon with respect to its sample complexity. Moreover, the geometric insights gained from our analysis lead to a new nonconvex formulation of the phase retrieval problem and an accompanying iterative algorithm, which we call PhaseLamp. We show that this new algorithm has superior recovery performance over the original PhaseMax method. Finally, as yet another variation on the theme of performing phase retrieval via polytope optimization, we propose a weighted version of PhaseLamp and demonstrate, through numerical simulations, that it outperforms several state-of-the-art algorithms under both generic Gaussian measurements and more realistic Fourier-type measurements that arise in phase retrieval applications.
C. Wang, J. Mattingly, and Y. M. Lu, “Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA,” Submitted. arXiv:1712.04332 [cs.LG]
We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target feature vector and its estimates provided by the algorithms will converge weakly to a deterministic measure-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE can be efficiently obtained. These solutions lead to precise predictions of the performance of the algorithms, as many practical performance metrics are linear functionals of the joint empirical measures. In addition to characterizing the dynamic performance of online learning algorithms, our asymptotic analysis also provides useful insights. In particular, in the high-dimensional limit, and due to exchangeability, the original coupled dynamics associated with the algorithms will be asymptotically "decoupled", with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight for nonconvex optimization problems may prove an interesting line of future research.
A. Agaskar and Y. M. Lu, “Optimal Detection of Random Walks on Graphs: Performance Analysis via Statistical Physics,” Submitted. arXiv:1504.06924

We study the problem of detecting a random walk on a graph from a sequence of noisy measurements at every node. There are two hypotheses: either every observation is just meaningless zero-mean Gaussian noise, or at each time step exactly one node has an elevated mean, with its location following a random walk on the graph over time. We want to exploit knowledge of the graph structure and random walk parameters (specified by a Markov chain transition matrix) to detect a possibly very weak signal. The optimal detector is easily derived, and we focus on the harder problem of characterizing its performance through the (type-II) error exponent: the decay rate of the miss probability under a false alarm constraint.
The expression for the error exponent resembles the free energy of a spin glass in statistical physics, and we borrow techniques from that field to develop a lower bound. Our fully rigorous analysis uses large deviations theory to show that the lower bound exhibits a phase transition: strong performance is only guaranteed when the signal-to-noise ratio exceeds twice the entropy rate of the random walk.
Monte Carlo simulations show that the lower bound fully captures the behavior of the true exponent.
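The abstract notes that the optimal detector is easily derived. One standard way to compute such a likelihood-ratio statistic is a forward recursion over the hidden walker location, sketched below. This is a sketch under stated assumptions, not the authors' implementation: it assumes unit noise variance, a known elevated mean `mu`, and a known initial distribution, and the name `log_likelihood_ratio` is hypothetical.

```python
import math

def log_likelihood_ratio(observations, transition, initial, mu):
    """Log-likelihood ratio of 'random walk present' versus 'noise only'.

    observations[t][j] is the measurement at node j at time t.
    transition[i][j] is the probability of moving from node i to node j.
    initial[j] is the probability that the walk starts at node j.
    Assumes unit-variance Gaussian noise (an assumption, not stated in the source).
    """
    n = len(initial)
    log_lr = 0.0
    belief = list(initial)  # predicted distribution of the walker's location
    for y in observations:
        # Per-node likelihood ratio of N(mu, 1) against N(0, 1) at the observed value.
        weighted = [belief[j] * math.exp(mu * y[j] - 0.5 * mu * mu) for j in range(n)]
        total = sum(weighted)
        log_lr += math.log(total)          # accumulate in the log domain for stability
        posterior = [w / total for w in weighted]
        # Propagate the location distribution one step along the Markov chain.
        belief = [sum(posterior[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return log_lr

# Two-node example with a lazy walk; the values are illustrative only.
P = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
obs = [[1.2, -0.3], [0.8, 0.1], [-0.2, 1.5]]
statistic = log_likelihood_ratio(obs, P, pi, mu=1.0)
```

A detector would compare `statistic` with a threshold chosen to meet the false alarm constraint; the cost is O(Tn²) for T time steps on n nodes with a dense transition matrix.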

O. Dhifallah and Y. M. Lu, “Phase Transitions in Transfer Learning for High-Dimensional Perceptrons,” Entropy, Special Issue "The Role of Signal Processing and Information Theory in Modern Machine Learning", vol. 23, no. 4, 2021. arXiv:2101.01918 [cs.LG]
Transfer learning seeks to improve the generalization performance of a target task by exploiting the knowledge learned from a related source task. Central questions include deciding what information one should transfer and when transfer can be beneficial. The latter question is related to the so-called negative transfer phenomenon, where the transferred source information actually reduces the generalization performance of the target task. This happens when the two tasks are sufficiently dissimilar. In this paper, we present a theoretical analysis of transfer learning by studying a pair of related perceptron learning tasks. Despite the simplicity of our model, it reproduces several key phenomena observed in practice. Specifically, our asymptotic analysis reveals a phase transition from negative transfer to positive transfer as the similarity of the two tasks moves past a well-defined threshold.
H. Hu and Y. M. Lu, “The Limiting Poisson Law of Massive MIMO Detection with Box Relaxation,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 695-704, 2020. arXiv:2006.08416 [cs.IT]
Estimating a binary vector from noisy linear measurements is a prototypical problem for MIMO systems. A popular algorithm, called the box-relaxation decoder, estimates the target signal by solving a least squares problem with convex constraints. This paper shows that the performance of the algorithm, measured by the number of incorrectly-decoded bits, has a limiting Poisson law. This occurs when the sampling ratio and noise variance, two key parameters of the problem, follow certain scalings as the system dimension grows. Moreover, at a well-defined threshold, the probability of perfect recovery is shown to undergo a phase transition that can be characterized by the Gumbel distribution. Numerical simulations corroborate these theoretical predictions, showing that they match the actual performance of the algorithm even in moderate system dimensions.
Y. M. Lu and G. Li, “Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation,” Information and Inference: A Journal of the IMA, vol. 9, no. 3, pp. 507-541, 2020. arXiv:1702.06435 [cs.IT]

We study a spectral initialization method that serves as a key ingredient in recent work on using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous analysis in the literature, which is restricted to the phase retrieval setting and which provides only performance bounds, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the spectral method in the high-dimensional regime. Our analysis reveals a phase transition phenomenon that depends on the sampling ratio. When the ratio is below a minimum threshold, the estimates given by the spectral method are no better than a random guess drawn uniformly from the hypersphere; above a maximum threshold, however, the estimates become increasingly aligned with the target signal. The computational complexity of the spectral method is also markedly different in the two phases. Worked examples and numerical results are provided to illustrate and verify the analytical predictions. In particular, simulations show that our asymptotic formulas provide accurate predictions even at moderate signal dimensions.
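To make the spectral method concrete, the sketch below forms the data matrix D = (1/m) Σ y_i a_i a_iᵀ and extracts its leading eigenvector by power iteration. Several choices here are assumptions for illustration: the noiseless phase retrieval model y_i = (a_iᵀx)², the identity preprocessing of y_i, the problem sizes, and the name `spectral_estimate`. The paper itself treats general sensing models and preprocessing functions.

```python
import math
import random

def spectral_estimate(sensing_vectors, measurements, num_iters=200, seed=0):
    """Leading eigenvector of D = (1/m) * sum_i y_i * a_i a_i^T via power iteration.

    Assumes nonnegative measurements, so D is positive semidefinite and plain
    power iteration targets its largest eigenvalue (hypothetical helper).
    """
    m = len(sensing_vectors)
    n = len(sensing_vectors[0])
    D = [[0.0] * n for _ in range(n)]
    for a, y in zip(sensing_vectors, measurements):
        for r in range(n):
            w = y * a[r] / m
            row = D[r]
            for c in range(n):
                row[c] += w * a[c]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(num_iters):
        w = [sum(D[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

# Synthetic example: unit-norm target, Gaussian sensing vectors, sampling ratio m/n = 200.
rng = random.Random(1)
n, m = 10, 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_norm = math.sqrt(sum(xi * xi for xi in x))
x = [xi / x_norm for xi in x]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(ai * xi for ai, xi in zip(a, x)) ** 2 for a in A]
estimate = spectral_estimate(A, y)
# The sign of x is not identifiable from squared measurements, hence the absolute value.
correlation = abs(sum(e * xi for e, xi in zip(estimate, x)))
```

Lowering the ratio m/n in this toy setup is a simple way to observe the degradation in `correlation` that the phase transition analysis describes.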

W. Luo, W. Alghamdi, and Y. M. Lu, “Optimal Spectral Initialization for Signal Recovery with Applications to Phase Retrieval,” IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2347-2356, 2019. arXiv:1811.04420 [cs.IT]

We present the optimal design of a spectral method widely used to initialize nonconvex optimization algorithms for solving phase retrieval and other signal recovery problems. Our work leverages recent results that provide an exact characterization of the performance of the spectral method in the high-dimensional limit. This characterization allows us to map the task of optimal design to a constrained optimization problem in a weighted $L^2$ function space. The latter has a closed-form solution. Interestingly, under a mild technical condition, our results show that there exists a fixed design that is uniformly optimal over all sampling ratios. Numerical simulations demonstrate the performance improvement brought by the proposed optimal design over existing constructions in the literature. In a recent work, Mondelli and Montanari have shown the existence of a weak reconstruction threshold below which the spectral method cannot provide useful estimates. Our results serve to complement that work by deriving the fundamental limit of the spectral method beyond the aforementioned threshold.

Y. Chi, Y. M. Lu, and Y. Chen, “Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview,” IEEE Transactions on Signal Processing, vol. 67, no. 20, pp. 5239-5269, 2019. arXiv:1809.09573 [cs.LG]

Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently.

In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated thinking of optimization and statistics leads to fruitful research findings.

C. Wang and Y. M. Lu, “The scaling limit of high-dimensional online independent component analysis,” Journal of Statistical Mechanics (Special Issue on Machine Learning), vol. 2019, 2019. Publisher's Version
We analyze the dynamics of an online algorithm for independent component analysis in the high-dimensional scaling limit. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measure of the target feature vector and the estimates provided by the algorithm will converge weakly to a deterministic measure-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE, which involves two spatial variables and one time variable, can be efficiently obtained. These solutions provide detailed information about the performance of the ICA algorithm, as many practical performance metrics are functionals of the joint empirical measures. Numerical simulations show that our asymptotic analysis is accurate even for moderate dimensions. In addition to providing a tool for understanding the performance of the algorithm, our PDE analysis also provides useful insight. In particular, in the high-dimensional limit, the original coupled dynamics associated with the algorithm will be asymptotically 'decoupled', with each coordinate independently solving a 1D effective minimization problem via stochastic gradient descent. Exploiting this insight to design new algorithms for achieving optimal trade-offs between computational and statistical efficiency may prove an interesting line of future research.
G. Baechler, M. Kreković, J. Ranieri, A. Chebira, Y. M. Lu, and M. Vetterli, “Super resolution phase retrieval for sparse signals,” IEEE Transactions on Signal Processing, vol. 67, no. 18, 2019. arXiv:1808.01961 [cs.IT]
In a variety of fields, in particular those involving imaging and optics, we often measure signals whose phase is missing or has been irremediably distorted. Phase retrieval attempts to recover the phase information of a signal from the magnitude of its Fourier transform to enable the reconstruction of the original signal. Solving the phase retrieval problem is equivalent to recovering a signal from its auto-correlation function. In this paper, we assume the original signal to be sparse; this is a natural assumption in many applications, such as X-ray crystallography, speckle imaging and blind channel estimation. We propose an algorithm that resolves the phase retrieval problem in three stages: i) we leverage the finite rate of innovation sampling theory to super-resolve the auto-correlation function from a limited number of samples, ii) we design a greedy algorithm that identifies the locations of a sparse solution given the super-resolved auto-correlation function, iii) we recover the amplitudes of the atoms given their locations and the measured auto-correlation function. Unlike traditional approaches that recover a discrete approximation of the underlying signal, our algorithm estimates the signal on a continuous domain, which makes it the first of its kind. Along with the algorithm, we derive its performance bound with a theoretical analysis and propose a set of enhancements to improve its computational complexity and noise resilience. Finally, we demonstrate the benefits of the proposed method via a comparison against Charge Flipping, a notable algorithm in crystallography.
D. Simon, J. Sulam, Y. Romano, Y. M. Lu, and M. Elad, “MMSE Approximation For Sparse Coding Algorithms Using Stochastic Resonance,” IEEE Transactions on Signal Processing, vol. 67, no. 17, 2019. arXiv:1806.10171 [eess.SP]

Sparse coding refers to the pursuit of the sparsest representation of a signal in a typically overcomplete dictionary. From a Bayesian perspective, sparse coding provides a Maximum a Posteriori (MAP) estimate of the unknown vector under a sparse prior. Various nonlinear algorithms are available to approximate the solution of such problems.

In this work, we suggest enhancing the performance of sparse coding algorithms by a deliberate and controlled contamination of the input with random noise, a phenomenon known as stochastic resonance. This not only allows for increased performance, but also provides a computationally efficient approximation to the Minimum Mean Square Error (MMSE) estimator, which is ordinarily intractable to compute. We demonstrate our findings empirically and provide a theoretical analysis of our method under several different cases.

C. Wang, Y. C. Eldar, and Y. M. Lu, “Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 6, 2018. arXiv:1805.06834 [cs.LG]
We present a high-dimensional analysis of three popular algorithms, namely, Oja's method, GROUSE and PETRELS, for subspace estimation from streaming and highly incomplete observations.  We show that, with proper time scaling, the time-varying principal angles between the true subspace and its estimates given by the algorithms converge weakly to deterministic processes when the ambient dimension \(n\) tends to infinity. Moreover, the limiting processes can be exactly characterized as the unique solutions of certain ordinary differential equations (ODEs). A finite sample bound is also given, showing that the rate of convergence towards such limits is \(\mathcal{O}(1/\sqrt{n})\). In addition to providing asymptotically exact predictions of the dynamic performance of the algorithms, our high-dimensional analysis yields several insights, including an asymptotic equivalence between Oja's method and GROUSE, and a precise scaling relationship linking the amount of missing data to the signal-to-noise ratio. By analyzing the solutions of the limiting ODEs, we also establish phase transition phenomena associated with the steady-state performance of these techniques.
L. Balzano, Y. Chi, and Y. M. Lu, “A Modern Perspective on Streaming PCA and Subspace Tracking: The Missing Data Case,” Proceedings of the IEEE, vol. 106, no. 8, pp. 1293-1310, 2018.
For many modern applications in science and engineering, data are collected in a streaming fashion carrying time-varying information, and practitioners need to process them with a limited amount of memory and computational resources in a timely manner for decision making. This often is coupled with the missing data problem, such that only a small fraction of data attributes are observed. These complications impose significant, and unconventional, constraints on the problem of streaming Principal Component Analysis (PCA) and subspace tracking, which is an essential building block for many inference  tasks in signal processing and machine learning. This survey article reviews a variety of classical and recent algorithms for solving this problem with low computational and memory complexities, particularly those applicable in the big data regime with missing data. We illustrate that streaming PCA and subspace tracking algorithms can be understood through algebraic and geometric perspectives and they need to be adjusted carefully to handle missing data. Both asymptotic and non-asymptotic convergence guarantees are reviewed. Finally, we benchmark the performance of several competitive algorithms in the presence of missing data for both well-conditioned and ill-conditioned systems.
Y. M. Lu, J. Oñativia, and P. L. Dragotti, “Sparse Representation in Fourier and Local Bases Using ProSparse: A Probabilistic Analysis,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2639-2647, 2018. arXiv:1611.07971 [cs.IT]

Finding the sparse representation of a signal in an overcomplete dictionary has attracted a lot of attention over the past years. This paper studies ProSparse, a new polynomial complexity algorithm that solves the sparse representation problem when the underlying dictionary is the union of a Vandermonde matrix and a banded matrix. Unlike our previous work which establishes deterministic (worst-case) sparsity bounds for ProSparse to succeed, this paper presents a probabilistic average-case analysis of the algorithm. Based on a generating-function approach, closed-form expressions for the exact success probabilities of ProSparse are given. The success probabilities are also analyzed in the high-dimensional regime. This asymptotic analysis characterizes a sharp phase transition phenomenon regarding the performance of the algorithm.

R. Yin, T. Gao, Y. M. Lu, and I. Daubechies, “A Tale of Two Bases: Local-Nonlocal Regularization on Image Patches with Convolution Framelets,” SIAM Journal on Imaging Sciences, vol. 10, no. 2, pp. 711-750, 2017.

We propose an image representation scheme combining the local and nonlocal characterization of patches in an image. Our representation scheme can be shown to be equivalent to a tight frame constructed from convolving local bases (e.g., wavelet frames, discrete cosine transforms, etc.) with nonlocal bases (e.g., spectral basis induced by nonlinear dimension reduction on patches), and we call the resulting frame elements convolution framelets. Insight gained from analyzing the proposed representation leads to a novel interpretation of a recent high-performance patch-based image processing algorithm using the point integral method (PIM) and the low dimensional manifold model (LDMM) [S. Osher, Z. Shi, and W. Zhu, Low Dimensional Manifold Model for Image Processing, Tech. Rep., CAM report 16-04, UCLA, Los Angeles, CA, 2016]. In particular, we show that LDMM is a weighted $\ell_2$-regularization on the coefficients obtained by decomposing images into linear combinations of convolution framelets; based on this understanding, we extend the original LDMM to a reweighted version that yields further improved results. In addition, we establish the energy concentration property of convolution framelet coefficients for the setting where the local basis is constructed from a given nonlocal basis via a linear reconstruction framework; a generalization of this framework to unions of local embeddings can provide a natural setting for interpreting BM3D, one of the state-of-the-art image denoising algorithms.

S. H. Chan, T. Zickler, and Y. M. Lu, “Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model Perspective,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5107-5121, 2017. arXiv:1601.00088

Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such performance gain is related to the Sinkhorn-Knopp balancing algorithm, an iterative procedure that symmetrizes a row-stochastic matrix to a doubly-stochastic matrix. However, a complete understanding of the performance gain phenomenon is still lacking.

In this paper, we study the performance gain phenomenon from a statistical learning perspective. We show that Sinkhorn-Knopp is equivalent to an Expectation-Maximization (EM) algorithm for learning a Product of Gaussians (PoG) prior of the image patches. By establishing the correspondence between the steps of Sinkhorn-Knopp and the EM algorithm, we provide a geometrical interpretation of the symmetrization process. The new PoG model also allows us to develop a new denoising algorithm called Product of Gaussian Non-Local-Means (PoG-NLM). PoG-NLM extends Sinkhorn-Knopp and generalizes the classical non-local means. Despite its simple formulation, PoG-NLM outperforms many existing smoothing filters and performs comparably to BM3D.
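The balancing procedure discussed in this abstract can be sketched in a few lines: alternate column and row normalization until the matrix is approximately doubly stochastic. The code below is a generic Sinkhorn-Knopp iteration on a small made-up kernel matrix, assuming strictly positive entries; it is not the PoG-NLM algorithm, and the name `sinkhorn_knopp` and the example values are hypothetical.

```python
def sinkhorn_knopp(matrix, num_iters=500):
    """Alternately normalize columns, then rows, of a square matrix.

    Assumes all entries are strictly positive, in which case the iterates
    approach a doubly stochastic matrix (hypothetical helper).
    """
    n = len(matrix)
    w = [row[:] for row in matrix]
    for _ in range(num_iters):
        # Column normalization: every column sums to one.
        col_sums = [sum(w[r][c] for r in range(n)) for c in range(n)]
        w = [[w[r][c] / col_sums[c] for c in range(n)] for r in range(n)]
        # Row normalization: every row sums to one, as a smoothing filter requires.
        row_sums = [sum(row) for row in w]
        w = [[val / s for val in row] for row, s in zip(w, row_sums)]
    return w

# A small symmetric similarity matrix standing in for patch affinities.
kernel = [[1.0, 0.5, 0.1],
          [0.5, 1.0, 0.5],
          [0.1, 0.5, 1.0]]
balanced = sinkhorn_knopp(kernel)
row_sums = [sum(row) for row in balanced]
col_sums = [sum(balanced[r][c] for r in range(3)) for c in range(3)]
```

Running a single pass (`num_iters=1`) corresponds to the "column normalization before the row normalization" trick mentioned in the first paragraph of the abstract.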