Publications by Year: Submitted

R. Dudeja, Y. M. Lu, and S. Sen, “Universality of Approximate Message Passing with Semi-Random Matrices,” Submitted. arXiv:2204.04281 [math.PR]
Abstract:
Approximate Message Passing (AMP) is a class of iterative algorithms that have found applications in many problems in high-dimensional statistics and machine learning. In its general form, AMP can be formulated as an iterative procedure driven by a matrix M. Theoretical analyses of AMP typically assume strong distributional properties on M, such as the assumption that M has i.i.d. sub-Gaussian entries or is drawn from a rotationally invariant ensemble. However, numerical experiments suggest that the behavior of AMP is universal, as long as the eigenvectors of M are generic. In this paper, we take the first step in rigorously understanding this universality phenomenon. In particular, we investigate a class of memory-free AMP algorithms (proposed by Çakmak and Opper for mean-field Ising spin glasses), and show that their asymptotic dynamics are universal over a broad class of semi-random matrices. In addition to having the standard rotationally invariant ensemble as a special case, the class of semi-random matrices that we define in this work also includes matrices constructed with very limited randomness. One such example is a randomly signed version of the Sine model, introduced by Marinari, Parisi, Potters, and Ritort for spin glasses with fully deterministic couplings.
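To make the "iterative procedure driven by a matrix M" concrete, here is a minimal sketch of a generic AMP-style iteration with an Onsager correction term. This is only an illustration of the general template, not the memory-free variant studied in the paper; the GOE-like driving matrix and the `tanh` denoiser are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Illustrative driving matrix: a symmetric GOE-like matrix, one of the
# ensembles for which AMP dynamics are classically analyzed.
G = rng.standard_normal((n, n)) / np.sqrt(n)
M = (G + G.T) / np.sqrt(2)

def eta(x):
    # Separable nonlinearity (denoiser); tanh is a common illustrative choice.
    return np.tanh(x)

def eta_prime(x):
    # Derivative of the denoiser, used in the Onsager coefficient.
    return 1.0 - np.tanh(x) ** 2

x = rng.standard_normal(n)   # current iterate
x_prev = np.zeros(n)         # previous iterate
b = 0.0                      # Onsager coefficient (zero at the first step)

for t in range(10):
    # Generic AMP template: matrix step minus the Onsager correction,
    # which cancels the "backaction" of the previous iterate.
    z = M @ eta(x) - b * eta(x_prev)
    b_next = np.mean(eta_prime(x))
    x_prev, x, b = x, z, b_next
```

The Onsager term `b * eta(x_prev)` is what distinguishes AMP from a plain power-type iteration; it is precisely this correction that makes the iterates asymptotically Gaussian and amenable to a state-evolution analysis.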
H. Hu and Y. M. Lu, “Sharp Asymptotics of Kernel Ridge Regression Beyond the Linear Regime,” Submitted. arXiv:2205.06798 [cs.LG]
Abstract:
The generalization performance of kernel ridge regression (KRR) exhibits a multi-phased pattern that crucially depends on the scaling relationship between the sample size n and the underlying dimension d. This phenomenon is due to the fact that KRR sequentially learns functions of increasing complexity as the sample size increases; when \(d^{k-1} \ll n \ll d^{k}\), only polynomials with degree less than k are learned. In this paper, we present a sharp asymptotic characterization of the performance of KRR at the critical transition regions with \(n \asymp d^{k}\), for \(k \in \mathbb{Z}_{+}\). Our asymptotic characterization provides a precise picture of the whole learning process and clarifies the impact of various parameters (including the choice of the kernel function) on the generalization performance. In particular, we show that the learning curves of KRR can have a delicate "double descent" behavior due to specific bias-variance trade-offs at different polynomial scaling regimes.
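For readers unfamiliar with the estimator being analyzed, the following is a minimal sketch of kernel ridge regression in its dual form, with data on the unit sphere as in the setting above. The exponential inner-product kernel, the target function, and the ridge parameter are all illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 20

# Data vectors on the unit sphere in R^d.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Illustrative target: a low-degree polynomial of the first coordinate.
y = X[:, 0] + 0.5 * X[:, 0] ** 2

def kernel(A, B):
    # Inner-product kernel k(<x, x'>); exp is one illustrative choice.
    return np.exp(A @ B.T)

lam = 1e-2                                        # ridge parameter
K = kernel(X, X)                                  # n x n kernel matrix
alpha = np.linalg.solve(K + lam * np.eye(n), y)   # KRR dual coefficients

def predict(X_test):
    # Prediction at new points via the kernel against the training set.
    return kernel(X_test, X) @ alpha

train_err = np.mean((predict(X) - y) ** 2)
```

The multi-phase behavior described above concerns how the test error of exactly this kind of estimator evolves as n grows relative to powers of d; low-degree polynomial components of the target are fitted first.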
Y. M. Lu and H. T. Yau, “An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices,” Submitted. arXiv:2205.06308 [math.PR]
Abstract:
We consider random matrices whose entries are obtained by applying a (nonlinear) kernel function to the pairwise inner products between \(n\) independent data vectors drawn uniformly from the unit sphere in \(\mathbb{R}^d\). Our study of this model is motivated by problems in machine learning, statistics, and signal processing, where such inner-product kernel random matrices and their spectral properties play important roles. Under mild conditions on the kernel function, we establish the weak limit of the empirical spectral distribution of these matrices when \(d, n \to \infty\) such that \(n/d^\ell \to \kappa \in (0, \infty)\), for some fixed \(\ell \in \mathbb{N}\). This generalizes an earlier result of Cheng and Singer (2013), who studied the same model in the linear scaling regime (with \(\ell = 1\) and \(n/d \to \kappa\)). The main insight of our work is a general equivalence principle: the spectrum of the random kernel matrix is asymptotically equivalent to that of a simpler matrix model, constructed as the linear combination of a (shifted) Wishart matrix and an independent matrix drawn from the Gaussian orthogonal ensemble. The aspect ratio of the Wishart matrix and the coefficients of the linear combination are determined by \(\ell\) and by the expansion of the kernel function in the orthogonal Hermite polynomial basis. Consequently, the limiting spectrum of the random kernel matrix can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law.
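The random matrix model in question is easy to sample numerically. The sketch below builds one instance of an inner-product kernel random matrix in the linear scaling regime (\(\ell = 1\)) and computes its empirical spectrum; the choice of kernel function and the exact normalization constants are illustrative assumptions in the spirit of the Cheng-Singer convention, not taken verbatim from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 100
ell, kappa = 1, 3          # linear scaling regime: n / d^ell -> kappa
n = kappa * d ** ell       # n = 300 sample vectors

# Independent data vectors drawn uniformly from the unit sphere in R^d.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

f = np.tanh                # illustrative (odd) kernel function
G = X @ X.T                # matrix of pairwise inner products

# Entrywise kernel matrix; the sqrt(d)/sqrt(n) scaling puts off-diagonal
# entries on the right order, but the exact constants are an assumption.
K = f(np.sqrt(d) * G) / np.sqrt(n)
np.fill_diagonal(K, 0.0)   # drop the trivial diagonal spike f(sqrt(d))

eigs = np.linalg.eigvalsh(K)   # empirical spectrum of the kernel matrix
```

Comparing a histogram of `eigs` against the free additive convolution of a Marchenko-Pastur law and a semicircle law is the kind of numerical check the equivalence principle above makes precise.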