We prove a universality theorem for learning with random features. Our result shows that, in terms of training and generalization errors, the random feature model with a nonlinear activation function is asymptotically equivalent to a surrogate Gaussian model with a matching covariance matrix. This settles a conjecture based on which several recent papers develop their results. Our method for proving the universality builds on the classical Lindeberg approach. Major ingredients of the proof include a leave-one-out analysis for the optimization problem associated with the training process and a central limit theorem, obtained via Stein's method, for weakly correlated random variables.
We study the problem of learning an unknown function using random feature models. Our main contribution is an exact asymptotic analysis of such learning problems with Gaussian data. Under mild regularity conditions for the feature matrix, we provide an exact characterization of the asymptotic training and generalization errors, valid in both the under-parameterized and over-parameterized regimes. The analysis presented in this paper holds for general families of feature matrices, activation functions, and convex loss functions. Numerical results validate our theoretical predictions, showing that our asymptotic findings are in excellent agreement with the actual performance of the considered learning problem, even in moderate dimensions. Moreover, they reveal an important role played by the regularization, the loss function and the activation function in the mitigation of the "double descent phenomenon" in learning.
In sparse linear regression, the SLOPE estimator generalizes LASSO by assigning magnitude-dependent regularizations to different coordinates of the estimate. In this paper, we present an asymptotically exact characterization of the performance of SLOPE in the high-dimensional regime where the number of unknown parameters grows in proportion to the number of observations. Our asymptotic characterization enables us to derive optimal regularization sequences to either minimize the MSE or to maximize the power in variable selection under any given level of Type-I error. In both cases, we show that the optimal design can be recast as certain infinite-dimensional convex optimization problems, which have efficient and accurate finite-dimensional approximations. Numerical simulations verify our asymptotic predictions. They also demonstrate the superiority of our optimal design over LASSO and a regularization sequence previously proposed in the literature.
We study algorithms for solving quadratic systems of equations based on optimization methods over polytopes. Our work is inspired by a recently proposed convex formulation of the phase retrieval problem, which estimates the unknown signal by solving a simple linear program over a polytope constructed from the measurements. We present a sharp characterization of the high-dimensional geometry of the aforementioned polytope under Gaussian measurements. This characterization allows us to derive asymptotically exact performance guarantees for PhaseMax, which also reveal a phase transition phenomenon with respect to its sample complexity. Moreover, the geometric insights gained from our analysis lead to a new nonconvex formulation of the phase retrieval problem and an accompanying iterative algorithm, which we call PhaseLamp. We show that this new algorithm has superior recovery performance over the original PhaseMax method. Finally, as yet another variation on the theme of performing phase retrieval via polytope optimization, we propose a weighted version of PhaseLamp and demonstrate, through numerical simulations, that it outperforms several state-of-the-art algorithms under both generic Gaussian measurements as well as more realistic Fourier-type measurements that arise in phase retrieval applications.
We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target feature vector and its estimates provided by the algorithms will converge weakly to a deterministic measured-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE can be efficiently obtained. These solutions lead to precise predictions of the performance of the algorithms, as many practical performance metrics are linear functionals of the joint empirical measures. In addition to characterizing the dynamic performance of online learning algorithms, our asymptotic analysis also provides useful insights. In particular, in the high-dimensional limit, and due to exchangeability, the original coupled dynamics associated with the algorithms will be asymptotically ``decoupled'', with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight for nonconvex optimization problems may prove an interesting line of future research.
We study the problem of detecting a random walk on a graph from a sequence of noisy measurements at every node. There are two hypotheses: either every observation is just meaningless zero-mean Gaussian noise, or at each time step exactly one node has an elevated mean, with its location following a random walk on the graph over time. We want to exploit knowledge of the graph structure and random walk parameters (specified by a Markov chain transition matrix) to detect a possibly very weak signal. The optimal detector is easily derived, and we focus on the harder problem of characterizing its performance through the (type-II) error exponent: the decay rate of the miss probability under a false alarm constraint. The expression for the error exponent resembles the free energy of a spin glass in statistical physics, and we borrow techniques from that field to develop a lower bound. Our fully rigorous analysis uses large deviations theory to show that the lower bound exhibits a phase transition: strong performance is only guaranteed when the signal-to-noise ratio exceeds twice the entropy rate of the random walk. Monte Carlo simulations show that the lower bound fully captures the behavior of the true exponent.