We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where α=n/d is kept finite in the limit of a high dimension d and number of samples n. Our contribution is three-fold: First, we prove a formula for the generalization error achieved by ℓ2 regularized classifiers that minimize a convex loss. This formula was first obtained by the heuristic replica method of statistical physics. Secondly, focussing on commonly used loss functions and optimizing the ℓ2 regularization strength, we observe that while ridge regression performance is poor, logistic and hinge regression are surprisingly able to approach the Bayes-optimal generalization error extremely closely. As α→∞ they lead to Bayes-optimal rates, a fact that does not follow from predictions of margin-based generalization error bounds. Third, we design an optimal loss and regularizer that provably leads to Bayes-optimal generalization error.
We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number n of samples and their dimension d go to infinity while their ratio is fixed to α=n/d. We discuss surprising effects of the regularization that in some cases allows to reach the Bayes-optimal performances. We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters.
Despite the remarkable successes of generative adversarial networks (GANs) in many applications, theoretical understandings of their performance is still limited. In this paper, we present a simple shallow GAN model fed by high-dimensional input data. The dynamics of the training process of the proposed model can be exactly analyzed in the high-dimensional limit. In particular, by using the tool of scaling limits of stochastic processes, we show that the macroscopic quantities measuring the quality of the training process converge to a deterministic process that is characterized as the unique solution of a finite-dimensional ordinary differential equation (ODE). The proposed model is simple, but its training process already exhibits several different phases that can mimic the behaviors of more realistic GAN models used in practice. Specifically, depending on the choice of the learning rates, the training process can reach either a successful, a failed, or an oscillating phase. By studying the steady-state solutions of the limiting ODEs, we obtain a phase diagram that precisely characterizes the conditions under which each phase takes place. Although this work focuses on a simple GAN model, the analysis methods developed here might prove useful in the theoretical understanding of other variants of GANs with more advanced training algorithms.
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal that is observed through a linear transform followed by a component-wise, possibly nonlinear and noisy, channel. In the Bayesian optimal setting, Generalized Approximate Message Passing (GAMP) is known to achieve optimal performance for GLE. However, its performance can significantly degrade whenever there is a mismatch between the assumed and the true generative model, a situation frequently encountered in practice. In this paper, we propose a new algorithm, named Generalized Approximate Survey Propagation (GASP), for solving GLE in the presence of prior or model mis-specifications. As a prototypical example, we consider the phase retrieval problem, where we show that GASP outperforms the corresponding GAMP, reducing the reconstruction threshold and, for certain choices of its parameters, approaching Bayesian optimal performance. Furthermore, we present a set of State Evolution equations that exactly characterize the dynamics of GASP in the high-dimensional limit.
In sparse linear regression, the SLOPE estimator generalizes LASSO by assigning magnitude-dependent regular- izations to different coordinates of the estimate. In this paper, we present an asymptotically exact characterization of the performance of SLOPE in the high-dimensional regime where the number of unknown parameters grows in proportion to the number of observations. Our asymptotic characterization enables us to derive optimal regularization sequences to either minimize the MSE or to maximize the power in variable selection under any given level of Type-I error. In both cases, we show that the optimal design can be recast as certain infinite-dimensional convex optimization problems, which have efficient and accurate finite-dimensional approximations. Numerical simulations verify our asymptotic predictions. They also demonstrate the superi- ority of our optimal design over LASSO and a regularization sequence previously proposed in the literature.
A recently proposed convex formulation of the phase retrieval problem estimates the unknown signal by solving a simple linear program. This new scheme, known as PhaseMax, is computationally efficient compared to standard convex relaxation methods based on lifting techniques. In this paper, we present an exact performance analysis of PhaseMax under Gaussian measurements in the large system limit. In contrast to previously known performance bounds in the literature, our results are asymptotically exact and they also reveal a sharp phase transition phenomenon. Furthermore, the geometrical insights gained from our analysis led us to a novel nonconvex formulation of the phase retrieval problem and an accompanying iterative algorithm based on successive linearization and maximization over a polytope. This new algorithm, which we call PhaseLamp, has provably superior recovery performance over the original PhaseMax method.
We analyze the dynamics of an online algorithm for independent component analysis in the high-dimensional scaling limit. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measure of the target feature vector and the estimates provided by the algorithm will converge weakly to a deterministic measured-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE, which involves two spatial variables and one time variable, can be efficiently obtained. These solutions provide detailed information about the performance of the ICA algorithm, as many practical performance metrics are functionals of the joint empirical measures. Numerical simulations show that our asymptotic analysis is accurate even for moderate dimensions. In addition to providing a tool for understanding the performance of the algorithm, our PDE analysis also provides useful insight. In particular, in the high-dimensional limit, the original coupled dynamics associated with the algorithm will be asymptotically “decoupled”, with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight to design new algorithms for achieving optimal trade-offs between computational and statistical efficiency may prove an interesting line of future research.
We consider a recently proposed convex formulation, known as the PhaseMax method, for solving the phase retrieval problem. Using the replica method from statistical mechanics, we analyze the performance of PhaseMax in the high-dimensional limit. Our analysis predicts the exact asymptotic performance of PhaseMax. In particular, we show that a sharp phase transi- tion phenomenon takes place, with a simple analytical formula characterizing the phase transition boundary. This result shows that the oversampling ratio required by existing performance bounds in the literature can be significantly reduced. Numerical results confirm the validity of our replica analysis, showing that the theoretical predictions are in excellent agreement with the actual performance of the algorithm, even for moderate signal dimensions.
This paper won the Best Student Paper Award (First Prize) at the 2017 IEEE CAMSAP Workshop.
The predictions made in this paper via the non-rigorous replica method has since been rigorously established in our latest work.
We study a simple spectral method that serves as a key ingredient in a growing line of work using efficient iterative algorithms for estimating signals in nonconvex settings. Unlike previous work, which focuses on the phase retrieval setting and provides only bounds on the performance, we consider arbitrary generalized linear sensing models and provide an exact characterization of the performance of the spectral method in the high-dimensional regime. Our analysis reveals a phase transition phenomenon that depends on the sampling ratio. When the ratio is below a critical threshold, the estimates given by the spectral method are no better than random guesses drawn uniformly from the hypersphere; above the threshold, however, the estimates become increasingly aligned with the underlying signal. Worked examples and numerical simulations are provided to illustrate and verify the analytical predictions.
The problem of estimating and tracking low-rank subspaces from incomplete observations has received a lot of attention recently in the signal processing and learning communities. Popular algorithms, such as GROUSE and PETRELS, are often very effective in practice, but their performance depends on the careful choice of algorithmic parameters. Important questions, such as the global convergence of these algorithms and how the noise level, subsampling ratio, and various other parameters affect the performance, are not fully understood. In this paper, we present a precise analysis of the performance of these algorithms in the asymptotic regime where the ambient dimension tends to infinity. Specifically, we show that the time-varying trajectories of estimation errors converge weakly to a deterministic function of time, which is characterized as the unique solution of a system of ordinary differential equations (ODEs.) Analyzing the limiting ODEs also reveals and characterizes sharp phase transition phenomena associated with these algorithms. Numerical simulations verify the accuracy of our asymptotic predictions, even for moderate signal dimensions.
We study the dynamics of an online algorithm for learning a sparse leading eigenvector from samples generated from a spiked covariance model. This algorithm combines the classical Oja's method for online PCA with an element-wise nonlinearity at each iteration to promote sparsity. In the high-dimensional limit, the joint empirical measure of the underlying sparse eigenvector and its estimate provided by the algorithm is shown to converge weakly to a deterministic, measure-valued process. This scaling limit is characterized as the unique solution of a nonlinear PDE, and it provides exact information regarding the asymptotic performance of the algorithm. For example, performance metrics such as the cosine similarity and the misclassification rate in sparse support recovery can be obtained by examining the limiting dynamics. A steady-state analysis of the nonlinear PDE also reveals an interesting phase transition phenomenon. Although our analysis is asymptotic in nature, numerical simulations show that the theoretical predictions are accurate for moderate signal dimensions.
We study phase retrieval by using a simple non-convex algorithm based on iterative projections. Our main contribution in this work is to establish an exact analysis of the dynamics of the algorithm in an online setting, with Gaussian measurement vectors. We show that the algorithm dynamics, measured by the squared distance between the current estimate and the true solution, can be fully characterized by a 2D Markov process, irrespective of the underlying signal dimension. Furthermore, in the large systems limit (i.e., as the signal dimension tends to infinity), the random sample paths of the 2D Markov process converge to the solutions of two deterministic and coupled ordinary differential equations (ODEs). Numerical simulations verify the accuracy of our analytical predictions, even for moderate system sizes. This suggests that the ODE approach presented in this work provides an effective tool for analyzing the performance and convergence of the algorithm.
Suppose we have a time series of observations for each node in a network, and we wish to detect the presence of a particle undergoing a random walk on the network. If there is no such particle, then we observe only zero-mean Gaussian noise. If present, however, the location of the particle has an elevated mean. How well can we detect the particle at low signal-to-noise ratios? This is a special case of the problem of detecting hidden Markov processes (HMPs).
The performance metric we analyze is the error exponent of the optimal detector, which measures the exponential rate of decay in the miss probability if the false alarm probability is held fixed as the observation time increases. This problem exhibits deep connections to a problem in statistical physics: computing the free energy density of a spin glass.
We develop a generalized version of the random energy model (REM) spin glass, whose free energy density provides a lower bound for our error exponent, and compute the bound using large deviations techniques. The bound closely matches empirical results in numerical experiments, and suggests a phase transition phenomenon: below a threshold SNR, the error exponent is nearly constant and near zero, indicating poor performance; above the threshold, there is rapid improvement in performance as the SNR increases. The location of the phase transition depends on the entropy rate of the Markov process.
Recent advances in materials, devices and fabrication technologies have motivated a strong momentum in developing solid-state sensors that can detect individual photons in space and time. It has been envisioned that such sensors can eventually achieve very high spatial resolutions (e.g., 10^9 pixels/chip) as well as high frame rates (e.g., 10^6 frames/sec). In this paper, we present an efficient algorithm to reconstruct images from the massive binary bit-streams generated by these sensors. Based on the concept of alternating direction method of multipliers (ADMM), we transform the computationally intensive optimization problem into a sequence of subproblems, each of which has efficient implementations in the form of polyphase-domain filtering or pixel-wise nonlinear mappings. Moreover, we reformulate the original maximum likelihood estimation as maximum a posterior estimation by introducing a total variation prior. Numerical results demonstrate the strong performance of the proposed method, which achieves several dB’s of improvement in PSNR and requires a shorter runtime as compared to standard gradient-based approaches.
The Kaczmarz method, or the algebraic reconstruction technique (ART), is a popular method for solving large-scale overdetermined systems of equations. Recently, Strohmer et al. proposed the randomized Kaczmarz algorithm, an improvement that guarantees exponential convergence to the solution. This has spurred much interest in the algorithm and its extensions. We provide in this paper an exact formula for the mean squared error (MSE) in the value reconstructed by the algorithm. We also compute the exponential decay rate of the MSE, which we call the “annealed” error exponent. We show that the typical performance of the algorithm is far better than the average performance. We define the “quenched” error exponent to characterize the typical performance. This is far harder to compute than the annealed error exponent, but we provide an approximation that matches empirical results. We also explore optimizing the algorithm’s row-selection probabilities to speed up the algorithm’s convergence.
(This paper received the Best Student Paper Award of GlobalSIP)
J. Oñativia, Y. M. Lu, and P. L. Dragotti, “Finite Dimensional FRI,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, 2014.Abstract
Traditional Finite Rate of Innovation (FRI) theory has considered the problem of sampling continuous-time signals. This framework can be naturally extended to the case where the input is a discrete-time signal. Here we present a novel approach which uses both the traditional FRI sampling scheme, based on the annihilating filter method, and the fact that in this new setup the null space of the problem to be solved is finite dimensional.
In the noiseless scenario, we show that this new approach is able to perfectly recover the original signal at the critical sampling rate. We also present simulation results in the noisy scenario where this new approach improves performances in terms of the mean squared error (MSE) of the reconstructed signal when compared to the canonical FRI algorithms and compressed sensing (CS).
We consider the problem of estimating an unknown parameter from a finite collection of different statistical experiments. The measurements are taken sequentially. Based on the observations made so far, we adaptively select the next experiment that provides the most information about the parameter. Summarizing past information with finite memory, we present a general framework for efficient adaptive estimation, with the sensing schemes fully characterized by finite-state parametric Markov chains. We establish an analytic formula linking the asymptotic performance of adaptive estimation schemes to the
steady-state distributions of the associated Markov chains. Consequently, finding optimal adaptive strategies can be reformulated as the problem of designing a (continuous) family of Markov chains with prescribed steady-state distributions.We also propose a quantitative design criterion for optimal sensing policies based on minimax ratio regret.