Publications by Year: 2021

Y. M. Lu, “Householder Dice: A Matrix-Free Algorithm for Simulating Dynamics on Gaussian and Random Orthogonal Ensembles,” IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 8264–8272, 2021. arXiv:2101.07464 [cs.IT]
This paper proposes a new algorithm, named Householder Dice (HD), for simulating dynamics on dense random matrix ensembles with translation-invariant properties. Examples include the Gaussian ensemble, the Haar-distributed random orthogonal ensemble, and their complex-valued counterparts. A "direct" approach to the simulation, where one first generates a dense n×n matrix from the ensemble, requires at least O(n²) resources in space and time. The HD algorithm overcomes this O(n²) bottleneck by using the principle of deferred decisions: rather than fixing the entire random matrix in advance, it lets the randomness unfold with the dynamics. At the heart of this matrix-free algorithm is an adaptive and recursive construction of (random) Householder reflectors. These orthogonal transformations exploit the group symmetry of the matrix ensembles, while simultaneously maintaining the statistical correlations induced by the dynamics. The memory and computation costs of the HD algorithm are O(nT) and O(nT²), respectively, with T being the number of iterations. When T≪n, which is nearly always the case in practice, the new algorithm leads to significant reductions in runtime and memory footprint. Numerical results demonstrate the promise of the HD algorithm as a new computational tool in the study of high-dimensional random systems.
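
For intuition, here is a minimal Python sketch of the deferred-decision principle for the Haar orthogonal ensemble. It stores explicit orthonormal bases of the queried directions and their images, rather than the recursively constructed Householder reflectors of the actual HD algorithm, so it is an illustrative simplification and not the paper's construction:

```python
import numpy as np

def haar_matvec_oracle(n, rng=None):
    """Matrix-free oracle for x -> O @ x, with O an n x n Haar orthogonal
    matrix that is never materialized. The randomness of O is revealed
    lazily, one queried direction at a time (deferred decisions)."""
    rng = rng or np.random.default_rng()
    U, V = [], []  # orthonormal lists with V[i] = O @ U[i] by construction

    def matvec(x):
        alpha = [u @ x for u in U]
        r = x - sum(a * u for a, u in zip(alpha, U))
        y = sum(a * v for a, v in zip(alpha, V)) + np.zeros(n)
        beta = np.linalg.norm(r)
        if beta > 1e-12 * np.linalg.norm(x):
            # x has a component along a never-queried direction; its image
            # is uniform on the unit sphere of span(V)'s orthogonal complement
            g = rng.standard_normal(n)
            g -= sum((v @ g) * v for v in V)
            w = g / np.linalg.norm(g)
            U.append(r / beta)
            V.append(w)
            y += beta * w
        return y

    return matvec

# T steps of a dynamics cost O(nT) memory and O(nT^2) time in total
n, T = 10_000, 50
rng = np.random.default_rng(0)
apply_O = haar_matvec_oracle(n, rng)
x = rng.standard_normal(n)
for _ in range(T):
    x = apply_O(x)  # e.g. power iteration; O is never formed
```

Each query at step t costs O(nt), which sums to the O(nT²) total quoted in the abstract.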
O. Dhifallah and Y. M. Lu, “A Precise Performance Analysis of Learning with Random Features,” Technical report, 2021. arXiv:2008.11904 [cs.IT]
We study the problem of learning an unknown function using random feature models. Our main contribution is an exact asymptotic analysis of such learning problems with Gaussian data. Under mild regularity conditions for the feature matrix, we provide an exact characterization of the asymptotic training and generalization errors, valid in both the under-parameterized and over-parameterized regimes. The analysis presented in this paper holds for general families of feature matrices, activation functions, and convex loss functions. Numerical results validate our theoretical predictions, showing that our asymptotic findings are in excellent agreement with the actual performance of the considered learning problem, even in moderate dimensions. Moreover, they reveal the important roles played by the regularization, the loss function, and the activation function in mitigating the "double descent" phenomenon in learning.
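
As a concrete instance of the setup analyzed here, the following sketch trains a random feature model by ridge regression on Gaussian data and sweeps the number of features across the interpolation threshold. The tanh activation, squared loss, linear teacher, and regularization level are all illustrative choices, not the paper's specific assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test, lam = 100, 400, 4000, 1e-4

# Gaussian data from a noisy linear teacher (illustrative target function)
w_star = rng.standard_normal(d) / np.sqrt(d)
def sample(n):
    X = rng.standard_normal((n, d))
    return X, X @ w_star + 0.1 * rng.standard_normal(n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

for p in [100, 200, 400, 800, 1600]:               # under- to over-parameterized
    W = rng.standard_normal((d, p)) / np.sqrt(d)   # frozen random feature weights
    F_tr, F_te = np.tanh(X_tr @ W), np.tanh(X_te @ W)
    a = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(p), F_tr.T @ y_tr)
    tr = np.mean((F_tr @ a - y_tr) ** 2)
    te = np.mean((F_te @ a - y_te) ** 2)
    print(f"p={p:5d}  train={tr:.4f}  test={te:.4f}")
```

With weak regularization, the test error peaks near p = n_train and then descends again; increasing lam damps the peak, consistent with the mitigation role of regularization described above.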
A. Maillard, F. Krzakala, Y. M. Lu, and L. Zdeborova, “Construction of optimal spectral methods in phase retrieval,” in Mathematical and Scientific Machine Learning, 2021. arXiv:2012.04524 [cs.IT]
We consider the phase retrieval problem, in which the observer wishes to recover an n-dimensional real or complex signal X⋆ from the (possibly noisy) observation of |ΦX⋆|, where Φ is a matrix of size m×n. We consider a high-dimensional setting where n,m→∞ with m/n=O(1), and a large class of (possibly correlated) random matrices Φ and observation channels. Spectral methods are a powerful tool to obtain, at a low computational cost, approximate estimates of the signal X⋆ that can then be used as an initialization for a subsequent algorithm. In this paper, we extend and unify previous results and approaches on spectral methods for the phase retrieval problem. More precisely, we combine the linearization of message-passing algorithms with the analysis of the Bethe Hessian, a classical tool of statistical physics. Using this toolbox, we show how to derive optimal spectral methods for arbitrary channel noise and right-unitarily invariant matrices Φ, in an automated manner (i.e., with no optimization over any hyperparameter or preprocessing function).
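
The generic spectral method being optimized follows a simple template: apply a scalar preprocessing function to the observations, form a weighted covariance of the sensing vectors, and take its leading eigenvector. Below is a sketch with an i.i.d. Gaussian Φ and a standard truncation preprocessing; the paper's contribution is deriving the optimal preprocessing automatically, so the choice here is merely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 250, 1500                        # m/n = 6 measurements per dimension

x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)
Phi = rng.standard_normal((m, n))
y = np.abs(Phi @ x_star)                # noiseless |Phi x*| observations

t = np.minimum(y**2, 3.0)               # truncation preprocessing (illustrative)
D = (Phi.T * t) @ Phi / m               # D = (1/m) sum_i t_i phi_i phi_i^T
x_hat = np.linalg.eigh(D)[1][:, -1]     # leading eigenvector of D

print("overlap:", abs(x_hat @ x_star))  # nonzero overlap -> useful initialization
```

A strictly positive overlap between x_hat and X⋆ is exactly what makes the spectral estimate a viable initialization for a subsequent refinement algorithm.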
O. Dhifallah and Y. M. Lu, “On the Inherent Regularization Effects of Noise Injection During Training,” in International Conference on Machine Learning (ICML), 2021. arXiv:2102.07379 [cs.LG]
Randomly perturbing networks during the training process is a commonly used approach to improving generalization performance. In this paper, we present a theoretical study of one particular form of random perturbation, namely injecting artificial noise into the training data. We provide a precise asymptotic characterization of the training and generalization errors of such randomly perturbed learning problems on a random feature model. Our analysis shows that Gaussian noise injection in the training process is equivalent to introducing a weighted ridge regularization, when the number of noise injections tends to infinity. The explicit form of the regularization is also given. Numerical results corroborate our asymptotic predictions, showing that they are accurate even in moderate problem dimensions. Our theoretical predictions are based on a new correlated Gaussian equivalence conjecture that generalizes recent results in the study of random feature models.
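
The flavor of this equivalence is easy to see in the simplest linear-regression case, where averaging the noise-injected objective over many injections produces an exact (unweighted) ridge penalty; the paper establishes a weighted analogue for random feature models. A sketch of the linear case, with all problem sizes illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma, k = 200, 30, 0.5, 2000

X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) / np.sqrt(d) + 0.1 * rng.standard_normal(n)

# least squares over k noise-injected copies of the data, via normal equations:
# E[(X+E)^T (X+E)] = X^T X + n*sigma^2*I, so the noise acts as a ridge term
A, b = np.zeros((d, d)), np.zeros(d)
for _ in range(k):
    Xe = X + sigma * rng.standard_normal((n, d))
    A += Xe.T @ Xe
    b += Xe.T @ y
w_noise = np.linalg.solve(A / k, b / k)

# closed-form ridge solution the injections converge to as k -> infinity
w_ridge = np.linalg.solve(X.T @ X + n * sigma**2 * np.eye(d), X.T @ y)
print("max deviation:", np.abs(w_noise - w_ridge).max())  # -> 0 as k grows
```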
O. Dhifallah and Y. M. Lu, “Phase Transitions in Transfer Learning for High-Dimensional Perceptrons,” Entropy, Special Issue "The Role of Signal Processing and Information Theory in Modern Machine Learning", vol. 23, no. 4, 2021. arXiv:2101.01918 [cs.LG]
Transfer learning seeks to improve the generalization performance of a target task by exploiting the knowledge learned from a related source task. Central questions include deciding what information one should transfer and when transfer can be beneficial. The latter question is related to the so-called negative transfer phenomenon, where the transferred source information actually reduces the generalization performance of the target task. This happens when the two tasks are sufficiently dissimilar. In this paper, we present a theoretical analysis of transfer learning by studying a pair of related perceptron learning tasks. Despite the simplicity of our model, it reproduces several key phenomena observed in practice. Specifically, our asymptotic analysis reveals a phase transition from negative transfer to positive transfer as the similarity of the two tasks moves past a well-defined threshold.
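
As a toy illustration of the negative-to-positive transition (not the paper's exact perceptron model or its asymptotic analysis), the sketch below fits sign labels by ridge regression, shrinking either toward zero (training from scratch) or toward the source task's weights (transfer), and compares test errors as the task similarity rho varies:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_tr, n_te, lam = 100, 150, 20000, 20.0

def test_errors(rho):
    w_s = rng.standard_normal(d)                                     # source teacher
    w_t = rho * w_s + np.sqrt(1 - rho**2) * rng.standard_normal(d)   # target teacher
    X = rng.standard_normal((n_tr, d))
    y = np.sign(X @ w_t)                                             # target labels
    K = np.linalg.inv(X.T @ X + lam * np.eye(d))
    w_scratch = K @ (X.T @ y)                       # ridge shrinking toward 0
    w_transfer = w_s + K @ (X.T @ (y - X @ w_s))    # ridge shrinking toward w_s
    Xt = rng.standard_normal((n_te, d))
    yt = np.sign(Xt @ w_t)
    return [np.mean(np.sign(Xt @ w) != yt) for w in (w_scratch, w_transfer)]

for rho in [0.0, 0.25, 0.5, 0.75, 1.0]:
    e0, e1 = test_errors(rho)
    print(f"rho={rho:.2f}  scratch={e0:.3f}  transfer={e1:.3f}")
```

For small rho the transferred solution underperforms training from scratch (negative transfer), while for rho close to 1 it wins, qualitatively mirroring the threshold behavior established in the paper.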