Input-Label Correlation Governs a Linear-to-Nonlinear Transition in Random Features under Spiked Covariance

arXiv:2409.20250v2 Announce Type: replace
Abstract: Random feature models (RFMs), two-layer networks with a randomly initialized fixed first layer and a trained linear readout, are among the simplest nonlinear predictors. Prior asymptotic analyses in the proportional high-dimensional regime show that, under isotropic data, RFMs reduce to noisy linear models and offer no advantage over classical linear methods such as ridge regression. Yet RFMs frequently outperform linear baselines on structured real data. We show that this tension is explained by a correlation-driven phase transition: under spiked-covariance designs, the interaction between anisotropy and input-label correlation determines whether the RFM behaves as an effectively linear predictor or exhibits genuinely nonlinear gains. Concretely, we establish a universality principle under anisotropy and characterize the RFM generalization error via an equivalent noisy polynomial model. The effective degree of this polynomial, equivalently, which Hermite orders of the activation survive, is governed by the strength of input-label correlation, yielding an explicit boundary in the correlation-spike-magnitude plane. Below the boundary, the RFM collapses to a linear surrogate and can underperform strong linear baselines; above it, higher-order terms persist and the RFM achieves a clear nonlinear advantage. Numerical simulations and real-data experiments corroborate the theory and delineate the transition between these two regimes.

Liked Liked