Power-Law Spectrum of the Random Feature Model
arXiv:2603.14578v1 Announce Type: new
Abstract: Scaling laws for neural networks, in which the loss decays as a power law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H) \in \mathbb{R}^v$ where $H$ has an $\alpha$-power-law spectrum ($\lambda_j(H) \asymp j^{-\alpha}$, $\alpha > 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v \times d}$, and an entrywise monomial $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x}\big[\tfrac{1}{d} f(W^\top x)^{\otimes 2}\big]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\big(\log^{p-1}(j+1)/j\big)^{\alpha}$. For $c_1 d \log^{-(p+1)}(d) \leq j \leq d$, the $j$-th eigenvalue is of order $j^{-\alpha}$ up to a polylogarithmic factor. That is, the power-law exponent $\alpha$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.
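As a rough numerical illustration of the statement (not the paper's code), the following minimal NumPy sketch estimates the population random-feature covariance $\mathbb{E}_{x}\big[\tfrac{1}{d} f(W^\top x)^{\otimes 2}\big]$ by Monte Carlo and compares its leading eigenvalues with the claimed head-regime rate $\big(\log^{p-1}(j+1)/j\big)^{\alpha}$. The parameter values, the i.i.d. $N(0,1)$ entry scaling of $W$, and the absolute constants are assumptions chosen for illustration and may differ from the paper's normalization; the comparison is only up to constants.

```python
import numpy as np

# Hypothetical parameters for illustration (not taken from the paper).
v, d = 1000, 100          # input dimension v, feature dimension d
alpha, p = 1.5, 2         # power-law exponent alpha > 1, monomial degree p
n_samples = 10_000        # Monte Carlo sample size
rng = np.random.default_rng(0)

# Input covariance H with alpha-power-law spectrum: lambda_j(H) = j^{-alpha}.
lam = np.arange(1, v + 1, dtype=float) ** (-alpha)

# Sample x ~ N(0, H); in the eigenbasis of H this is coordinate-wise scaling.
X = rng.standard_normal((n_samples, v)) * np.sqrt(lam)   # rows are samples

# Gaussian sketch W in R^{v x d}; i.i.d. N(0,1) entries assumed here.
W = rng.standard_normal((v, d))

# Random features z = f(W^T x) with entrywise monomial f(y) = y^p.
Z = (X @ W) ** p                                          # shape (n_samples, d)

# Monte Carlo estimate of the population second-moment matrix
# E_x[(1/d) f(W^T x) f(W^T x)^T].
K = (Z.T @ Z) / (n_samples * d)

# Empirical eigenvalues vs. the predicted head-regime rate
# (log^{p-1}(j+1) / j)^alpha, up to unknown constants.
eigs = np.sort(np.linalg.eigvalsh(K))[::-1]
j = np.arange(1, d + 1)
predicted = (np.log(j + 1) ** (p - 1) / j) ** alpha

for k in (1, 5, 20, 50):
    print(f"j={k:3d}  empirical={eigs[k - 1]:.3e}  "
          f"predicted rate~{predicted[k - 1]:.3e}  "
          f"ratio={eigs[k - 1] / predicted[k - 1]:.2f}")
```

Under this setup the ratio column should stay roughly constant over the head of the spectrum if the claimed rate holds, though Monte Carlo noise grows for higher-degree monomials and larger $j$.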