The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training
arXiv:2603.28964v1 Announce Type: new
Abstract: We develop the spectral edge thesis: phase transitions in neural network training (grokking, capability gains, loss plateaus) are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme aspect-ratio regime (parameters $P \sim 10^8$, window $W \sim 10$), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes at position $k^* = \mathrm{argmax}_j\, \sigma_j/\sigma_{j+1}$.
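As an illustration only (function and variable names are ours, not the paper's), a minimal sketch of locating $k^*$ from a window of flattened parameter updates; working with the $W \times W$ Gram matrix keeps the computation cheap even when $P \sim 10^8$:

```python
import numpy as np

def gap_position(updates):
    """Locate the intra-signal gap position k* for a window of updates.

    updates: array of shape (W, P) holding the last W flattened parameter
    update vectors (W ~ 10, P ~ 1e8 in the regime the abstract describes).
    Returns k* as a 1-indexed position.
    """
    # Rolling-window Gram matrix (W x W); its eigenvalues equal the squared
    # singular values of the update matrix, so we never form a P x P object.
    G = updates @ updates.T
    eigvals = np.linalg.eigvalsh(G)[::-1]          # descending order
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # singular values
    # k* = argmax_j sigma_j / sigma_{j+1}: the largest consecutive ratio.
    ratios = sigma[:-1] / np.maximum(sigma[1:], 1e-12)
    return int(np.argmax(ratios)) + 1
```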
From three axioms we derive: (i) gap dynamics governed by a Dyson-type ODE with curvature asymmetry, damping, and gradient driving; (ii) a spectral loss decomposition linking each mode's learning contribution to its Davis–Kahan stability coefficient; (iii) the Gap Maximality Principle, showing that $k^*$ is the unique dynamically privileged position: its collapse is the only one that disrupts learning, and it sustains itself through an $\alpha$-feedback loop requiring no assumption on the optimizer. The adiabatic parameter $\mathcal{A} = \|\Delta G\|_F / (\eta\, g^2)$ controls circuit stability: $\mathcal{A} \ll 1$ (plateau), $\mathcal{A} \sim 1$ (phase transition), $\mathcal{A} \gg 1$ (forgetting).
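A hedged sketch of the adiabatic parameter follows; we read $g^2$ as the squared gradient norm and $\Delta G$ as the change in the Gram matrix between consecutive windows, both our interpretation of the abstract's notation rather than the paper's definition:

```python
import numpy as np

def adiabatic_parameter(G_prev, G_curr, lr, grad):
    """A = ||Delta G||_F / (eta * g^2), per the abstract's formula.

    G_prev, G_curr: consecutive rolling-window Gram matrices (W x W).
    lr:             learning rate eta.
    grad:           flattened gradient vector at the current step.
    """
    delta_G = np.linalg.norm(G_curr - G_prev, ord="fro")  # ||Delta G||_F
    g_sq = float(grad @ grad)                             # g^2 = ||grad||^2
    return delta_G / (lr * g_sq)

# Regime read-out, as stated in the abstract:
#   A << 1  -> plateau (circuits stable)
#   A ~  1  -> phase transition
#   A >> 1  -> forgetting
```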
Tested across six model families (150K–124M parameters): gap dynamics precede every grokking event (24/24 with weight decay, 0/24 without), the gap position is optimizer-dependent (Muon: $k^*=1$, AdamW: $k^*=2$ on the same model), and 19/20 quantitative predictions are confirmed. The framework is consistent with the edge of stability, Tensor Programs, Dyson Brownian motion, the Lottery Ticket Hypothesis, and neural scaling laws.