Spectral Archaeology: The Causal Topology of Model Evolution

arXiv:2601.03424v1 Announce Type: new
Abstract: Behavioral benchmarks tell us \textit{what} a model does, but not \textit{how}. We introduce a training-free mechanistic probe based on attention-graph spectra. Treating each layer's attention pattern as a weighted token graph, we compute algebraic connectivity ($\lambda_2$), smoothness, and spectral entropy. Across 12 models and 10 languages, these measures yield stable “spectral fingerprints” that expose discontinuities missed by standard evaluation.
We report four results. (1) Models undergoing specific curriculum transitions (e.g., code-to-chat) show an English-only, syntax-triggered connectivity failure on non-canonical constructions, reaching $\Delta\lambda_2 \approx -0.76$. We term this scar \textit{Passive-Triggered Connectivity Collapse} (PTCC). Analysis of the Phi lineage reveals that PTCC appears and resolves across developmental stages, implicating brittle curriculum shifts rather than synthetic data per se. (2) PTCC reflects a specialization trade-off: strengthened formal routing at the expense of stylistic flexibility. (3) We identify four recurrent processing strategies; simple frozen-threshold rules enable perfect forensic identification across lineages. (4) Mechanistically, PTCC localizes to a sparse Layer 2 “compensatory patch” of heads that fails under syntactic stress; activation steering can partially restore connectivity, recovering $\approx 38\%$ of lost information flow.
Finally, dominant topological regimes track tokenization density more than language identity, suggesting “healthy” geometry varies systematically across scripts. Overall, attention-graph spectra provide a practical tool for auditing and training-regime verification.
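The abstract does not pin down how the attention-graph spectra are built. Below is a minimal sketch of one plausible instantiation, assuming the per-layer attention matrix is symmetrized into an undirected token graph, a combinatorial Laplacian is used, smoothness is taken as the normalized Dirichlet energy of the hidden states, and spectral entropy is computed over the normalized Laplacian eigenvalues. The function name `spectral_fingerprint` and all of these construction choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def spectral_fingerprint(attn: np.ndarray, hidden: np.ndarray) -> dict:
    """Layer-level spectral measures from an attention matrix.

    attn:   (n_tokens, n_tokens) attention weights for one layer
            (e.g. averaged over heads) -- the aggregation choice is an assumption.
    hidden: (n_tokens, d) per-token hidden states used as the graph signal
            for the smoothness term -- also an assumption.
    """
    # Undirected weighted token graph: symmetrize attention, drop self-loops.
    W = 0.5 * (attn + attn.T)
    np.fill_diagonal(W, 0.0)

    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Eigenvalues in ascending order; lambda_2 is the algebraic connectivity
    # (Fiedler value) -- it approaches zero as the token graph fragments.
    evals = np.linalg.eigvalsh(L)
    lambda_2 = float(evals[1])

    # Smoothness: Dirichlet energy of the hidden states over the token graph,
    # normalized by signal energy so it is scale-invariant.
    dirichlet = np.trace(hidden.T @ L @ hidden)
    smoothness = float(dirichlet / (np.trace(hidden.T @ hidden) + 1e-12))

    # Spectral entropy: Shannon entropy of the normalized Laplacian spectrum.
    p = np.clip(evals, 0.0, None)
    p = p / (p.sum() + 1e-12)
    p = np.clip(p, 1e-12, None)
    entropy = float(-(p * np.log(p)).sum())

    return {"lambda_2": lambda_2, "smoothness": smoothness, "entropy": entropy}


# Toy usage: a row-stochastic "attention" matrix over 8 tokens and random hidden states.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 8))
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    hidden = rng.normal(size=(8, 16))
    print(spectral_fingerprint(attn, hidden))
```

Under this reading, the reported $\Delta\lambda_2$ would be the change in `lambda_2` between matched canonical and non-canonical inputs; the specific Laplacian normalization and head aggregation could differ in the paper.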
