Before the First Token: Scale-Dependent Emergence of Hallucination Signals in Autoregressive Language Models
arXiv:2604.13068v1

Abstract: When do large language models decide to hallucinate? Despite the serious consequences of hallucination in healthcare, law, and finance, few formal answers exist. Recent work shows that autoregressive models maintain internal representations distinguishing factual from fictional outputs, but how the timing of these representations varies with model scale remains poorly understood. We study the temporal dynamics of hallucination-indicative internal representations across 7 autoregressive transformers (117M–7B parameters) using three fact-based datasets (TriviaQA, Simple Facts, Biography; 552 labeled examples). […]
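The abstract's premise (that hidden states carry a factual-vs-hallucinated signal that a linear readout can recover) is commonly tested with a linear probe over hidden activations. The sketch below illustrates that methodology only: the hidden states and labels are synthetic stand-ins (in the paper's setting they would come from a transformer layer and the 552 labeled examples), and the difference-of-means probe is one common baseline choice, not necessarily the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: d_model-dimensional "hidden states" for n examples,
# with label 1 = factual, 0 = hallucinated. Real experiments would extract
# these from a transformer layer on labeled QA examples.
d_model = 64
n = 552  # number of labeled examples mentioned in the abstract
labels = rng.integers(0, 2, size=n)

# Inject a weak class-dependent shift so the synthetic data has a signal
# for the probe to find (a stand-in for a real hallucination direction).
direction = rng.normal(size=d_model)
hidden = rng.normal(size=(n, d_model)) + np.outer(labels, 0.5 * direction)

# Simple train/test split.
split = n // 2
X_tr, y_tr = hidden[:split], labels[:split]
X_te, y_te = hidden[split:], labels[split:]

# Difference-of-means ("mass-mean") linear probe: classify each example by
# its projection onto mu_factual - mu_hallucinated, thresholded at the
# midpoint between the two class means.
mu1 = X_tr[y_tr == 1].mean(axis=0)
mu0 = X_tr[y_tr == 0].mean(axis=0)
w = mu1 - mu0
threshold = (mu1 + mu0) @ w / 2
preds = (X_te @ w > threshold).astype(int)
acc = (preds == y_te).mean()
print(f"probe accuracy: {acc:.2f}")
```

Probe accuracy well above chance (0.5) is the standard evidence that the representation linearly encodes the factual/hallucinated distinction; applying the same probe at each token position and layer is how temporal dynamics like those in the abstract are typically mapped out.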