State Drift in Language-Conditioned Autonomous Agents: A Failure Mode of Long-Horizon Reasoning
Language-conditioned autonomous agents rely on natural language to represent internal state, reason about goals, and select actions. Despite recent advances in reasoning and planning, such agents remain unreliable in long-horizon tasks. In this work, we identify state drift as a fundamental and underexplored failure mode: a persistent divergence between an agent’s internal textual state and the true environment state that accumulates over time. We study state drift through controlled experiments with language-conditioned agents operating in long-horizon settings. By comparing fact-level internal belief representations against ground-truth environment states across sequential interactions, we show that state drift can arise and persist even when individual reasoning steps are locally coherent and logically valid. This indicates that long-horizon failures cannot be explained solely by step-wise reasoning errors. Moreover, we find that increasing context capacity does not mitigate state drift in deterministic environments, suggesting that the phenomenon is not simply a consequence of limited memory or forgetting. Instead, our results point to a structural limitation of using natural language as an internal state representation. Ensuring semantic state consistency over extended horizons thus emerges as a distinct and unresolved challenge for language-conditioned autonomy, with important implications for the design and evaluation of reliable autonomous agents.
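As a concrete illustration of the fact-level comparison described above, the sketch below measures per-step drift as the normalized symmetric difference between an agent's believed fact set and the ground-truth fact set. The fact encoding, the `drift` and `drift_trajectory` helpers, and the toy trajectory are illustrative assumptions, not the paper's evaluation code.

```python
"""Minimal sketch of a fact-level state-drift measurement (illustrative assumptions only)."""

from typing import List, Set

Fact = str  # e.g. "door=open", "holding(key)"


def drift(believed: Set[Fact], ground_truth: Set[Fact]) -> float:
    """Normalized divergence between the agent's belief state and the true state.

    0.0 means the belief matches the environment exactly; 1.0 means the two
    fact sets share nothing (Jaccard distance over atomic facts).
    """
    union = believed | ground_truth
    if not union:
        return 0.0
    return len(believed ^ ground_truth) / len(union)


def drift_trajectory(beliefs: List[Set[Fact]], truths: List[Set[Fact]]) -> List[float]:
    """Per-step drift over a sequence of interactions."""
    return [drift(b, t) for b, t in zip(beliefs, truths)]


if __name__ == "__main__":
    # Toy trajectory: the belief misses one update at step 1 and then carries a
    # stale fact at step 2, so divergence persists even though each individual
    # reasoning step could look locally coherent.
    truths = [
        {"door=open", "holding(key)"},
        {"door=open", "holding(key)", "light=on"},
        {"door=closed", "holding(key)", "light=on"},
    ]
    beliefs = [
        {"door=open", "holding(key)"},
        {"door=open", "holding(key)"},              # missed update: light=on
        {"door=open", "holding(key)", "light=on"},  # stale fact: door still believed open
    ]
    print(drift_trajectory(beliefs, truths))  # -> [0.0, 0.333..., 0.5]
```

Tracking this quantity across a trajectory makes the abstract's claim measurable: drift that stays nonzero (or grows) over steps indicates persistent divergence rather than an isolated reasoning error.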