Compiled Memory: Not More Information, but More Precise Instructions for Language Agents
arXiv:2603.15666v1 Announce Type: new
Abstract: Existing memory systems for language agents address memory management: how to retrieve and page more information within a context budget. We address a complementary problem — memory utility: what experience is worth keeping, and how it should change agent behavior. We present Atlas, a memory kernel that compiles accumulated task experience into an agent’s instruction structure — without fine-tuning, RAG, or human intervention. Memory is distillation, not storage; delivery is instruction rewriting, not context injection. Facts extracted from agent failures and successes are verified through a three-step promotion gate and delivered by rewriting the agent’s system prompt with learned sub-bullets. On CUAD contract analysis, the evolved prompt improves GPT-4o token-level F1 by $+8.7$pp and precision by $+12.5$pp. On HotpotQA multi-hop QA, joint F1 improves $+3.16$pp. An ablation isolates the mechanism’s defining property — the training signal constraint: the evolved prompt learns exactly what it is taught, and nothing more. Applied to Claude Sonnet 4.5 using the same evolved prompt — compiled from GPT-4o errors, unchanged — joint F1 improves $+2.31$pp, with gains concentrating where Claude’s stronger baseline leaves the most room — confirming that the compiled knowledge is task-shaped, not model-shaped.
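The abstract names a pipeline — extract facts from run outcomes, verify them through a promotion gate, then deliver them by rewriting the system prompt as sub-bullets. A minimal sketch of that flow, with all class names, thresholds, and the gate criteria being illustrative assumptions (the paper's actual three-step gate is not specified in the abstract):

```python
from dataclasses import dataclass

@dataclass
class Fact:
    """A candidate lesson distilled from agent runs (hypothetical schema)."""
    text: str
    support: int = 0        # successful runs consistent with this fact
    contradictions: int = 0 # runs that contradicted it

class PromotionGate:
    """Illustrative three-step gate: extract -> verify -> promote."""
    def __init__(self, min_support: int = 2):
        self.min_support = min_support  # assumed threshold, not from the paper
        self.candidates: list[Fact] = []
        self.promoted: list[Fact] = []

    def extract(self, text: str) -> None:
        # Step 1: record a fact distilled from a failure or success.
        self.candidates.append(Fact(text))

    def verify(self, text: str, success: bool) -> None:
        # Step 2: tally evidence for or against each candidate.
        for f in self.candidates:
            if f.text == text:
                if success:
                    f.support += 1
                else:
                    f.contradictions += 1

    def promote(self) -> list[Fact]:
        # Step 3: promote only well-supported, uncontradicted facts.
        for f in list(self.candidates):
            if f.support >= self.min_support and f.contradictions == 0:
                self.candidates.remove(f)
                self.promoted.append(f)
        return self.promoted

def rewrite_prompt(base_prompt: str, facts: list[Fact]) -> str:
    """Deliver promoted facts as sub-bullets appended to the instructions."""
    if not facts:
        return base_prompt
    bullets = "\n".join(f"  - {f.text}" for f in facts)
    return f"{base_prompt}\nLearned guidance:\n{bullets}"

gate = PromotionGate()
gate.extract("Quote the clause verbatim before classifying it.")
gate.verify("Quote the clause verbatim before classifying it.", success=True)
gate.verify("Quote the clause verbatim before classifying it.", success=True)
prompt = rewrite_prompt("You analyze contracts.", gate.promote())
```

The point of the sketch is the delivery mechanism: knowledge changes behavior only by becoming part of the instruction text, so nothing is injected at retrieval time and the evolved prompt is portable across models, matching the GPT-4o-to-Claude transfer result.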