LAI #110: Fixing Context Rot and Rethinking How Agents Reason
Author(s): Towards AI Editorial Team
Originally published on Towards AI.

Good morning, AI enthusiasts,

This week, we’re looking at why agent systems drift, confuse themselves, or quietly break when tasks get long. I unpack the real cause of “random” agent degradation: context rot, the gradual burying of essential information under noise as conversations, tool calls, and intermediate steps stack up.

The curated articles extend that theme of structure and clarity. You’ll find a guide to microservice architecture for ML systems, breaking training, inference, and data pipelines into modular services; a vector-free evaluation method (BrierLM) for models that predict continuous representations instead of tokens; a real-world case study on predicting subway delays using telemetry data; and a hands-on overview of context engineering as the “operating system” of agent performance. We close with a look at recursive language models, systems that decompose tasks into clean, isolated subtasks to escape the limits of traditional context windows.

Let’s get into it.

What’s AI Weekly

This week, in What’s AI, I unpack why so many agents “randomly” degrade over longer tasks, and why it’s usually not your prompt or the model. The real culprit is context: as conversations and tool calls pile up, essential details get buried under noise, leading to drift, confusion, and hallucinations (what I call context rot). I walk through what context engineering actually means in practice and the core techniques that keep systems reliable as they scale, especially retrieval, compaction, and structured memory, so the model sees the right information at the right time. Watch the complete video here!

— Louis-François Bouchard, Towards AI Co-founder & Head of Community

Learn AI Together Community Section!

Featured Community post from the Discord

Kiskre. shared APIhub, a platform that offers cheaper API prices for image generation models.
It offers predictable flat per-request pricing, and you can choose from NanoBanana, NanoBanana Pro, or Imagen 4. Check it out here and see if it is helpful for you. If you have any questions, reach out in the thread!

AI poll of the week!

Most of you are building multi-agent systems with LangGraph, with a big second group on custom frameworks and a noticeable n8n slice. Determinism, tool-call reliability, and observability are deciding the stack: LangGraph’s graph/state model scratches that itch, while custom builds appear where teams need domain-specific guards, a cheaper runtime, or lighter deployments than the heavier frameworks.

Share one real workflow diagram or snippet (nodes/steps + guards) and the must-have runtime feature that made you choose your stack, e.g., checkpointing, step/budget limits, human-in-the-loop, or tracing. Let’s talk in the thread!

Collaboration Opportunities

The Learn AI Together Discord community is overflowing with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too — we share cool opportunities every week!

1. Loadingusee is building automations with AI and is looking for partners. If you are interested, connect with them in the thread!
2. Skzain_ is learning GenAI with LangChain and is looking for a study partner. If this is your focus for the next few months, reach out to him in the thread!
3. Sakshamgarg08295 is starting to build agents from scratch. If you are interested in pursuing this, contact them in the thread!

Meme of the week!

Meme shared by bigbuxchungus

TAI Curated Section

Article of the week

Understanding Microservice Architecture for Machine Learning Applications
By Faizulkhan

This guide explains how to apply microservice architecture to machine learning applications.
It begins by comparing monolithic and microservices approaches, outlining the benefits of the latter for ML systems. The discussion covers core services, including data ingestion, model training, and inference, as well as communication protocols such as REST and gRPC. It also clarifies the distinction between stateless and stateful services. To demonstrate these concepts, a hands-on lab builds a simple two-service system, providing practical experience with the architecture.

Our must-read articles

1. Beyond Perplexity: Evaluating Next-Vector Prediction When Softmax Isn’t an Option
By Fabio Yáñez Romero

Evaluating language models that predict continuous vectors rather than discrete tokens requires moving beyond traditional metrics such as perplexity. This summary covers BrierLM, a likelihood-free alternative that assesses model quality using only generated samples. Based on the Brier score, the method evaluates both accuracy against a ground truth and the model’s uncertainty calibration by comparing two independent samples. The approach provides a stable evaluation for sample-based systems and can be extended to n-grams to measure text coherence, offering a practical tool for architectures like CALM or diffusion-based language models.

2. Can You Predict a Subway Delay Before Transit Officials Announce It?
By Charlie Taggart

The author explores a method for predicting subway delays before official transit announcements. Using 10 months of public MBTA train telemetry, a machine learning model was developed to identify patterns preceding service disruptions. The model, a Random Forest classifier, analyzed metrics like train headways and station dwell times and successfully predicted official alerts with a median lead time of 35 minutes. This approach proved more reliable than simple heuristics, striking a better balance between accurate warnings and false alarms and suggesting a practical way to give commuters earlier notice of potential delays.

3.
Context Engineering: The Silent Revolution Transforming AI Agents
By Mahendra Medapati

The performance of AI agents often depends not on the model but on effective context engineering, the systematic design of information flow. This piece positions context engineering as the operating system for AI, managing how agents store, retrieve, and use information. It outlines key strategies, including memory management, advanced retrieval methods such as RAG, context compression, and task isolation using specialized sub-agents. Additionally, it addresses common failure modes such as context poisoning and drift, while providing practical guidance on performance optimization, notably through KV cache management to improve speed and reduce costs.

4. MIT Just Killed the Context Window
By Alok Ranjan Singh

To address context rot, the decline in LLM performance with long […]
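Compaction comes up twice in this issue: once in the What’s AI techniques (retrieval, compaction, structured memory) and again as “context compression” in the context-engineering piece. As a minimal, framework-free sketch of the idea, the snippet below keeps the most recent turns verbatim and collapses everything older into a single summary slot. All names here (`compact`, `max_recent`) and the placeholder summarizer are illustrative assumptions, not any framework’s API; a production system would have a model write the summary.

```python
def compact(messages, max_recent=4):
    """Compact a chat history: keep the last `max_recent` turns verbatim
    and collapse everything older into one summary message.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    The summarizer is a placeholder (it truncates and joins the old
    contents); a real system would ask an LLM to summarize instead.
    """
    if len(messages) <= max_recent:
        return list(messages)  # nothing to compact yet

    older, recent = messages[:-max_recent], messages[-max_recent:]
    # Placeholder summary: first 40 characters of each older message.
    summary = " | ".join(m["content"][:40] for m in older)
    header = {
        "role": "system",
        "content": f"[Summary of {len(older)} earlier messages] {summary}",
    }
    return [header] + recent


# A 10-turn history shrinks to one summary slot plus the 4 latest turns.
history = [{"role": "user", "content": f"step {i}"} for i in range(10)]
compacted = compact(history)
```

The point of the summary slot is exactly the anti-context-rot property discussed above: the essential earlier details stay visible to the model in a fixed-size header instead of being buried under (or truncated along with) old turns.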