Long-Term vs Short-Term Memory for AI Agents: A Practical Guide Without the Hype
Author(s): Andrii Tkachuk. Originally published on Towards AI.

Over the past year, memory has become one of the most overused and misunderstood concepts in AI agent design.

Before I start, a few words of context. Most of us building AI agents today didn't start as "AI engineers". We come from backend engineering, data engineering, or data science. That background shapes how we think about systems: scalability, reliability, clear lifecycles, and predictable failure modes. And when we bring LLMs and agents into production, we still care about the same things:

- we don't want state explosions
- we don't want hidden coupling
- and we definitely don't want to create systems that make life harder for backend engineers and architects down the line

This article is written from that mindset: not "what sounds impressive in demos", but what leads to a reasonable trade-off between AI capabilities, backend architecture, and long-term system health.

You hear phrases like long-term memory, short-term memory, context engineering, persistent agents, and stateful conversations everywhere. But if you look closely at most real implementations, many teams either:

- don't actually use memory at all, or
- use it in ways that introduce serious scalability and reliability issues

This article aims to cut through the hype and explain, in practical terms, how memory for AI agents actually works, which approaches exist today, and what trade-offs they come with.

Photo by dianne clifford on Unsplash
First, Let's Define the Terms Clearly

Long-Term Memory (LTM)

Long-term memory is anything that persists across sessions, restarts, and disconnections. It includes the agent's past behaviors and thoughts that need to be retained and recalled over an extended period of time; this often leverages an external vector store with fast, scalable retrieval that provides relevant information to the agent as needed.

Typical characteristics:

- Stored in databases, object storage, or vector stores
- Survives process restarts
- Not necessarily injected into the model on every request

Common forms of LTM:

- Full chat history stored in a relational database
- Events or messages stored in an append-only log
- Vector embeddings of conversations or summaries
- User preferences, profiles, or behavioral facts

Think of long-term memory as durable knowledge, not working context.

Short-Term Memory (STM) / Working Memory

Short-term memory, often called working memory or execution state, holds context about the agent's current situation. It is typically realized through in-context learning, which makes it short and finite due to context window constraints. It is:

- Ephemeral
- Session-scoped
- Typically stored in RAM
- Used during active interaction

In practice, what we call "short-term memory" in agents usually combines:

- conversational state (messages)
- execution state (tool outputs, intermediate results)
- control flow metadata

Short-term memory exists to reduce overhead and improve reasoning continuity, not to replace persistence.

Approach #1 — The Legacy Stateless Approach (Still Very Common)

The most widespread approach today is actually stateless.
How it works

For every user request:

1. Fetch chat history from a persistent data store
2. Truncate or limit it
3. Inject it into the prompt
4. Run the agent
5. Repeat on the next request

```python
history = db.load_last_messages(user_id, limit=20)
prompt = build_prompt(history, user_message)
response = llm(prompt)
```

Pros

- Extremely simple
- Easy to reason about
- No RAM management concerns
- Works well in serverless environments

Cons

- Database is hit on every request
- Context is always injected, even when not needed
- Hard limits must be enforced aggressively
- Becomes expensive and slow at scale

This approach does not use short-term memory at all. Each request is fully independent.

Approach #2 — Short-Term Memory via In-Memory State (LangGraph-Style)

A more advanced approach introduces explicit short-term memory. This is the model used by frameworks like LangGraph.

Core idea

1. Load long-term memory once
2. Keep a mutable state object in RAM
3. Update it as messages arrive
4. Use it throughout the agent flow
5. Dispose of it when the session ends

Conceptually:

```python
class ChatState(TypedDict):
    user_id: str
    messages: list[dict]
```

Typical flow (e.g., with WebSockets or Socket.IO)

Socket.IO is one of the most common and well-known frameworks for building chat-based applications.

On connect:
- Load chat history from the database
- Store it in an in-memory state object

On each message:
- Read state from RAM
- Update messages
- Run the agent

On disconnect:
- Optionally persist a summary
- Remove state from memory

Pros

- No database calls on every message
- Much faster per interaction
- Natural conversational continuity
- Clean separation between LTM and STM

Cons (and they are important)

RAM usage grows with:
- number of concurrent users
- length of conversations

It requires:
- strict size limits
- trimming or summarization
- TTL / garbage collection

Socket-based systems have edge cases:
- dropped connections
- multiple tabs per user
- missing disconnect events

This approach can be production-ready, but only if memory management is treated as a first-class concern.
Context Variables: What They Are (and What They Are Not)

Many implementations add context variables (for example, ContextVar in Python) to avoid passing state through every function. This is useful, but limited.

Context variables:

✔️ Improve code readability
✔️ Allow access to state "from anywhere" in the execution flow
❌ Do NOT persist state across events
❌ Do NOT replace an in-memory store

They are an access pattern, not a memory strategy.

What context variables are good for

- Avoiding passing state through dozens of function calls
- Accessing the current execution state inside deep agent logic
- Improving code readability

```python
state = get_current_state()
state["messages"].append(new_message)
```

What they do not do

- They do not persist memory across events
- They do not replace an in-memory store
- They do not solve session lifecycle problems

Context variables are a convenience layer, not a memory system.

Approach #3 — Memory as a Tool (The New Emerging Pattern)

A newer and increasingly popular approach is Memory as a Tool. Before dismissing this approach as "too complex", I would strongly recommend trying it at least once. Even if […]
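As a minimal sketch of the memory-as-a-tool idea: memory reads and writes are exposed to the model as callable tools, so the agent itself decides when to remember and when to recall. Everything below is a hypothetical illustration, not any specific framework's API; `MemoryTool`, `save_memory`, and `search_memory` are invented names, and the naive keyword match stands in for what would normally be embedding similarity search over a vector store.

```python
class MemoryTool:
    """Toy long-term store; a production system would back this
    with a vector database rather than an in-process list."""

    def __init__(self):
        self._facts: list[str] = []

    def save_memory(self, fact: str) -> str:
        # Tool the model calls when it decides something is worth keeping.
        self._facts.append(fact)
        return "saved"

    def search_memory(self, query: str) -> list[str]:
        # Naive keyword match standing in for vector similarity search.
        words = query.lower().split()
        return [f for f in self._facts if any(w in f.lower() for w in words)]


# Tool schema handed to the LLM; the shape here is illustrative,
# not a particular SDK's tool-definition format.
TOOLS = [
    {"name": "save_memory", "description": "Persist a durable fact about the user."},
    {"name": "search_memory", "description": "Retrieve previously saved facts."},
]
```

The key shift from Approaches #1 and #2 is that memory access becomes explicit and on-demand: nothing is injected into the prompt unless the model asks for it, which keeps context small at the cost of an extra tool-call round trip.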