Your Agent Can Learn at Three Layers — Most Teams Only Think About One
When someone says “the AI needs to keep learning,” almost everyone pictures the same thing: retraining the model. New data, updated weights, a fresh checkpoint. But for agents, that instinct quietly throws away two of the three places where learning actually happens — and they’re the two most teams can realistically use.
This framing comes from Harrison Chase (LangChain’s co-founder), and it’s one of those breakdowns that’s obvious in hindsight and clarifying once you have it. The claim: an agentic system can learn at three distinct layers — the model, the harness, and the context — and knowing which layer you’re operating on changes how you build systems that improve over time.
This post walks through all three, what learning looks like at each, the trade-offs, and the thing that quietly powers all of them. If you build agents and “continual learning” still means “fine-tune the model” in your head, this will widen the picture.
First, the three layers
Strip an agentic system down and it has three parts that can change independently:
- Model — the weights themselves.
- Harness — the code that drives the agent, plus any tools and instructions that are always part of it. This is the scaffolding every instance of the agent shares.
- Context — additional instructions, skills, and memory that live outside the harness and configure it for a particular use.
The cleanest way to feel the distinction is to map a real agent onto it. Take Claude Code: the model is Claude Sonnet (or whichever model you point it at), the harness is Claude Code itself, and the context is your CLAUDE.md, your /skills, your mcp.json. Same model, same harness, totally different behavior depending on the context you hand it.
Map it onto OpenClaw and the same shape holds: many possible models, a harness built on Pi plus scaffolding, and context that lives in its SOUL.md and the skills it pulls from clawhub. The layers are universal even when the implementations differ wildly.
Here’s the key move: most people jump straight to the model when they hear “learning.” In reality, the system can learn at all three — and the higher layers are often where the practical wins live.
Layer 1: Learning in the model (powerful, expensive, risky)
This is the one everyone already pictures. You update the weights with supervised fine-tuning, or reinforcement learning like GRPO, so the model itself gets better at your task.
It’s the most powerful lever — you’re changing the raw intelligence — but it comes with the heaviest baggage. The central hazard is catastrophic forgetting: train the model on new tasks and it tends to quietly degrade on things it used to do well. That’s still an open research problem, not a solved engineering step.
In practice, model-layer learning is almost always done for the agent as a whole rather than per user. You could imagine a LoRA adapter per user, but almost nobody does — the cost and operational complexity don’t pencil out. For most teams shipping agents, this layer is something you reach for rarely, if ever.
Layer 2: Learning in the harness (the underrated middle)
This is the layer most people don’t even know is a learning surface, and it’s the most interesting of the three.
The harness is the scaffolding — the loop, the tools, the always-on system instructions. The insight is that this code can be optimized automatically using the agent’s own run history. The pattern, described well in the Meta-Harness work, goes like this:
- Run the agent over a batch of tasks.
- Evaluate the results.
- Dump all the logs to a filesystem.
- Point a coding agent at those logs and have it propose changes to the harness code itself.
Read that again, because it’s a genuinely different idea: a frozen model, with a coding agent rewriting the scaffolding around it to climb an eval score. No weight updates at all — the system gets better by improving its own plumbing. LangChain used exactly this pattern (traces → coding agent → harness changes) to improve its open-source Deep Agents harness on a terminal benchmark.
Like model training, this is usually done at the agent level, though in principle you could learn a different harness per user.
Layer 3: Learning in the context (where most real learning happens today)
Context is everything that sits outside the harness and configures it — instructions, skills, and especially memory. And this is, pragmatically, where most agent “learning” happens in production right now.
What makes this layer so useful is how flexibly it can be scoped. You can learn context at:
- The agent level — the agent keeps a persistent memory and rewrites its own configuration over time. OpenClaw’s evolving SOUL.md is the vivid example.
- The tenant level — each user, team, or org gets its own context that improves over time. This is the more common production pattern, and it’s what products like Hex’s Context Studio, Decagon’s Duet, and Sierra’s Explorer are built around.
And you can mix them — agent-level and user-level and org-level context updates riding on the same base model.
There are two ways the updates actually happen, and the distinction matters:
- Offline / in the background — run over recent traces in a batch job, extract what worked, and fold it into the config. OpenClaw memorably calls this overnight pass “dreaming.”
- In the hot path — the agent updates its memory while it’s working, either on its own initiative or because the user told it to remember something.
The beauty of this layer is that it’s cheap, scopable, and personalizable. One evolving context per company, per team, or per user — no GPU cluster required, no catastrophic forgetting to fear.
The trade-offs, side by side
No layer is strictly best; they trade off along predictable axes:
ModelHarnessContextWhat changesWeightsScaffolding code/toolsInstructions, skills, memoryCost & difficultyHighestMediumLowestMain riskCatastrophic forgettingRegressions in agent codeStale/bloated memoryEasy to personalize?Rarely (per-agent)Usually per-agentYes — per agent/user/orgTypical cadenceInfrequent, offlinePeriodic, offlineReal-time or nightly
The practical takeaway: getting to something that feels like continual learning will involve all three layers, but the higher you go, the cheaper and more scopable the learning gets. Most teams should be doing far more at the harness and context layers than they currently are.
The thing underneath all of it: traces
Here’s the unifying punchline. Every one of these loops — retraining the model, rewriting the harness, evolving the context — is powered by the same artifact: traces, the full execution record of what the agent actually did.
That’s not a coincidence; it’s the deep reason agents are different from traditional software. In classic software, the code is the source of truth — read it and you know what will happen. With an agent, the logic lives in a non-deterministic model, so the only faithful record of what happened is the trace of a real run. Want to train the model? You need traces to learn from. Want to improve the harness? A coding agent reads the traces. Want to evolve context? You extract insights from traces.
Which leads to the conclusion worth carrying out of all this: if traces are what every learning loop consumes, then capturing them well — and having a serious way to evaluate what’s “good” in them — isn’t a nice-to-have. It’s the precondition for your agent learning anything at all, at any layer.
So before you reach for fine-tuning, ask the cheaper question first: am I even capturing the traces that would let me learn at the harness or context layer? For most teams, that’s where the next improvement is hiding.
Harrison Chase’s original framing is on the LangChain blog; the harness-optimization idea is detailed in the Meta-Harness writeup, and OpenClaw’s “dreaming” memory pass is documented here. If you’re running agents in production, I’m curious which layer you’ve actually shipped learning at — my bet is context, with harness as the most underused.
Your Agent Can Learn at Three Layers — Most Teams Only Think About One was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.