The Extensibility Triangle That Stopped Me Over-Engineering Claude Code
Forty hours into building an MCP server that served code review prompts, a colleague dropped a markdown file in .claude/skills/ that did the same thing. Ten minutes versus forty hours. Same output.
That experience taught me something that no documentation spells out clearly enough. Claude Code has three distinct extensibility mechanisms, and they have zero overlap once you understand what each one actually does. But more importantly, the design decisions behind these three mechanisms reveal a deliberate architectural philosophy that is worth understanding before you write a single line of configuration.
The Architecture Behind Three Mechanisms
Every extension to Claude Code falls into one of three categories. They exist as separate mechanisms because they operate at fundamentally different layers of the system. This is not accidental. Each one answers a different architectural question.
MCP servers answer: “How does Claude access systems outside its process boundary?” They are separate programs that communicate with Claude via JSON-RPC. The Model Context Protocol defines the wire format (tool calls, resource reads, prompt templates), but the server itself is just a program that speaks JSON over stdio or HTTP. It maintains its own connections, its own state, its own lifecycle. Claude sends a request, the server does the work, and sends back a response. Real inter-process communication with all the implications that carries.
Subagents answer: “How do you run Claude with different behaviour for different tasks?” A subagent is an isolated Claude session with its own system prompt, its own model selection, its own tool access and, critically, its own context window. When you invoke a subagent, it starts fresh. No accumulated context from your previous conversation. No tool access beyond what you’ve explicitly granted. The isolation is the feature.
Skills answer: “How do you inject reusable knowledge into a conversation?” A skill is a markdown file that becomes a slash command. When invoked, its contents are injected directly into the current conversation as instructions. No new process. No isolation. No separate context. Just text that shapes Claude’s behaviour within the existing session.
How MCP Servers Actually Work Under the Hood
The Model Context Protocol is a JSON-RPC 2.0 protocol. When Claude Code starts, it launches each configured MCP server as a child process and establishes a bidirectional communication channel over stdio. The server advertises its capabilities (which tools it provides, what resources it exposes) and Claude incorporates these into its available tool set.
When Claude decides to use an MCP tool, the sequence is:
- Claude generates a tool call with a name and JSON arguments
- Claude Code sends this as a JSON-RPC request to the appropriate server process
- The server executes the request (queries a database, calls an API, reads a file system)
- The server returns the result as a JSON-RPC response
- Claude Code feeds the response back into Claude’s context
This is genuine inter-process communication. The server can be written in any language. It can maintain persistent connections to databases, cache results, implement rate limiting, enforce access controls. It is a real program with real state.
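The request/response loop described above can be sketched as a minimal stdio server. This is an illustration of the transport shape only, not a complete MCP implementation: a real server also handles the initialization handshake and capability advertisement, and the "ran …" result payload here is a stand-in.

```python
import json
import sys

# Minimal sketch of an MCP-style stdio loop: one JSON-RPC message per
# line on stdin, one response per line on stdout. Illustrative only --
# a real server would also handle initialize, tools/list, and so on.
def handle(request: dict) -> dict:
    if request.get("method") == "tools/call":
        name = request["params"]["name"]
        # A real tool would do work here (query a database, call an API).
        result = {"content": [{"type": "text", "text": f"ran {name}"}]}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": -32601, "message": "method not found"},
    }

if __name__ == "__main__":
    for line in sys.stdin:  # newline-delimited JSON-RPC messages
        sys.stdout.write(json.dumps(handle(json.loads(line))) + "\n")
        sys.stdout.flush()
```

The important structural point is visible even in the sketch: the server is an ordinary process with an ordinary event loop, and everything between the request and the response is its own business.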
{
  "mcpServers": {
    "analytics": {
      "command": "node",
      "args": ["./mcp-servers/analytics.js"],
      "env": {
        "DATABASE_URL": "postgresql://localhost/analytics",
        "MAX_ROWS": "1000",
        "QUERY_TIMEOUT_MS": "5000"
      }
    },
    "deploy": {
      "command": "./mcp-servers/deploy-server",
      "args": ["--environment", "staging", "--read-only"],
      "env": {
        "API_TOKEN": "${DEPLOY_TOKEN}"
      }
    }
  }
}
The MAX_ROWS and QUERY_TIMEOUT_MS environment variables above are not MCP protocol features. They’re application-level safety boundaries that the analytics server enforces internally. This is an important architectural point: the MCP protocol handles communication, but your server handles policy. If Claude asks for every row in a ten-million-row table, the protocol will happily relay that request. Your server needs to say no.
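A sketch of what that internal policy layer might look like follows. The function name and the clamping rules are hypothetical, not part of any MCP SDK; the point is that the check lives in server code, not in the protocol.

```python
import os

# Hypothetical policy layer for an analytics server like the one above.
# The MCP protocol relays whatever the model asks for; clamping row
# counts and rejecting writes is the server's job.
MAX_ROWS = int(os.environ.get("MAX_ROWS", "1000"))

def enforce_policy(sql: str, requested_rows: int) -> tuple[str, int]:
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb not in ("SELECT", "WITH", "EXPLAIN"):
        raise PermissionError(f"read-only server: {verb} is not allowed")
    # Clamp the row count no matter what the request asked for.
    return sql, min(requested_rows, MAX_ROWS)
```

A request for ten million rows comes back capped at MAX_ROWS, and a DROP TABLE never reaches the database at all.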
For performance-critical integrations where the server handles thousands of requests or maintains connection pools, Rust is worth the investment. Our guide to building MCP servers in Rust covers the full process from protocol handling to production deployment, including connection pooling and graceful shutdown.
Remote MCP servers over HTTP with SSE follow the same protocol but replace stdio with HTTP transport. This matters for shared team infrastructure where a single MCP server instance serves multiple developers, but it adds network latency and requires authentication. For most teams, local stdio servers are simpler and faster.
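For reference, a remote entry in the same configuration might look something like the following. The exact field names (type, url) are an assumption about the current config syntax, so check the documentation for your Claude Code version before relying on them:

```json
{
  "mcpServers": {
    "team-analytics": {
      "type": "sse",
      "url": "https://mcp.internal.example.com/sse"
    }
  }
}
```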
How Subagents Work Under the Hood
Subagents are not a separate technology. They are Claude Code sessions with constrained parameters. When you invoke a subagent, Claude Code starts a new conversation with the model specified in the agent’s configuration. The system prompt comes from the markdown body of the agent file. The tool list is filtered to only what the agent configuration allows. And critically, the context window is empty except for the system prompt and the task you’ve given the subagent.
This isolation has profound implications for both cost and quality. Consider a debugging session where you’ve been exploring a codebase for twenty minutes. Your main context window contains dozens of file reads, grep results, stack traces, and conversation turns. If you now ask for a code review in that same session, the review happens in the context of all that debugging noise. The model has to work through accumulated context to focus on the review task.
A subagent starts clean. Its context contains only the system prompt (your review criteria) and the files you’ve asked it to review. No noise. No accumulated state. This is why subagent reviews are often better than main-session reviews: the model focuses entirely on the task.
---
name: code-reviewer
description: Reviews code for quality, security, and style
model: haiku
tools: Read, Grep, Glob, Bash
disallowedTools: Write, Edit
mcpServers:
  - github
maxTurns: 15
---

You are a code review specialist. You have read-only access to the codebase
and the GitHub API.

Review criteria:

- No unwrap() in production code paths (use proper error handling)
- All public functions have doc comments
- Error types implement std::fmt::Display
- No println! in library code (use tracing macros)
- Integration tests exist for new API endpoints
- No TODO without a linked issue number

For each file, provide a pass/fail checklist and specific line references
for any failures. Do not provide general advice — only specific findings.
The model: haiku directive routes this subagent to the cheapest model tier. For pattern-matching work like style checking and convention enforcement, Haiku performs comparably to Opus. There is no point paying Opus pricing for checking whether functions have doc comments.
The disallowedTools: Write, Edit directive creates a hard boundary. Even if the system prompt said “fix any issues you find,” the subagent cannot modify files. This is enforced at the tool-access layer, not the prompt layer. It’s a genuine security boundary.
The maxTurns: 15 setting prevents runaway subagent sessions. Without it, a subagent analysing a large codebase might iterate for fifty turns, accumulating cost. The turn limit forces the subagent to be concise and to prioritise its findings.
The Real Cost Mathematics of Model Routing
This is where subagents pay for themselves and then some. The numbers are worth working through.
Consider a team of five developers, each running roughly ten code reviews per day through Claude Code. Without subagents, each review runs in the main Opus session. With context accumulation from other work, a typical review might consume 15,000-25,000 input tokens and generate 2,000-4,000 output tokens. At Opus pricing, that is not trivial across fifty reviews per day.
With a Haiku subagent, the same review runs in a clean context window. Input tokens drop to 3,000-8,000 (system prompt plus the code being reviewed, no accumulated context). And Haiku’s per-token cost is dramatically lower than Opus. The combined effect of smaller context and cheaper model typically reduces code review costs by 85-95%.
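The arithmetic is easy to sanity-check. The per-million-token prices below are illustrative placeholders, not a current rate card; substitute real prices before drawing conclusions for your own team.

```python
def review_cost(input_tokens: int, output_tokens: int,
                in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one review at per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Illustrative prices only (dollars per million tokens).
opus_cost = review_cost(20_000, 3_000, 15.0, 75.0)   # bloated main session
haiku_cost = review_cost(5_000, 3_000, 0.80, 4.0)    # clean subagent context
saving = 1 - haiku_cost / opus_cost                  # fraction saved per review
```

Under these assumed numbers the per-review saving lands well above the 85% floor, because the two effects multiply: the clean context shrinks input tokens, and the cheaper model shrinks the price of every token that remains.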
But there’s a subtler cost benefit. Because subagent context windows are isolated and clean, the reviews are faster. Less input context means less processing time. A Haiku subagent review typically returns in 2-5 seconds. The same review in a bloated Opus session might take 10-20 seconds. Over fifty daily reviews, that is meaningful time savings.
The model routing decision is not always obvious though. Some tasks genuinely need Opus-level reasoning. Deep debugging sessions where the model needs to hold complex state across many files. Architectural analysis where the model needs to reason about system-wide implications. Refactoring tasks where the model needs to understand subtle semantic relationships. These should stay on Opus, but in isolated subagents so the context stays clean.
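One way to make that routing decision mechanical is a small heuristic like the following. The task categories and the 20-file threshold are invented for illustration, not a rule from any documentation; tune them against your own workload.

```python
# Hypothetical routing heuristic for the guidance above: pattern-matching
# work goes to the cheapest tier, system-wide reasoning to the strongest.
def pick_model(task_kind: str, files_touched: int) -> str:
    if task_kind in {"style-check", "checklist", "doc-format"}:
        return "haiku"   # pattern matching and convention enforcement
    if task_kind in {"architecture", "debugging", "refactoring"} or files_touched > 20:
        return "opus"    # multi-step reasoning across many files
    return "sonnet"      # middle ground for everything else
```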
For the broader set of cost strategies, our Claude Code cost optimisation guide covers model routing alongside other approaches like context management and token-aware prompting.
How Skills Work at the Prompt Layer
Skills are architecturally the simplest mechanism, and that simplicity is a feature. A skill is a markdown file in .claude/skills/. The directory name becomes the slash command. When you type /review, Claude Code reads .claude/skills/review/SKILL.md and injects its contents into the current conversation.
There is no process. No isolation. No separate context window. The skill’s text joins your existing conversation as if you had typed it yourself. This means skills benefit from the conversation’s existing context (Claude already knows what files you’ve been working on) but they also inherit the conversation’s accumulated noise.
# SQL Migration Standards

Review the migration files for compliance with team standards.

## Naming Conventions

- Tables: plural snake_case (user_sessions, not UserSession or user_session)
- Columns: singular snake_case (created_at, not CreatedAt)
- Indexes: idx_{table}_{columns} (idx_user_sessions_user_id)
- Foreign keys: fk_{table}_{referenced_table} (fk_orders_users)

## Safety Requirements

- All CREATE INDEX must use CONCURRENTLY
- ALTER TABLE ADD COLUMN must include a DEFAULT for non-nullable columns
- No DROP COLUMN without a preceding release that stops reading the column
- All migrations must be reversible (provide both up and down)

## Query Patterns

- Use EXISTS instead of IN for subqueries
- Use COALESCE instead of CASE WHEN ... IS NULL
- Avoid SELECT * in application code
- Always specify column lists in INSERT statements

Current schema for reference:

$(cat db/schema.sql)

Migration to review: $ARGUMENTS
The $(cat db/schema.sql) syntax executes a shell command at invocation time and injects the output. This gives skills limited dynamic capability. They can include file contents, environment variables, or command output, but they cannot maintain connections or perform multi-step external operations. The shell command runs once, its output is captured, and that static text becomes part of the prompt.
The $ARGUMENTS variable captures everything after the slash command. /migration db/migrations/20260311_add_sessions.sql would set $ARGUMENTS to the file path, letting you direct the skill to specific files.
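The injection behaviour described above can be approximated in a few lines. This is an illustration of the semantics, not Claude Code’s actual implementation: shell substitutions run once, their output is captured as static text, and $ARGUMENTS is a plain string replacement.

```python
import re
import subprocess

def expand_skill(template: str, arguments: str) -> str:
    """Approximate skill expansion: run each $(...) once, then
    substitute $ARGUMENTS. Not Claude Code's real implementation."""
    def run(match: re.Match) -> str:
        completed = subprocess.run(
            match.group(1), shell=True, capture_output=True, text=True
        )
        return completed.stdout.strip()
    # Naive pattern: no nesting, one-shot substitution.
    expanded = re.sub(r"\$\((.+?)\)", run, template)
    return expanded.replace("$ARGUMENTS", arguments)
```

The deliberately naive regex mirrors the real constraint: skills get one-shot substitution at invocation time, not a scripting language.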
Skills are powerful precisely because they are simple. Anyone can write one. For teams where non-developers need to encode their expertise, skills for non-technical teams covers how QA engineers, product managers, and technical writers create effective skills without touching code.
When NOT to Use Each Mechanism (Anti-Patterns)
Understanding what each mechanism does well is only half the picture. Knowing when to avoid each one prevents the mistakes I made early on.
MCP Server Anti-Patterns
Do not build an MCP server to serve static content. If the information doesn’t change between invocations and doesn’t come from an external system, it is a skill. I’ve seen teams build MCP servers that return company coding standards, API documentation, even style guides. All of that is static text. A markdown file does it better, faster, and with zero maintenance.
Do not build an MCP server when a Bash tool call would suffice. Claude Code already has the Bash tool. If your “integration” is running a CLI command and parsing the output, you do not need an MCP server: a skill that instructs Claude to use Bash with specific commands is simpler, and CLIs invoked through the Bash tool are a great fit for local, single-agent workflows.
MCP servers are necessary when you need remote execution, permission scoping, persistent connections, or stateful operations. A developer who needs to check git log or run cargo test does not need an MCP server for that. Those are CLI commands that Claude can run directly. MCP servers earn their complexity when multiple agents or users need shared access, when you need fine-grained permission controls, or when the integration requires maintaining state across multiple requests.
Do not build an MCP server without safety boundaries. An MCP server that executes arbitrary SQL against production is a loaded weapon. Every server should enforce query timeouts, row limits, read-only access where appropriate, and input validation. The protocol won’t protect you. Your server code must.
Subagent Anti-Patterns
Do not create a subagent for a one-off task. Subagents shine for repeated, specialised work. If you need to do something once, just do it in the main session. The overhead of creating and maintaining an agent configuration file is not worth it for ad-hoc work.
Do not route everything to Haiku to save money. Some tasks genuinely require stronger reasoning. I tried running architectural analysis on Haiku once. It identified surface-level issues but missed a circular dependency that Opus caught immediately. Haiku is brilliant for pattern matching and checklist enforcement. It struggles with tasks requiring multi-step reasoning across many files.
Do not create deeply nested subagent chains. A subagent that invokes another subagent that invokes a third is an anti-pattern. Each layer adds latency and obscures the actual work being done. If you need that much orchestration, restructure your approach. Usually the answer is one subagent with access to multiple MCP servers and skills, not a chain of subagents.
Skill Anti-Patterns
Do not write skills that require external data. “Check the production error rate” as a skill instruction will lead Claude to hallucinate metrics. If the task requires data from outside Claude’s session, you need an MCP server (or a subagent with MCP access).
Do not write skills that are too long. A skill’s content is injected into the context window. A 5,000-word skill consumes tokens on every invocation. Keep skills focused. If you need extensive reference material, keep it in a separate file and pull it in with $(cat reference.md) rather than embedding it directly in the skill.
Do not use skills for tasks that need isolation. A skill runs in the current context. If the current context is polluted with unrelated work, the skill’s effectiveness drops. Tasks that need a clean slate (reviews, analysis, documentation generation) are better served by subagents that start with a fresh context window.
The Composition Model in Production
The three mechanisms are not alternatives. They are layers that compose. Understanding the composition patterns is what separates a working setup from an elegant one.
A subagent can have MCP servers assigned to it. This creates scoped access. A code-reviewer subagent can reach GitHub’s API through the GitHub MCP server but cannot touch your database or deployment pipeline. Least privilege, enforced by configuration.
A subagent can preload skills. The database-analyst subagent loads the sql-standards skill, so every analysis follows your team’s naming conventions and query patterns without the analyst subagent needing those conventions in its system prompt.
And a subagent has tool restrictions independent of its MCP access. You can give a subagent access to the GitHub MCP server (so it can read PRs and comments) while disallowing Write and Edit (so it cannot modify local files). The MCP tools and the built-in tools are controlled separately.
In our production setup across 8 marketplace plugins, the layering looks like this:
Layer 3: Skills (Team Knowledge)

- /review — code review checklist
- /migration — database migration standards
- /deploy-check — pre-deployment verification steps
- /sql-standards — SQL naming and query conventions
- /api-design — REST endpoint design patterns

Layer 2: Subagents (Specialised Behaviour)

- code-reviewer (Haiku, read-only, GitHub MCP, review skill)
- debugger (Opus, full access, all MCP servers)
- database-analyst (Sonnet, read-only, PostgreSQL MCP, sql-standards skill)
- deploy-checker (Sonnet, read-only, deploy + analytics MCP, deploy-check skill)
- doc-writer (Haiku, read-only, no MCP, style-guide skill)

Layer 1: MCP Servers (External Connections)

- analytics-db — PostgreSQL metrics database
- deploy-pipeline — deployment API wrapper
- github — GitHub API for PRs and issues
- content-api — CMS for documentation publishing
- postgresql — application database (read-only)
Each subagent composes exactly the MCP servers and skills it needs, at the model tier that matches its task complexity. The deploy-checker uses Sonnet because it needs to reason about deployment readiness across multiple data sources. The doc-writer uses Haiku because technical writing is primarily pattern matching against a style guide. The debugger uses Opus because debugging genuinely requires the strongest reasoning available.
For the full detail on how these layers work together, our guide on plugins, MCP servers, and skills as a layered architecture walks through the complete composition model. And for the subagent configuration specifically, our custom Claude agent building guide covers every frontmatter field with examples.
The Decision Function
After a year of building production plugins, the decision is a pure function of what you need:
def choose_mechanism(need):
    if need.requires_external_data or need.has_side_effects:
        return "MCP Server"
    if need.requires_different_model or need.requires_tool_restrictions or need.benefits_from_isolation:
        return "Subagent"
    if need.is_reusable_knowledge or need.is_workflow_template:
        return "Skill"
    if need.is_complex:
        return "Subagent + MCP Server + Skill"  # compose all three
If you find yourself building an MCP server that returns static text, stop. That is a skill. If you are writing a skill that says “query the database,” stop. That is an MCP server. If you are running expensive Opus sessions for routine checks, stop. That is a subagent on Haiku.
What Changed for Us
Before understanding the triangle, we had MCP servers doing the work of skills, skills trying to do the work of MCP servers, and no subagents at all. Everything ran in one context window on Opus. Our monthly Claude Code costs were painful and our results were inconsistent.
After restructuring: 34+ skills encode our team conventions across code review, database work, deployment, documentation, and API design. Five subagents handle specialised tasks at the right price point with the right access controls. Five MCP servers connect to exactly the external systems that need connecting, each with proper safety boundaries.
Monthly costs dropped significantly. Review quality improved because subagents start with clean context. Team conventions became consistent because skills enforce them identically for every developer. And the MCP servers became simpler because they stopped trying to be everything; they just bridge to external systems and nothing more.
The extensibility triangle is not a theoretical framework. It’s a practical tool that prevents you from building the wrong thing. Which, given that I built the wrong thing three times before figuring this out, is worth knowing about.