Two Ways to Build a Skill Server for Your AI Agent
How I stopped fighting context window limits and started giving my agents deeper, richer tool knowledge — on demand.
The Problem with Loading Everything at Startup
If you’ve built AI agents with more than a handful of tools, you’ve probably noticed something: the more tools you register, the worse the agent gets at choosing the right one.
This isn’t a model limitation. It’s an architecture problem.
When you register a tool in any agent framework — LlamaIndex, LangChain, Microsoft Agent Framework, or through MCP — its name, description, and parameter schema get injected into the system prompt. A properly documented tool can easily take 200–500 tokens. With 30 or 40 tools loaded, that’s 10,000–15,000 tokens of context dedicated to tool descriptions alone, before the conversation even starts.
But the real cost isn’t just tokens. It’s the quality of those descriptions.
When you know every tool description competes for space in the context window, you start writing shorter descriptions. You trim the guidelines. You skip the usage examples. You remove the “when NOT to use this tool” section. You make each description as compact as possible to fit everything in.
And that’s where things go wrong. Because the LLM relies entirely on those descriptions to decide which tool to call and how to call it. Shorter descriptions mean more ambiguity, more wrong tool calls, and more wasted iterations.
I ran into this building Wasaphi, my trading agent. I had financial data tools, Reddit sentiment tools, scheduling tools, email tools — all loaded at startup. To keep the prompt manageable, I’d trimmed every description to the minimum. The result: the agent would confuse get_stock_price with get_historical_prices, call sentiment tools when it didn’t need to, and burn through its iteration budget before completing the actual analysis.
The solution wasn’t better descriptions or fewer tools. It was changing when and how tools get loaded.
What Is a Skill Server?
A skill server is a layer between your agent and its tools that manages which capabilities are available at any given moment.
Instead of loading every tool at startup, the agent starts with a minimal set of meta-tools — essentially a catalog of available skills. When a task requires a specific capability, the agent loads that skill on demand, gaining access to its tools along with rich context about how to use them. When the task is done, it can unload the skill to free up context space.
If you’ve seen The Matrix, it’s exactly the scene where Neo gets kung fu uploaded into his brain. The agent doesn’t permanently know everything — it knows how to learn anything, on demand.

The key insight is this: because skills are loaded only when needed, you can afford to make each tool description as detailed as you want. You’re no longer optimizing for minimal token usage across 40 tools. You’re writing comprehensive documentation for 5 tools at a time — with usage guidelines, edge cases, expected outputs, and explicit instructions on when to use or avoid each tool.
This is the real advantage. Not just saving tokens, but investing them where they matter most.
The Architecture in Practice
The agent starts with three meta-tools:
- list_available_skills() — browse what capabilities exist
- load_skill(skill_id) — load a specific skill with its tools and context
- unload_skill(skill_id) — remove a skill from active context
Everything else is loaded dynamically.
Compare that to loading all 40 tools at startup with trimmed descriptions: you’d use more tokens and give the agent less useful information about each tool.
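To make the flow concrete, here's an illustrative exchange (the skill and tool names are the examples used later in this article):

    User: "Email me a summary of today's NVDA activity."
    Agent → list_available_skills()
    Agent → load_skill("financial_analysis")    # gains price tools plus their usage context
    Agent → get_stock_price(ticker="NVDA")
    Agent → load_skill("email_reports")
    Agent → generate_html_report(...) then send_email(...)
    Agent → unload_skill("financial_analysis")  # frees context for the next task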
Approach 1: In-Process Skill Registry (The Simple Way)
This approach requires no external infrastructure. The skill catalog lives inside your agent’s codebase. It’s what I’d recommend for most projects, especially if you’re working with a single agent or a small set of agents.
Defining Skills:
Each skill is a dictionary containing its tools and — crucially — a detailed context prompt that gets injected when the skill is loaded.
Here's an example of what a skill's metadata could look like:
{"email_reports": {
"name": "Email Reports",
"short": "Generate and send formatted HTML email reports",
"tools": ["generate_html_report", "send_email"],
"context": """
You now have access to email reporting tools.
TOOLS:
1. generate_html_report(title: str, sections: list[dict]) → str
Returns: Complete HTML string ready for email delivery.
Section format: [{"heading": "...", "content": "...", "type": "text|table|chart"}]
Use when: User requests a report, summary, or formatted output delivered via email.
Guidelines:
- Keep reports mobile-friendly (single column, readable fonts)
- Include a header with report title and generation date
- Add disclaimers for financial content
- Tables should have alternating row colors for readability
2. send_email(to: str, subject: str, html_body: str) → str
Returns: Delivery confirmation with message ID.
Use when: Report is generated and ready to send.
Important: Always generate the report first, then send. Never send empty or incomplete reports.
Limitation: Can only send to pre-configured recipients for security.
GENERAL GUIDELINES:
- Always confirm the recipient and content with the user before sending.
- For recurring reports, suggest using the scheduling skill instead of manual sends.
""",
}
Notice the difference from typical trimmed tool descriptions. Each tool gets:
- A clear explanation of what it returns
- Explicit “use when” and “do NOT use when” guidelines
- Interpretation guides for the data (what does a short ratio > 40% mean?)
- Workflow tips (always call X before Y)
- Known limitations and caveats
This level of detail would be impossible if all 20+ tools were loaded simultaneously. But when you’re loading 4–5 tools at a time, you have the token budget to be thorough — and the agent performs dramatically better because of it.
The Meta-Tools:
# Active skills tracking
# (the @tool decorator comes from your framework — LlamaIndex, LangChain, etc.)
active_skills: set[str] = set()

@tool
def list_available_skills() -> str:
    """
    List all skills the agent can load.
    Use this to discover capabilities before attempting
    a task you don't have tools for.
    """
    catalog = []
    for skill_id, skill in SKILLS.items():
        status = "LOADED" if skill_id in active_skills else "available"
        catalog.append(
            f"- {skill['name']} [{skill_id}] ({status}): {skill['short']}"
        )
    return "Skills:\n" + "\n".join(catalog)

@tool
def load_skill(skill_id: str) -> str:
    """
    Load a skill to gain access to its tools and usage context.
    Call list_available_skills first to see what's available.
    """
    if skill_id not in SKILLS:
        return f"Unknown skill: {skill_id}. Use list_available_skills to see options."
    if skill_id in active_skills:
        return f"Skill '{skill_id}' is already loaded."
    skill = SKILLS[skill_id]
    active_skills.add(skill_id)
    # Register the actual tool functions with the agent
    register_tools(skill["tools"])
    # Return the rich context — this is where the magic happens
    return f"Skill '{skill['name']}' loaded.\n\n{skill['context']}"

@tool
def unload_skill(skill_id: str) -> str:
    """
    Unload a skill to free up context space.
    Use when switching to a different task domain.
    """
    if skill_id not in active_skills:
        return f"Skill '{skill_id}' is not loaded."
    skill = SKILLS[skill_id]
    active_skills.discard(skill_id)
    unregister_tools(skill["tools"])
    return f"Skill '{skill['name']}' unloaded."
With those three meta-tools, the agent loads exactly what it needs, when it needs it.
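The snippet leaves register_tools and unregister_tools undefined, because they depend on your framework. Here's a minimal sketch assuming a mutable registry the agent's tool list is rebuilt from (TOOL_FUNCTIONS and agent_tools are hypothetical names, not part of any framework):

from typing import Callable

# Hypothetical mapping from tool name to the actual @tool-decorated function,
# one entry per tool across all skills
TOOL_FUNCTIONS: dict[str, Callable] = {
    "generate_html_report": generate_html_report,
    "send_email": send_email,
    # ...
}

# The agent's live tool set, keyed by name. It starts with just the
# three meta-tools; skills add and remove entries at runtime.
agent_tools: dict[str, Callable] = {
    "list_available_skills": list_available_skills,
    "load_skill": load_skill,
    "unload_skill": unload_skill,
}

def register_tools(tool_names: list[str]) -> None:
    """Make a skill's tools callable by the agent."""
    for name in tool_names:
        agent_tools[name] = TOOL_FUNCTIONS[name]

def unregister_tools(tool_names: list[str]) -> None:
    """Remove a skill's tools from the agent's active set."""
    for name in tool_names:
        agent_tools.pop(name, None)

# After each change, rebuild or update the agent from agent_tools.values();
# the exact call depends on your framework.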
One thing worth noting: if you want unload_skill to actually free up context — not just remove the tools from the registry — you'll need to clean the conversation history and reinject it into the agent, stripping out the skill's prompt while keeping a trace that the skill was loaded and then unloaded. This matters most for long sessions with large-context models (1 million tokens) and skills that inject several thousand tokens each. For shorter conversations, simply removing the tool functions is usually enough.
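A sketch of that cleanup, assuming a simple list of chat messages and that load_skill injected the skill's context verbatim:

def strip_skill_context(messages: list[dict], skill: dict) -> list[dict]:
    """Replace a skill's injected context with a short stub,
    keeping a trace that the skill was loaded and later unloaded."""
    stub = f"[Skill '{skill['name']}' was loaded here, used, then unloaded.]"
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        if skill["context"] in content:
            msg = {**msg, "content": content.replace(skill["context"], stub)}
        cleaned.append(msg)
    return cleaned

# After unloading, rebuild the agent's memory from the cleaned history;
# the exact reinjection call depends on your framework.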
When this approach works well:
- Single agent or a small number of agents
- You control the full codebase
- Adding skills is infrequent enough that redeployment is acceptable
- You don’t need shared state between agents
When you’ll outgrow it:
- Multiple agents need access to the same skills
- You want to add or update skills without redeploying
- Different agents need different permissions
- You need usage monitoring across agents
Approach 2: External Skill Server (Scalable and Portable)
When you have multiple agents — or when you want your skills to be accessible from any MCP-compatible client — you need to move the skill registry out of the agent and into its own service.
The idea is straightforward: a lightweight API server that manages skill definitions, handles tool execution, and exposes everything through a standard protocol.
The Architecture:
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Agent A    │   │   Agent B    │   │    Claude    │
│  (Wasaphi)   │   │  (Calendar)  │   │   Desktop    │
└──────┬───────┘   └──────┬───────┘   └──────┬───────┘
       │                  │                  │
       │     REST API / MCP Protocol     │
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
           ┌────────────▼────────────┐
           │       Skill Server      │
           │                         │
           │  Skill Registry (YAML)  │
           │  Tool Execution Engine  │
           │  Caching Layer          │
           │  Access Control         │
           │  Usage Tracking         │
           └─────────────────────────┘
Skill Definitions in YAML:
Skills are defined as configuration files, not code. This means you can add, modify, or remove skills without touching the server itself.
# skills/financial_analysis.yaml
id: financial_analysis
name: Financial Analysis
version: "1.2.0"

description:
  short: "Stock data, SEC filings, options chain, short volume"

access:
  agents: ["wasaphi", "research_agent"]
  rate_limit: 100/hour

cache:
  default_ttl: 300  # 5 minutes

tools:
  - name: get_stock_price
    handler: sources.yahoo_finance.get_price
    description: |
      Get current stock price, day change, volume, market cap, P/E ratio.
      Use when: Quick snapshot of current state.
      Do NOT use when: Historical data needed — use get_historical_prices.
      Note: Real-time during market hours, 15min delayed after close.
    parameters:
      ticker:
        type: string
        required: true
        description: "Stock ticker symbol (e.g., AAPL, NVDA)"

  - name: get_historical_prices
    handler: sources.yahoo_finance.get_history
    description: |
      OHLCV historical price data for the specified period.
      Periods: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, max.
      Use when: Trends, performance analysis, support/resistance levels.
      Do NOT use when: Only current price needed — use get_stock_price.
      Output: Compact table, one row per trading day, chronological.
    parameters:
      ticker:
        type: string
        required: true
      period:
        type: string
        default: "1y"
        enum: ["1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "max"]

  # ... additional tools with equally detailed descriptions

context_prompt: |
  You now have access to financial analysis tools.

  GUIDELINES:
  - Start with get_stock_price for a quick snapshot before deeper analysis.
  - For complete analysis, combine at least 3 tools.
  - Options data is 15-minute delayed — always mention this.
  - Present data with proper formatting and disclaimers.
The Server (FastAPI):
# skill_server.py
from fastapi import FastAPI, HTTPException
import yaml
from pathlib import Path

app = FastAPI(title="Skill Server")

class SkillRegistry:
    def __init__(self, skills_dir: str = "./skills"):
        self.skills = {}
        self._load_skills(skills_dir)

    def _load_skills(self, skills_dir: str):
        for path in Path(skills_dir).glob("*.yaml"):
            with open(path) as f:
                skill = yaml.safe_load(f)
            self.skills[skill["id"]] = skill

    def list_skills(self, agent_id: str | None = None) -> list[dict]:
        results = []
        for skill_id, skill in self.skills.items():
            # Check access if agent_id provided
            if agent_id and agent_id not in skill.get("access", {}).get("agents", []):
                continue
            results.append({
                "id": skill_id,
                "name": skill["name"],
                "description": skill["description"]["short"],
                "version": skill.get("version", "1.0.0"),
                "tool_count": len(skill["tools"]),
            })
        return results

    def get_skill(self, skill_id: str) -> dict | None:
        return self.skills.get(skill_id)

registry = SkillRegistry()

@app.get("/skills")
async def list_skills(agent_id: str | None = None):
    """List available skills, optionally filtered by agent permissions."""
    return {"skills": registry.list_skills(agent_id)}

@app.get("/skills/{skill_id}")
async def load_skill(skill_id: str, agent_id: str | None = None):
    """Load a skill — returns tool schemas and rich context prompt."""
    skill = registry.get_skill(skill_id)
    if not skill:
        raise HTTPException(404, f"Skill '{skill_id}' not found")
    # Check access
    allowed_agents = skill.get("access", {}).get("agents", [])
    if agent_id and allowed_agents and agent_id not in allowed_agents:
        raise HTTPException(403, f"Agent '{agent_id}' cannot access this skill")
    return {
        "skill_id": skill["id"],
        "name": skill["name"],
        "tools": [
            {
                "name": t["name"],
                "description": t["description"],
                "parameters": t.get("parameters", {}),
            }
            for t in skill["tools"]
        ],
        "context": skill.get("context_prompt", ""),
        "version": skill.get("version", "1.0.0"),
    }

@app.post("/skills/{skill_id}/execute")
async def execute_tool(skill_id: str, request: dict):
    """Execute a tool from a loaded skill."""
    skill = registry.get_skill(skill_id)
    if not skill:
        raise HTTPException(404, f"Skill '{skill_id}' not found")
    tool_name = request.get("tool_name")
    arguments = request.get("arguments", {})
    # Find the tool definition
    tool_def = next((t for t in skill["tools"] if t["name"] == tool_name), None)
    if not tool_def:
        raise HTTPException(404, f"Tool '{tool_name}' not found in skill '{skill_id}'")
    # Execute the handler
    handler = resolve_handler(tool_def["handler"])
    result = await handler(**arguments)
    return {"result": result}
The Agent-Side Client:
# skill_client.py — How agents connect to the skill server
import httpx

class SkillClient:
    def __init__(self, server_url: str, agent_id: str):
        self.server_url = server_url
        self.agent_id = agent_id
        self.loaded_skills = {}

    async def list_skills(self) -> list[dict]:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                f"{self.server_url}/skills",
                params={"agent_id": self.agent_id},
            )
        return resp.json()["skills"]

    async def load_skill(self, skill_id: str) -> dict:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                f"{self.server_url}/skills/{skill_id}",
                params={"agent_id": self.agent_id},
            )
        data = resp.json()
        self.loaded_skills[skill_id] = data
        return data

    async def execute_tool(self, skill_id: str, tool_name: str, **kwargs) -> str:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"{self.server_url}/skills/{skill_id}/execute",
                json={
                    "tool_name": tool_name,
                    "arguments": kwargs,
                    "agent_id": self.agent_id,
                },
            )
        return resp.json()["result"]

    def unload_skill(self, skill_id: str):
        self.loaded_skills.pop(skill_id, None)
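On the agent side, the Approach 1 meta-tools become thin wrappers around this client. One way to sketch it: each remote tool gets a local async stub that forwards calls to the server (agent_tools is the hypothetical registry from the Approach 1 sketch):

skill_client = SkillClient(server_url="http://localhost:8000", agent_id="wasaphi")

def make_remote_tool(skill_id: str, tool_def: dict):
    """Build a local stub that forwards calls to the skill server."""
    async def remote_tool(**kwargs):
        return await skill_client.execute_tool(skill_id, tool_def["name"], **kwargs)
    remote_tool.__name__ = tool_def["name"]
    remote_tool.__doc__ = tool_def["description"]
    return remote_tool

@tool
async def load_skill(skill_id: str) -> str:
    """Load a skill from the external skill server."""
    data = await skill_client.load_skill(skill_id)
    for t in data["tools"]:
        agent_tools[t["name"]] = make_remote_tool(skill_id, t)
    return data["context"]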
MCP Wrapper for Universal Compatibility:
To make the skill server accessible from Claude Desktop, Cursor, Kiro, or any MCP-compatible client, wrap it as an MCP server:
# skill_mcp_server.py
from mcp.server import Server
from mcp.types import Tool, TextContent

server = Server("skill-server")
client = SkillClient(server_url="http://localhost:8000", agent_id="mcp_client")

def find_skill_for_tool(tool_name: str) -> str | None:
    """Look up which loaded skill provides a given tool."""
    for skill_id, data in client.loaded_skills.items():
        if any(t["name"] == tool_name for t in data["tools"]):
            return skill_id
    return None

@server.list_tools()
async def list_tools():
    """Expose the 3 meta-tools via MCP protocol."""
    tools = [
        Tool(
            name="list_available_skills",
            description="List all skills this agent can load on demand",
            inputSchema={"type": "object", "properties": {}},
        ),
        Tool(
            name="load_skill",
            description="Load a skill to gain its tools and detailed usage context",
            inputSchema={
                "type": "object",
                "properties": {
                    "skill_id": {"type": "string", "description": "Skill ID to load"}
                },
                "required": ["skill_id"],
            },
        ),
        Tool(
            name="unload_skill",
            description="Unload a skill to free context space",
            inputSchema={
                "type": "object",
                "properties": {
                    "skill_id": {"type": "string", "description": "Skill ID to unload"}
                },
                "required": ["skill_id"],
            },
        ),
    ]
    # Also expose tools from currently loaded skills, converting the
    # YAML parameter specs into JSON Schema
    for data in client.loaded_skills.values():
        for t in data["tools"]:
            params = t.get("parameters", {})
            tools.append(Tool(
                name=t["name"],
                description=t["description"],
                inputSchema={
                    "type": "object",
                    "properties": {
                        p_name: {"type": p.get("type", "string")}
                        for p_name, p in params.items()
                    },
                    "required": [p for p, spec in params.items() if spec.get("required")],
                },
            ))
    return tools

@server.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "list_available_skills":
        skills = await client.list_skills()
        formatted = "\n".join(
            f"- {s['name']} [{s['id']}]: {s['description']}"
            for s in skills
        )
        return [TextContent(type="text", text=f"Available Skills:\n{formatted}")]
    elif name == "load_skill":
        data = await client.load_skill(arguments["skill_id"])
        # Tools become visible on the next list_tools call; return rich context
        return [TextContent(type="text", text=data["context"])]
    elif name == "unload_skill":
        client.unload_skill(arguments["skill_id"])
        return [TextContent(type="text", text="Skill unloaded.")]
    else:
        # Dynamically loaded tool — route to skill server
        skill_id = find_skill_for_tool(name)
        if skill_id is None:
            return [TextContent(type="text", text=f"Unknown tool: {name}")]
        result = await client.execute_tool(skill_id, name, **arguments)
        return [TextContent(type="text", text=result)]
This gives you one skill server that works with every MCP-compatible tool — Claude Desktop, Cursor, Kiro, and any custom agent you build.
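To hook this up to Claude Desktop, point its MCP configuration (claude_desktop_config.json) at the wrapper script; the path below is illustrative:

{
  "mcpServers": {
    "skill-server": {
      "command": "python",
      "args": ["/path/to/skill_mcp_server.py"]
    }
  }
}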
What the External Approach Gives You
Beyond the basics:
Centralized caching. If one agent fetched NVDA price data 30 seconds ago, another agent hitting the same skill server gets the cached result. No duplicate API calls.
Access control. The trading agent can access financial skills. The calendar agent cannot. Permissions are explicit in the YAML definition.
Add skills without redeploying agents. Drop a new YAML file in the skills directory. The next list_available_skills call picks it up. No agent restart needed.
Usage monitoring. Track which agents use which skills, how often, and how much it costs. Essential for understanding your API spend and optimizing.
Shared across platforms. The same skill server powers your custom LlamaIndex agent, your Claude Desktop setup, and your Cursor IDE. One source of truth.
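To illustrate the caching point above: a minimal in-memory TTL cache the execute endpoint could consult before running a handler, keyed on skill, tool, and arguments. This is a sketch; the server code shown earlier doesn't include it.

import json
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""
    def __init__(self):
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry and time.time() < entry[0]:
            return entry[1]
        return None

    def set(self, key: str, value, ttl: int):
        self._store[key] = (time.time() + ttl, value)

cache = TTLCache()

# Inside execute_tool, before resolving the handler:
#   ttl = skill.get("cache", {}).get("default_ttl", 0)
#   key = f"{skill_id}:{tool_name}:{json.dumps(arguments, sort_keys=True)}"
#   if ttl and (hit := cache.get(key)) is not None:
#       return {"result": hit, "cached": True}
#   ...and after execution: cache.set(key, result, ttl)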
Which Approach to Choose
Single agent, prototype, or learning? → Approach 1. Implement it in 20 minutes. No infrastructure.
Multiple agents, team project, or production deployment? → Approach 2. The setup cost is higher, but the shared infrastructure pays for itself quickly.
Hybrid (what I actually do): Start with Approach 1. The skill definitions are the same — only the delivery mechanism changes. When you find yourself copy-pasting skills between agents, it’s time to extract them into a shared server.
What I Learned
Rich descriptions change everything.
The biggest improvement in my agents’ tool selection wasn’t a better model or more sophisticated prompting. It was giving each tool a detailed description with “when to use,” “when NOT to use,” interpretation guidelines, and workflow tips. Skill servers make this economically viable by loading only what’s needed.
The catalog is the architecture.
Before writing server code, list your skills. Group your existing tools by domain. Write the short descriptions and the detailed context prompts. The catalog structure determines how well your agent navigates its capabilities. The code is just plumbing.
Start simple, extract when it hurts.
Approach 1 is not a lesser version of Approach 2. It’s the right starting point. You’ll know when you need the external server — it’s when you catch yourself maintaining the same skill definitions in three different codebases.
What’s Missing From This Article
I didn’t cover:
- Dynamic tool schema validation at the server level
- Skill dependency management (skill A requires skill B)
- Hot-reloading skills without restarting agents
- Usage analytics dashboards
- Multi-tenant skill servers for SaaS products
- Skill versioning strategies and gradual rollouts
If any of these would be useful, let me know in the comments.
What’s Next
I’m integrating the skill server into Wasaphi, and as soon as possible into all my client projects. The long-term architecture: one skill server providing capabilities, Graphiti providing shared temporal memory, and a scheduling layer for time-aware execution, all built on carefully managed context.
Skills + memory + scheduling — agents that load what they need, remember what they’ve learned, and act when the time is right.
Thanks for reading! I’m Elliott, a Python & Agentic AI consultant and entrepreneur who builds practical AI tools and shares what works in production. I write weekly about the agents I’m building, the tools I’m testing, and the architecture patterns I discover along the way.
If this was useful, I’d appreciate a few claps 👏 and a follow for more articles on AI agent development, MCP tooling, and production architecture.
Have you built something similar, or found a different approach to managing tool sprawl? I’d love to hear about it in the comments.