Quoting Thariq Shihipar
Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost. […]
At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they’re too low.
Tags: prompt-engineering, anthropic, claude-code, ai-agents, generative-ai, ai, llms
Like
0
Liked
Liked