Token Waste: The Silent Tax on Every AI Team

digitado ⋅ May 25, 2026

ChatGPT, Claude, and Gemini all charge per token, and on all three the bill is silently inflated by how most people write prompts. Here’s the research, the real cost, and a free tool that fixes it.

What every major AI charges per million tokens:

*Sources: Anthropic docs, Google Gemini API pricing page, published pricing guides. Prices as of May 2026.*

The AI pricing model is deceptively simple: you pay per token. What most engineers don’t realize is that a significant portion of the tokens they’re paying for carries zero informational value. It’s filler: hedging, politeness, context the model doesn’t need. And when format instructions are missing, the model has to guess, and it often generates far more output than necessary.

This isn’t a theoretical problem. OpenAI’s own CEO confirmed it publicly. A peer-reviewed research paper quantified it. And across every major AI platform — ChatGPT, Claude, Gemini — the same structural waste compounds silently in every production pipeline.

The evidence: all three platforms, real numbers

At the Stripe Sessions 2024 conference, someone asked Sam Altman how much it costs OpenAI when users say “please” and “thank you” to ChatGPT. His answer: “tens of millions of dollars” in compute costs. [1] He called it “money well spent.” But the number reveals something important — even the most benign form of prompt inefficiency, at billions of queries a day, becomes a material cost.

A peer-reviewed arXiv paper went further. [2] The researchers found that polite phrasing doesn’t just add input tokens — it systematically inflates output tokens, because the model mirrors the tone of the input. Output tokens cost more than input tokens on every platform. The study estimated this linguistic effect alone generates up to $11 million per month in extra revenue for OpenAI — coming directly from users’ bills.

“Every time you say ‘please’ to ChatGPT — that’s like a penny or something. When you compound that over billions of users, it’s tens of millions of dollars of compute.”

— Sam Altman, CEO of OpenAI, Stripe Sessions 2024 [1]

Claude users face the same dynamic, with higher stakes. Claude Sonnet output tokens cost $15 per million, five times Sonnet’s own $3 input rate. [3] Anthropic’s own documentation for Claude Code estimates average developer costs at $100–200 per month, with “large variance depending on how many instances users are running.” [3] Beyond instance count, much of that variance comes down to how prompts are written.

Gemini adds another wrinkle. Gemini 2.5 Pro doubles its pricing above 200,000 tokens — from $1.25 to $2.50 per million input tokens, and $10 to $15 for output. [4] Context bloat, one of the most common prompting mistakes, triggers that cliff automatically. Teams injecting full files instead of relevant excerpts cross it without knowing.
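To make the cliff concrete, here is a minimal Python sketch using only the Gemini 2.5 Pro rates quoted above. The function name `gemini_call_cost` and the example token counts are hypothetical, and the sketch assumes the higher rates apply to the whole call once the prompt exceeds 200,000 tokens; check the pricing page [4] before relying on it.

```python
# Rates quoted in this article for Gemini 2.5 Pro, in USD per million tokens.
THRESHOLD_TOKENS = 200_000
STANDARD = {"input": 1.25, "output": 10.00}
LONG_CONTEXT = {"input": 2.50, "output": 15.00}


def gemini_call_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in USD.

    Assumption: once the prompt exceeds the threshold, the higher rates
    apply to every token in the call, not only to the tokens above it.
    """
    rates = LONG_CONTEXT if prompt_tokens > THRESHOLD_TOKENS else STANDARD
    total = prompt_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical comparison: relevant excerpts vs. whole files in the prompt.
excerpt_cost = gemini_call_cost(prompt_tokens=150_000, output_tokens=2_000)
full_file_cost = gemini_call_cost(prompt_tokens=250_000, output_tokens=2_000)

print(f"excerpts:   ${excerpt_cost:.4f}")
print(f"full files: ${full_file_cost:.4f}")
```

Under these assumptions the prompt grows by about two thirds while the per-call cost roughly triples, because crossing the threshold reprices the entire call.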

The compounding problem at scale

Politeness is the headline case because it’s relatable. But it’s actually one of the smaller waste drivers. The bigger ones — vague scope, missing format instructions, over-injected files — generate far more token waste per call, and they affect output tokens, not just input.

Consider a team running 10 million API calls per day on Claude Sonnet. Average prompt: 200 tokens. If 40% is unnecessary filler, that’s 80 wasted tokens per call. The math:

200 tokens × 40% waste = 80 wasted tokens per call

10M calls/day × 80 tokens = 800M wasted tokens/day

800M tokens × $3/1M = $2,400/day wasted on input alone

→ $876,000 per year, one team, illustrative example
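The same arithmetic as a short Python sketch, so you can plug in your own volumes. The function name `yearly_input_waste` and its parameters are illustrative, and the defaults are the figures from the example above, not measurements.

```python
def yearly_input_waste(
    calls_per_day: int = 10_000_000,
    prompt_tokens: int = 200,
    waste_percent: int = 40,
    input_price_per_million: float = 3.00,  # Claude Sonnet input rate used above
) -> dict:
    """Reproduce the illustrative waste calculation step by step."""
    wasted_per_call = prompt_tokens * waste_percent // 100
    wasted_per_day = calls_per_day * wasted_per_call
    cost_per_day = wasted_per_day / 1_000_000 * input_price_per_million
    return {
        "wasted_tokens_per_call": wasted_per_call,
        "wasted_tokens_per_day": wasted_per_day,
        "cost_per_day": cost_per_day,
        "cost_per_year": cost_per_day * 365,
    }


result = yearly_input_waste()
print(f"{result['wasted_tokens_per_call']} wasted tokens per call")
print(f"{result['wasted_tokens_per_day']:,} wasted tokens per day")
print(f"${result['cost_per_day']:,.0f} per day")
print(f"${result['cost_per_year']:,.0f} per year")
```

This covers input tokens only. Waste that inflates output would be priced at the higher output rate and is not modeled here.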

Well-structured prompts are reported to reduce token usage by 40–70% without degrading output quality, according to two published optimization guides. [5,6]

The deeper problem is invisible feedback. Engineers send prompts, get responses, and move on. There is no signal telling them whether their prompt was efficient. Only 51% of organizations can confidently evaluate the ROI of their AI spend, per CloudZero’s 2025 State of AI Costs report. [7] Waste accumulates because it’s never visible.

prompt-coach: the feedback loop that was missing

I built prompt-coach as a Claude skill to close exactly this gap. It silently analyzes every prompt you send, scores it across five dimensions grounded in Anthropic’s prompt engineering framework, and appends a one-line coaching note after every response — without interrupting your answer. No commands. No setup. It runs on every message automatically.

GitHub: prompt-coach — Open source · MIT license

After every response, you see one line like this:

One line. The exact waste. The exact fix. The dimension you violated. Over sessions, the patterns that inflate bills at scale stop being invisible defaults and become conscious choices.

The 5 principles it scores — with real diffs

Each prompt is scored across five dimensions (20 points each): Clarity, Concision, Context, Structure, and Specificity — mapped to Anthropic’s official prompt engineering framework. [8] The same principles apply across ChatGPT, Claude, and Gemini.

01 — Clarity: Start with an imperative verb

02 — Specificity: Scope your output format

03 — Context: Inject only what changes the answer

04 — Structure: Use XML tags for multi-part prompts

05 — Specificity: State done criteria upfront
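As a rough illustration of how these principles combine, the Python sketch below assembles a multi-part prompt with XML-style tags. It is not part of prompt-coach. The function `build_prompt` and the tag names (`task`, `context`, `output_format`, `done_when`) are hypothetical choices for this example. Models do not require specific tag names, so pick whatever labels describe your sections.

```python
def build_prompt(task: str, context: str = "", output_format: str = "",
                 done_when: str = "") -> str:
    """Assemble a tagged prompt, skipping any section left empty."""
    sections = {
        "task": task,                    # start with an imperative verb
        "context": context,              # only what changes the answer
        "output_format": output_format,  # scope the shape and length
        "done_when": done_when,          # state done criteria upfront
    }
    parts = [
        f"<{tag}>\n{text.strip()}\n</{tag}>"
        for tag, text in sections.items()
        if text.strip()
    ]
    return "\n".join(parts)


prompt = build_prompt(
    task="Summarize the failing test in the excerpt.",
    context="def test_total(): assert total([1, 2]) == 4",
    output_format="Two bullet points, under 40 words total.",
    done_when="The likely cause and one fix are both named.",
)
print(prompt)
```

Leaving a section empty drops its tag entirely, which keeps the habit of injecting only what the model needs.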

The Live Dashboard

Type show dashboard at any point and prompt-coach renders a full interactive session breakdown: score trend across every prompt, tokens used vs. optimal, your top recurring issues, and a prompt-engineering (PE) scorecard across all five dimensions.

The dashboard pulls real data from your session. Every number is calculated from your actual prompts — not estimates. It shows you exactly where your tokens are going, which habits are costing the most, and how your score is trending across the conversation.

Install prompt-coach in 60 seconds

prompt-coach is a Claude skill. Install it once in a Project and it coaches every conversation automatically. It’s open source and free.

# Claude.ai (recommended)

1. Projects → New Project

2. Paste SKILL.md contents into Project Instructions

3. Every conversation in that project is coached automatically

# Claude Code

unzip prompt-coach.zip -d ~/.claude/skills/

# Restart Claude Code — loads automatically

Type show dashboard for a full session breakdown: score trend, top issues, and a PE scorecard. Type ? after any coaching line for a full rewrite with token counts explained.

The bottom line

Token waste is not a ChatGPT problem. It’s not a Claude problem. It’s not a Gemini problem. It’s a prompting problem — and it compounds the same way across every platform that charges per token. The fix isn’t switching models. It’s learning to write better prompts, and getting a feedback loop that makes the waste visible every time it happens.

That’s what prompt-coach does. One line after every response. No interruption. No configuration. Free.

References

[1] Sam Altman on polite prompts costing OpenAI “tens of millions of dollars” — Stripe Sessions 2024. Reported by LiveNOW from FOX, The Express Tribune, April 2025.

[2] Cost Transparency of Enterprise AI Adoption — arXiv:2511.11761, November 2025. Peer-reviewed study on polite prompts inflating output tokens; estimates up to $11M/month in additional OpenAI revenue.

[3] Claude Code Costs — Anthropic Documentation. Primary source for Claude Sonnet pricing and Claude Code average developer costs.

[4] Gemini Developer API Pricing — Google. Primary source for Gemini 2.5 Pro pricing and the 200K token billing cliff.

[5] Optimizing Prompts to Reduce Token Usage and Costs — InventiveHQ, January 2026. Documents 40–70% token reduction from optimized prompts.

[6] Semantic Prompt Engineering: Cut AI Token Waste 60–74% — CostLayer, April 2026.

[7] The State of AI Costs 2025 — CloudZero, May 2025. Average monthly enterprise AI spend $62,964 in 2024; only 51% can evaluate ROI.

[8] Anthropic Prompt Engineering Overview — Official Anthropic documentation.

Token Waste: The Silent Tax on Every AI Team was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
