How to Estimate AI API Cost Before You Send a Prompt

Published March 23, 2026 · Updated March 23, 2026 · 3 min read

The first time an AI feature hits real traffic, the bill can be surprising. Estimating cost before you build, not after you deploy, gives you the chance to choose a cheaper model, cap output length, or add caching. Those changes are much harder to make once users are waiting for responses.

The cost formula

The cost of a single API call is the sum of an input term and an output term:

cost = (input_tokens × input_price_per_MTok / 1_000_000)
     + (output_tokens × output_price_per_MTok / 1_000_000)

Providers publish prices per million tokens (MTok). As of early 2026, approximate ranges:

Tier	Examples	Input ($/MTok)	Output ($/MTok)
Budget	GPT-4o mini, Claude Haiku, Gemini Flash	$0.10–$0.40	$0.40–$1.60
Mid-tier	GPT-4o, Claude Sonnet	$2–$5	$8–$15
Frontier	Claude Opus, o3	$15–$30	$60–$120

These prices change frequently, so always verify them against the provider's current pricing page.
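The formula above translates directly into a few lines of Python. This is a minimal sketch: the function name `estimate_cost` and its parameter names are illustrative and not part of any provider SDK, and the prices passed in are placeholders to replace with current published rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Return the estimated cost in dollars for one API call."""
    input_cost = input_tokens * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder prices: $3/MTok input and $15/MTok output (check the pricing page).
cost = estimate_cost(1_000, 300, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"${cost:.4f} per call")
```

Keeping prices as arguments rather than constants makes it easy to rerun the same estimate when a provider changes its rates.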

Per-call cost example

A customer support chatbot using Claude Sonnet, assumed here at $3/MTok input and $15/MTok output, handling a single-exchange conversation: 800-token system prompt + 200-token user message = 1,000 input tokens; 300-token response = 300 output tokens.

input cost  = 1000 / 1_000_000 × $3  = $0.003
output cost = 300  / 1_000_000 × $15 = $0.0045
total       = $0.0075 per conversation

At 500 conversations/day → $3.75/day → ~$112.50/month (30 days)

Switching this task to Claude Haiku at $0.25/MTok input and $1.25/MTok output brings the cost down to $0.000625 per conversation, or about $9.40/month. That is a 12x cost reduction, worth evaluating if output quality is comparable.
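The comparison above can be reproduced with a short script. This is a sketch under the same assumptions as the worked example: the prices in `PRICES` are the figures quoted in this section, and the dictionary keys and constant names are made up for illustration.

```python
# Prices in $/MTok as quoted in the example above; verify before relying on them.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.25, "output": 1.25},
}

INPUT_TOKENS = 1_000   # 800-token system prompt + 200-token user message
OUTPUT_TOKENS = 300
CONVERSATIONS_PER_DAY = 500
DAYS_PER_MONTH = 30


def per_conversation_cost(model: str) -> float:
    """Dollars per conversation for the given model key in PRICES."""
    price = PRICES[model]
    return (INPUT_TOKENS * price["input"] + OUTPUT_TOKENS * price["output"]) / 1_000_000


monthly = {}
for model in PRICES:
    per_call = per_conversation_cost(model)
    monthly[model] = per_call * CONVERSATIONS_PER_DAY * DAYS_PER_MONTH
    print(f"{model}: ${per_call:.6f}/conversation, ${monthly[model]:.2f}/month")

ratio = monthly["sonnet"] / monthly["haiku"]
print(f"cost ratio: {ratio:.1f}x")
```

Adding another model to the comparison only requires a new entry in `PRICES`.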

Estimating output tokens

Input tokens are easy to count precisely. Output tokens require estimation:

Set a max_tokens limit to cap the maximum possible output
Run 20–30 representative prompts through the model and measure actual output lengths
Use the average for typical cost and the 90th percentile for a conservative projection; the max_tokens limit remains the true worst case
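The steps above can be sketched as a small summary over measured output lengths. The sample data, the `MAX_TOKENS` value, and the `percentile` helper are all hypothetical; in practice the list would hold token counts taken from real responses to representative prompts. The helper uses the nearest-rank definition of a percentile, which is one of several common conventions.

```python
import math
import statistics

# Hypothetical output lengths (in tokens) measured from 20 representative prompts.
measured_output_tokens = [
    210, 245, 260, 275, 280, 290, 295, 300, 305, 310,
    315, 320, 330, 340, 355, 370, 390, 420, 480, 610,
]
MAX_TOKENS = 800  # assumed max_tokens setting: the hard ceiling on any single response


def percentile(values: list[int], pct: int) -> int:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


typical = statistics.mean(measured_output_tokens)
p90 = percentile(measured_output_tokens, 90)

print(f"typical (mean): {typical:.0f} tokens")
print(f"conservative (p90): {p90} tokens")
print(f"hard ceiling (max_tokens): {MAX_TOKENS} tokens")
```

Feeding the mean into the cost formula gives the expected bill, while the 90th percentile and the `max_tokens` ceiling bound how far a run of long responses could push it.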

Output tokens are typically priced 3–5x higher than input tokens, so they dominate cost for tasks with long responses (code generation, detailed analysis). For classification tasks with short outputs, input cost dominates.
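To see which side dominates for a given workload, it can help to compute the share of cost that comes from output tokens. The sketch below assumes $3/MTok input and $15/MTok output (a 5x ratio) and uses invented token counts for the two task types; `output_share` is a hypothetical helper name.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Fraction of a call's cost attributable to output tokens (assumes nonzero total)."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


# Assumed prices: $3/MTok input, $15/MTok output. Token counts are illustrative.
code_generation = output_share(1_000, 2_000, 3.0, 15.0)  # long response
classification = output_share(1_000, 5, 3.0, 15.0)       # one-label response

print(f"code generation: {code_generation:.0%} of cost is output")
print(f"classification:  {classification:.0%} of cost is output")
```

A high output share suggests capping or shortening responses first; a low one suggests trimming or caching the prompt.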

Prompt caching: Anthropic, OpenAI, and Google all offer prompt caching for repeated context (long system prompts, documents). Cached input tokens can cost up to ~90% less than uncached ones, though the discount and any surcharge for writing to the cache vary by provider and model. For RAG or document-heavy workflows, caching dramatically changes the cost equation.
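The effect of caching on a single call can be approximated by pricing the cached portion of the input at a discount. The sketch below is a simplification: it assumes a flat 90% discount on cached reads, ignores any cache-write surcharge and cache expiry, and uses a hypothetical function name and token counts.

```python
def cost_with_caching(cached_tokens: int, uncached_tokens: int, output_tokens: int,
                      input_price: float, output_price: float,
                      cache_discount: float = 0.90) -> float:
    """Estimated dollars per call when `cached_tokens` are read from the prompt cache.

    Simplification: ignores cache-write surcharges and cache expiry.
    Prices are in $/MTok.
    """
    cached_cost = cached_tokens * input_price * (1 - cache_discount)
    uncached_cost = uncached_tokens * input_price
    output_cost = output_tokens * output_price
    return (cached_cost + uncached_cost + output_cost) / 1_000_000


# Hypothetical RAG call: 20,000 tokens of repeated context, 200-token question,
# 300-token answer, at assumed prices of $3/MTok input and $15/MTok output.
without_cache = cost_with_caching(0, 20_200, 300, 3.0, 15.0)
with_cache = cost_with_caching(20_000, 200, 300, 3.0, 15.0)

print(f"without caching: ${without_cache:.4f}")
print(f"with caching:    ${with_cache:.4f}")
```

Setting `cache_discount` to the provider's actual rate, and adding a write cost if one applies, brings the estimate closer to a real bill.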

Monthly cost projection

Per-call cost × calls per day × 30 gives a rough monthly figure. Build in 20–30% headroom for traffic spikes, retries, and token estimation variance. If the monthly projection exceeds your budget, the levers are: choose a cheaper model tier, reduce prompt length, shorten expected output, add caching, or move to batch processing (typically 50% cheaper for async workloads).
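The projection above fits in one function. This is a rough sketch: `monthly_projection` and its parameters are illustrative names, the 25% default headroom is the midpoint of the 20–30% range, the budget figure is invented, and the 50% batch discount is the typical figure mentioned above rather than a guaranteed rate.

```python
def monthly_projection(per_call_cost: float, calls_per_day: int,
                       headroom: float = 0.25, batch_discount: float = 0.0,
                       days: int = 30) -> float:
    """Rough monthly cost in dollars, padded for spikes, retries, and estimation error."""
    base = per_call_cost * calls_per_day * days
    return base * (1 + headroom) * (1 - batch_discount)


budget = 120.00  # hypothetical monthly budget in dollars
projected = monthly_projection(0.0075, 500)
print(f"projected: ${projected:.2f}/month")

if projected > budget:
    # One possible lever: batch processing, assumed here to be 50% cheaper.
    batched = monthly_projection(0.0075, 500, batch_discount=0.50)
    print(f"with batch processing: ${batched:.2f}/month")
```

The other levers (a cheaper model, a shorter prompt, shorter output, caching) all act through `per_call_cost`, so they can be compared by recomputing that one input.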