MoneyMath

LLM API Cost Calculator (GPT, Claude, Gemini, DeepSeek)

Estimate your monthly LLM bill across every major provider. The calculator applies prompt caching discounts and batch API savings, then picks the cheapest model for your workload.

🟢 Updated April 2026 · 👤 Reviewed by MoneyMath Editorial · ⚡ Runs in your browser, no data sent

Cached prompt prefixes typically get 50% off the input price. A higher cache hit rate means a lower cost.

Cheapest Model
Gemini 2.5 Flash
$2.46 / month at 10,000 calls
Cost per call (cheapest): $0.00 (rounded; $2.46 / 10,000 calls ≈ $0.000246)
Most expensive option: Claude Opus 4, $566.25
Price spread (cheapest vs most expensive): 231×
Formula
cost per call = (input/1M × price_in) + (output/1M × price_out)
cache discount: input tokens × (1 − cache_rate × 0.5)
batch API: total × 0.5
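
The three formula lines above can be combined into one function. The following is a minimal Python sketch, not the calculator's actual source. The function names, the `cache_discount` parameter, and the prices in the usage example are assumptions made for illustration.

```python
def cost_per_call(input_tokens, output_tokens, price_in, price_out,
                  cache_rate=0.0, cache_discount=0.5, batch=False):
    """Dollar cost of one call. Prices are in dollars per 1M tokens.

    cache_rate is the fraction of input tokens served from cache (0 to 1).
    cache_discount defaults to 0.5, matching the formula above (an
    assumption; real discounts vary by provider).
    """
    if not 0.0 <= cache_rate <= 1.0:
        raise ValueError("cache_rate must be between 0 and 1")

    # The cache discount applies to input tokens only.
    effective_input = input_tokens * (1 - cache_rate * cache_discount)

    total = (effective_input / 1_000_000) * price_in \
          + (output_tokens / 1_000_000) * price_out

    # The batch API halves the total.
    if batch:
        total *= 0.5
    return total


def monthly_cost(calls_per_month, **call_kwargs):
    """Monthly bill: calls per month times the cost of one call."""
    return calls_per_month * cost_per_call(**call_kwargs)
```

For example, with hypothetical prices of $1.00 (input) and $4.00 (output) per 1M tokens, a call with 2,000 input tokens, 500 output tokens, and a 50% cache rate costs about $0.0035, or about $35 per month at 10,000 calls. Passing `batch=True` halves that.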

Full monthly cost breakdown

Model                $/call   Monthly cost
⭐ Gemini 2.5 Flash   $0.00    $2
GPT-4o-mini          $0.00    $5
DeepSeek V3          $0.00    $9
Claude Haiku 4       $0.00    $30
Gemini 2.5 Pro       $0.00    $41
GPT-4o               $0.01    $82
Claude Sonnet 4      $0.01    $113
Claude Opus 4        $0.06    $566

How to actually lower your bill

  • Prompt caching (Anthropic): 90% off for cache hits. Put the static system prompt first and the dynamic user input last so the shared prefix can be reused.
  • Prompt caching (OpenAI): 50% off for cache hits. It is automatic for prompts of 1,024 tokens or more, so no code change is needed.
  • Batch API: 50% off at both OpenAI and Anthropic, with results returned within 24 hours. Well suited to daily report generation, async summarization, and evaluation runs.
  • Model laddering: Use a cheap model (Haiku/Flash/mini) for triage and an expensive one (Opus/4o/Pro) only when needed. This often cuts cost by 70–90%.
  • Output tokens are 3–5× more expensive than input tokens. Cap response length aggressively.
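
The model laddering saving can be estimated with simple arithmetic. The sketch below assumes every call goes to the cheap model first and a fixed fraction is then re-run on the expensive model with the same token counts. The function names and the escalation rates are illustrative assumptions. The monthly figures come from the results above.

```python
def laddered_cost(cheap_monthly, expensive_monthly, escalation_rate):
    """Monthly cost when all calls hit the cheap model and a fraction
    (escalation_rate, 0 to 1) is escalated to the expensive model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return cheap_monthly + escalation_rate * expensive_monthly


def savings_vs_expensive(cheap_monthly, expensive_monthly, escalation_rate):
    """Fractional saving compared with sending every call to the
    expensive model."""
    laddered = laddered_cost(cheap_monthly, expensive_monthly, escalation_rate)
    return 1 - laddered / expensive_monthly


# Monthly figures from the calculator output: Gemini 2.5 Flash and Claude Opus 4.
flash, opus = 2.46, 566.25
for rate in (0.10, 0.30):  # hypothetical escalation rates
    saving = savings_vs_expensive(flash, opus, rate)
    print(f"escalate {rate:.0%}: ${laddered_cost(flash, opus, rate):.2f}/month, "
          f"saving {saving:.1%}")
```

Under these assumptions, escalating 10% of calls saves roughly 90% and escalating 30% saves roughly 70%. The saving depends almost entirely on how often triage has to escalate.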

Frequently Asked Questions

Why is there sometimes a 100x price difference between models?

Frontier models (Opus, GPT-4o) are priced at the top of the capability range. Budget-tier models (Haiku, GPT-4o-mini, Flash, DeepSeek) run on shared infrastructure and have smaller parameter counts, so they cost less to serve. For many tasks, such as classification, simple extraction, and routing, the cheap models match frontier quality at 5–100x lower cost.

Do these prices include the free tier?

No. Most providers offer $5–$100 in free credits for new accounts, but this calculator shows per-token pricing only. Factor in any free-tier credits when estimating your first month's real-world cost.

What about self-hosted open-source models?

The model weights for Llama 3, Qwen 2.5, and Mistral are free, but you pay for GPU compute. A typical rule of thumb: self-hosting breaks even with API pricing at roughly 10M–50M tokens/day, depending on model size. Below that, hosted APIs win.
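
A rough way to locate your own break-even point is to divide the daily GPU cost by the API price per token. The sketch below is a simplification that ignores engineering time, idle capacity, and throughput limits. The function name and both input figures are hypothetical, not quoted prices.

```python
def breakeven_tokens_per_day(gpu_cost_per_day, api_price_per_million):
    """Daily token volume at which a fixed GPU cost equals the API bill.

    gpu_cost_per_day: dollars per day for the self-hosted GPU(s).
    api_price_per_million: blended API price in dollars per 1M tokens.
    """
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    return gpu_cost_per_day / api_price_per_million * 1_000_000


# Hypothetical inputs: a $2/hour GPU running 24 hours, and a $3 blended API price.
tokens = breakeven_tokens_per_day(gpu_cost_per_day=2 * 24, api_price_per_million=3.0)
print(f"break-even at about {tokens / 1_000_000:.0f}M tokens/day")
```

With these made-up inputs the break-even lands at about 16M tokens/day, inside the 10M–50M range quoted above. A cheaper API model or a larger GPU footprint pushes it higher.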

Does this include embedding costs?

No. It covers only chat completion / generation. Embedding models (text-embedding-3-small, text-embedding-3-large, Voyage, Cohere Embed) are priced separately at $0.02–$0.13 per 1M tokens.