Comparing Claude Models
The three Claude tiers differ on three axes: speed, cost, and quality. Choosing correctly matters — defaulting to Opus for everything wastes money; defaulting to Haiku for everything sacrifices quality. This page gives you the decision matrix.
The Three Axes
Speed (Latency)
Haiku > Sonnet > Opus. Haiku typically responds 3–5× faster than Opus for the same prompt. In real-time chat or voice pipelines, latency is often the binding constraint.
Cost
Haiku is the cheapest by a significant margin. Running Opus costs roughly 10–20× more per million tokens than Haiku. At scale, this difference is substantial.
Quality
Opus > Sonnet > Haiku. But the gap between Sonnet and Haiku is larger than the gap between Opus and Sonnet for most common tasks. Sonnet gives most of the quality at a fraction of the cost.
Decision Matrix
| Task Type | Recommended Tier | Reason |
|---|---|---|
| Classification, labelling, routing | Haiku | Simple output (a label or short decision), runs millions of times — cost dominates |
| Short extractions (dates, names, entities) | Haiku | Structured, well-defined output; quality ceiling is low — Haiku is sufficient |
| Summarisation (short documents) | Haiku | Straightforward compression task; Haiku handles it well at 1/10th the cost |
| Coding assistance, bug fixing | Sonnet | Needs genuine code understanding; Sonnet is excellent on SWE-bench; Haiku misses edge cases |
| Writing, editing, drafting | Sonnet | Quality of output matters; Sonnet produces noticeably better prose than Haiku |
| Analysis, research synthesis | Sonnet | Multi-step reasoning required; Sonnet handles most analytical tasks reliably |
| Agentic workflows (tool use, multi-step) | Sonnet | Claude Sonnet is the model used by Claude Code; best-in-class agentic reliability |
| Long-document analysis (100K+ tokens) | Sonnet | All tiers share the 200K context window; Sonnet provides better long-context accuracy than Haiku |
| Complex maths, formal proofs | Opus + extended thinking | Requires deep reasoning steps; extended thinking budget helps significantly |
| Deep research synthesis | Opus | Where Opus earns its cost: synthesising complex information across many sources |
| Creative writing, nuanced voice | Opus | Opus produces the most stylistically aware and contextually sensitive long-form output |
Cost Comparison at Scale
| Model | Context | Input ($/1M) | Output ($/1M) | Speed | Best For |
|---|---|---|---|---|---|
| Haiku 4.5 | 200K | $1.00 | $5.00 | Fastest | Classification, routing, extraction, customer-facing chat |
| Sonnet 4.6 | 1M | $3.00 | $15.00 | Fast | Coding, writing, analysis, agents — default production choice |
| Opus 4.6 | 1M | $5.00 | $25.00 | Slower | Hardest reasoning, research synthesis, maximum quality |
Anthropic publishes per-token pricing at anthropic.com/pricing. The relative cost ratios across the current generation are roughly:
1×
Haiku 4.5
Baseline cost
~5–8×
Sonnet 4.6
vs Haiku
~15–20×
Opus 4.6
vs Haiku
For a pipeline processing 10 million tokens per day: running on Haiku vs Sonnet saves roughly $100–150/day at scale. Check current rates before budgeting — pricing has consistently fallen across model generations.
Quality on Key Benchmarks
Anthropic publishes benchmark results for each model. Key dimensions where tier selection matters most:
- SWE-bench (coding): Sonnet is Anthropic's benchmark leader here; even Haiku performs reasonably on simple coding tasks. Opus adds headroom on the hardest bugs.
- MMLU (knowledge): All current models perform well. Sonnet and Opus are close on straightforward knowledge retrieval; gaps emerge on multi-step reasoning chains.
- GSM8K / MATH (maths): Extended thinking on Opus closes the gap to frontier reasoning models; standard Sonnet is solid for most applied maths.
- Long-context recall: Claude consistently outperforms same-tier competitors on "needle in a haystack" retrieval from large contexts — this is a cross-tier strength.
A Practical Approach
When building a new feature or pipeline, use this sequence:
- Prototype with Sonnet. It's fast enough to iterate quickly and capable enough to validate your approach without hitting Haiku's quality ceiling.
- If quality is fine, try Haiku. Run a batch of representative prompts through Haiku and compare outputs. If quality holds, switch and save the cost.
- Only escalate to Opus if Sonnet consistently fails on your hardest test cases and the task is genuinely worth the 5–20× cost premium.
- Consider task decomposition before escalating. Often a complex task that "needs Opus" can be broken into subtasks where Sonnet handles most of the work, Opus handles only the hard step, and Haiku handles the simple steps — at an overall cost closer to Sonnet.
Checklist: Do You Understand This?
- The three axes for model selection are speed, cost, and quality — they trade off against each other
- Haiku for classification, routing, extraction, and high-volume simple tasks
- Sonnet for coding, writing, analysis, agentic workflows — the default for most production workloads
- Opus for the hardest reasoning tasks and research synthesis where quality matters most
- Opus costs roughly 15–20× more than Haiku; Sonnet is 5–8× Haiku
- Prototype with Sonnet, validate with Haiku, escalate to Opus only when needed