Beginner

Choosing the Right Claude Model

Claude comes in three tiers: Haiku (fast and cheap), Sonnet (balanced), and Opus (most capable). Picking the wrong tier is a common way to overspend or underperform. This page gives you a practical decision framework.

The Three Tiers at a Glance

Haiku — Speed & Scale

Fastest response, lowest cost. Designed for high-volume, lower-stakes tasks where throughput matters more than nuance.

Classification & routing
Extracting structured fields
Simple summarisation
Short Q&A on clear documents
High-volume automation pipelines

Sonnet — Balance & Versatility

The workhorse. Strong reasoning, good instruction-following, and moderate cost. Right for most production use cases.

Code generation & review
Long-form writing & editing
Multi-step reasoning tasks
Agentic tool-use workflows
Customer-facing chat applications

Opus — Capability & Depth

Highest capability, highest cost. Justified when task complexity genuinely requires deeper reasoning or when errors are expensive.

Complex code architecture
Nuanced legal or medical analysis
Long-horizon agentic tasks
Research synthesis across many sources
Tasks where hallucination is costly

[Figure: cost and capability spectrum, running from Haiku (fast, cheap, simple tasks) through Sonnet to Opus (slow, expensive, complex tasks)]

Step-by-Step Selection Framework

Work through these questions in order. Stop as soon as you reach a definitive answer.

Step 1 — Is the task well-defined and repetitive?

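If yes (e.g. extract invoice fields from PDFs, classify support tickets, check for profanity): start with Haiku and test it on real inputs. If accuracy is acceptable, ship it. Haiku costs a fraction of Sonnet's per-token price; the exact multiple depends on the model versions involved, so check current pricing rather than assuming a fixed ratio.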

Step 2 — Does the task require multi-step reasoning or tool use?

Agentic tasks (searching the web, running code, calling APIs in sequence), complex coding, or tasks where the model needs to plan before acting: use Sonnet. It handles tool calling reliably and is cost-effective for these mid-complexity workloads.

Step 3 — Are errors expensive or irreversible?

High-stakes tasks where a wrong answer causes real harm (incorrect medical guidance, wrong contract interpretation, a production bug introduced by AI-generated code that bypasses review): consider Opus. Also consider Opus for tasks requiring novel synthesis across complex, long documents.

Step 4 — When in doubt, run an evaluation

If you're unsure whether Haiku is good enough or Sonnet is justified, run both on 50–100 representative inputs and compare output quality. The cost difference between tiers is large, so step up only when your evaluation shows real degradation at the cheaper tier. When it does, even a small quality gap can justify the more capable model if errors matter for your use case.
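The four steps above can be sketched as a small decision function. This is only an illustration of the ordering, not an official selection tool: the `Task` fields and the `suggest_tier` name are hypothetical, and the function follows the steps literally, stopping at the first question answered "yes".

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical descriptors for a workload; the field names are illustrative.
    well_defined_and_repetitive: bool
    needs_multistep_or_tools: bool
    errors_expensive: bool


def suggest_tier(task: Task) -> str:
    """Return a starting tier by walking Steps 1-3 in order."""
    if task.well_defined_and_repetitive:
        return "haiku"    # Step 1: start cheap, then test accuracy
    if task.needs_multistep_or_tools:
        return "sonnet"   # Step 2: reasoning and tool use
    if task.errors_expensive:
        return "opus"     # Step 3: a tier to consider, not a mandate
    return "evaluate"     # Step 4: no definitive answer, run an evaluation


if __name__ == "__main__":
    ticket_classifier = Task(True, False, False)
    print(suggest_tier(ticket_classifier))
```

The result is a starting point for testing rather than a final decision: Step 1 still asks you to confirm that accuracy is acceptable before shipping.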

Cost Sensitivity vs Quality Sensitivity

Many teams default to the most capable model out of caution. This is expensive and usually unnecessary. Frame the tradeoff explicitly:

Cost-sensitive scenarios (start cheap)

Processing millions of documents per day
Real-time user interactions at consumer scale
Internal tools where "good enough" outputs are acceptable
Preprocessing steps that feed into a human review stage
Any task where you've measured Haiku accuracy above your threshold

Quality-sensitive scenarios (don't skimp)

Customer-facing outputs with no human review
Legal, medical, or financial advice generation
Complex reasoning chains where errors compound
First impressions (demos, onboarding flows)
Low-volume tasks where per-call cost is immaterial
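To see why volume dominates the cost side of this tradeoff, it can help to do the arithmetic. The sketch below estimates daily spend from call volume and token counts. The prices are placeholders invented for illustration, not real pricing, and the `daily_cost` helper is hypothetical; substitute figures from the current pricing page.

```python
def daily_cost(calls_per_day: int, tokens_in: int, tokens_out: int,
               price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimated daily spend in dollars for one model.

    Prices are expressed in dollars per million tokens.
    """
    tokens_cost = tokens_in * price_in_per_mtok + tokens_out * price_out_per_mtok
    return calls_per_day * tokens_cost / 1_000_000


# Placeholder prices (NOT real pricing), in dollars per million tokens.
CHEAP_TIER = {"price_in_per_mtok": 1.0, "price_out_per_mtok": 4.0}
STRONG_TIER = {"price_in_per_mtok": 4.0, "price_out_per_mtok": 16.0}

# Assumed workload: 10,000 calls/day, 2,000 input and 500 output tokens each.
cheap = daily_cost(10_000, 2_000, 500, **CHEAP_TIER)
strong = daily_cost(10_000, 2_000, 500, **STRONG_TIER)

if __name__ == "__main__":
    print(f"cheap tier:  ${cheap:.2f}/day")
    print(f"strong tier: ${strong:.2f}/day")
    print(f"difference:  ${strong - cheap:.2f}/day")
```

Running the same function with a low call volume shows the opposite case from the list above: when there are only a handful of calls per day, the difference between tiers becomes negligible.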

Tiered Routing in Production

The best production architectures don't pick one model — they route intelligently based on task characteristics:

Complexity routing: Classify the incoming request first (using Haiku or a lightweight classifier), then route simple requests to Haiku and complex ones to Sonnet/Opus. This is sometimes called a "model router."
Fallback escalation: Attempt with Haiku first; if confidence is below a threshold or the task type is detected as complex, escalate to Sonnet. Requires defining a confidence signal (e.g. structured output with a confidence field, or a separate evaluation call).
Task-type switching: Different capabilities within the same application use different models. The summarisation step uses Haiku; the synthesis and recommendation step uses Sonnet; the final risk assessment uses Opus.
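As a rough illustration of fallback escalation, the sketch below treats each model as a plain function that returns an answer and a confidence score. Everything here is an assumption made for the example: the `ModelFn` type, the 0.8 threshold, and the stub models that stand in for real API calls. How the confidence signal is produced is left open, as in the description above.

```python
from typing import Callable, Tuple

# A model call is any function returning (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


def answer_with_escalation(prompt: str, cheap: ModelFn, strong: ModelFn,
                           threshold: float = 0.8) -> Tuple[str, str]:
    """Try the cheap tier first; escalate when confidence is below threshold.

    Returns (answer, tier_used).
    """
    answer, confidence = cheap(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    # Confidence too low: pay for the stronger tier instead.
    answer, _ = strong(prompt)
    return answer, "strong"


# Stub models (hypothetical behaviour) so the sketch runs without network access.
def fake_haiku(prompt: str) -> Tuple[str, float]:
    confidence = 0.9 if len(prompt) < 40 else 0.3
    return "short answer", confidence


def fake_sonnet(prompt: str) -> Tuple[str, float]:
    return "detailed answer", 0.95


if __name__ == "__main__":
    print(answer_with_escalation("Is this spam?", fake_haiku, fake_sonnet))
    print(answer_with_escalation(
        "Compare these three contracts and flag conflicting clauses.",
        fake_haiku, fake_sonnet))
```

Note that an escalated request pays for both calls, so this pattern only saves money when most requests are resolved at the cheap tier.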

When to Run A/B Tests Across Tiers

Run a formal comparison before committing to a model for production if:

You're choosing between Haiku and Sonnet for a task that processes >10,000 calls/day — the cost difference is significant
The task requires judgment (writing quality, nuanced classification) where you can't measure accuracy against ground truth automatically
A stakeholder is questioning whether the cheaper model is "good enough" — empirical data resolves this faster than debate

A minimal A/B test: take 50–100 representative inputs, run both models, have a human (or a separate evaluator model) rate each output blindly on a 1–5 scale. If the cheaper model scores within one point on average, use it.
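The decision rule in that minimal test is easy to express in code. The sketch below assumes the blind ratings have already been collected as lists of 1–5 scores; the `pick_model` name and the sample ratings are made up for illustration, and "within one point" is read as a gap of at most 1.0.

```python
from statistics import mean
from typing import Sequence


def pick_model(cheap_scores: Sequence[int], strong_scores: Sequence[int],
               max_gap: float = 1.0) -> str:
    """Apply the rule: use the cheaper model if it is within max_gap on average."""
    gap = mean(strong_scores) - mean(cheap_scores)
    return "cheaper" if gap <= max_gap else "stronger"


if __name__ == "__main__":
    # Illustrative ratings only; a real test would use 50-100 inputs.
    cheap_ratings = [4, 3, 4, 4, 3]
    strong_ratings = [5, 4, 4, 5, 4]
    print(pick_model(cheap_ratings, strong_ratings))
```

A one-point gap is generous on a five-point scale, so a stricter `max_gap` may suit quality-sensitive tasks better.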

Model versions change

Haiku, Sonnet, and Opus are tier labels. The underlying model versions (e.g. claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-6) change over time. Always pin to a specific version in production using the dated model ID (e.g. claude-haiku-4-5-20251001) to avoid unexpected behaviour changes when Anthropic releases new versions. See Model Versioning for details.
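One lightweight way to enforce pinning is a check in configuration validation or CI. The sketch below assumes that a pinned ID always ends in an 8-digit date suffix, as in the example above; that pattern and the `is_pinned` helper are assumptions for illustration, not a documented guarantee about ID formats.

```python
import re

# Assumption: pinned IDs end in "-YYYYMMDD", e.g. claude-haiku-4-5-20251001.
_DATED_SUFFIX = re.compile(r"-\d{8}$")


def is_pinned(model_id: str) -> bool:
    """Return True if the model ID carries a dated suffix."""
    return bool(_DATED_SUFFIX.search(model_id))


if __name__ == "__main__":
    for model_id in ("claude-haiku-4-5-20251001", "claude-haiku-4-5"):
        status = "pinned" if is_pinned(model_id) else "NOT pinned"
        print(f"{model_id}: {status}")
```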

Checklist: Do You Understand This?

Haiku is for high-volume, well-defined tasks; Sonnet for multi-step reasoning and most production use cases; Opus for genuinely complex, high-stakes tasks
Default to the cheapest tier that meets your quality bar — test before assuming you need the most capable model
Run a 50–100 sample evaluation when choosing between adjacent tiers for high-volume tasks
Production architectures can route different task types to different model tiers within the same application
Pin to specific model version IDs in production code — tier labels like "Sonnet" resolve to different models over time