Reasoning Models Explained
Since late 2024, a new type of AI model has appeared: reasoning models. They tackle hard problems far better than standard models — but they work differently and are not always the right choice. This page explains what they do and when to use them.
What Makes a Reasoning Model Different
A standard model (like GPT-4o or Claude Sonnet) reads your message and immediately starts generating a response. It's fast but can miss nuance in complex problems.
A reasoning model pauses first. Before giving you an answer, it generates a long internal "thinking" trace — working through the problem step by step, trying different approaches, checking its own logic, and backtracking when it spots a mistake. Only after that thinking process does it produce a final answer.
Standard model
- You send a message
- Model generates tokens immediately
- Response appears (fast)
Good for: most everyday tasks, writing, editing, chat
Reasoning model
- You send a message
- Model generates internal thinking (many tokens, often hidden)
- Model checks and refines its reasoning
- Final answer appears (slower)
Good for: hard maths, complex logic, multi-step planning, debugging
The thinking is usually hidden. In ChatGPT, you sometimes see a "Thinking..." indicator. Claude shows a collapsed "thinking" section. DeepSeek-R1 shows the full reasoning trace. The internal thinking can be thousands of tokens long — even if the final answer is short.
Why This Makes Models Much Better at Hard Problems
The improvement on difficult tasks is dramatic. On AIME 2024 (a hard invitational maths competition), DeepSeek-R1 went from 15.6% to 77.9% accuracy through reinforcement learning that rewarded correct reasoning. OpenAI's o3 scores above 90% on maths benchmarks where standard, non-reasoning GPT-4-class models struggle to reach 50%.
The reasoning process helps because:
- Decomposition — Breaking a hard problem into steps makes each step easier to get right
- Self-checking — The model can catch its own mistakes mid-thought and correct them before the final answer
- Backtracking — If an approach fails, the model can try a different path
- More compute on hard parts — The model spends more tokens on the difficult parts of a problem rather than rushing through them
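The control flow described above can be sketched as a toy search: propose a step, check it, and backtrack when the check fails. This is purely an illustration of the *shape* of a reasoning trace on a small puzzle, not how any real model is implemented.

```python
# Toy illustration of decompose / self-check / backtrack control flow.
# It mimics the shape of a reasoning trace, not a real model.

def solve_with_backtracking(target, numbers, expr="", total=0):
    """Find a subset of `numbers` that sums to `target`.

    Each recursive call is one "thinking step": try an option,
    check progress, and backtrack if the path cannot work.
    """
    if total == target:                  # self-check: goal reached
        return expr
    if not numbers or total > target:    # self-check: dead end
        return None                      # signal the caller to backtrack
    head, rest = numbers[0], numbers[1:]
    # Decomposition: try including the next number first...
    found = solve_with_backtracking(target, rest, f"{expr}+{head}", total + head)
    if found:
        return found
    # ...otherwise backtrack and try the path that skips it.
    return solve_with_backtracking(target, rest, expr, total)

print(solve_with_backtracking(10, [7, 3, 5, 2]))  # → "+7+3"
```

A standard model answering in one pass is like emitting the expression directly with no dead-end check; the reasoning model's advantage is exactly the `return None` branch that lets it abandon a failing path.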
The Major Reasoning Models (2025)
| Model | Made by | Access | Thinking visible? | Adjustable? |
|---|---|---|---|---|
| o3 | OpenAI | ChatGPT Plus / API | Partial (summary) | Yes (effort level) |
| o4-mini | OpenAI | ChatGPT / API | Partial (summary) | Yes (effort level) |
| DeepSeek-R1 | DeepSeek | Free API / download locally | Yes (full trace) | No |
| Claude + Extended Thinking | Anthropic | Claude.ai Pro / API | Collapsed block | Yes (budget) |
| Gemini 2.5 Pro (thinking) | Google | Gemini Advanced / API | Partial | Yes (budget) |
The Trade-off: Better Answers, But Slower and Costlier
Reasoning models are not always better. The thinking tokens cost money and time. For a simple question like "What is the capital of France?", a reasoning model is overkill — it spends tokens thinking about something that needs no thought at all.
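A back-of-the-envelope calculation makes the cost difference concrete. The token counts and the per-token price below are made-up placeholder numbers, not real provider pricing; substitute current rates from your provider.

```python
# Back-of-the-envelope cost comparison. All numbers are illustrative
# placeholders, NOT real provider pricing.

PRICE_PER_1K_OUTPUT_TOKENS = 0.01   # assumed flat rate, in dollars

def response_cost(answer_tokens, thinking_tokens=0):
    """Thinking tokens are typically billed as output tokens."""
    total = answer_tokens + thinking_tokens
    return total * PRICE_PER_1K_OUTPUT_TOKENS / 1000

standard = response_cost(answer_tokens=200)                     # no thinking
reasoning = response_cost(answer_tokens=200, thinking_tokens=8000)

print(f"standard:  ${standard:.4f}")   # $0.0020
print(f"reasoning: ${reasoning:.4f}")  # $0.0820 — 41x more for the same answer
```

The answer the user sees is identical in length; the entire price difference comes from the hidden thinking tokens, which is why high-volume automated pipelines feel the cost first.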
Use a reasoning model for
- Hard maths or scientific calculations
- Logic puzzles and brain teasers
- Complex multi-step planning
- Debugging tricky code errors
- Analysing a complicated argument or text
- The standard model gave a wrong or inconsistent answer
Skip it and use a standard model for
- Simple factual questions
- Writing, editing, brainstorming
- Summarising documents
- Real-time chat where speed matters
- Creative writing
- High-volume automated tasks (cost adds up fast)
Adjustable Thinking: Fast vs Thorough
Some reasoning models let you control how much they think. This is sometimes called a "thinking budget" or "reasoning effort level."
- Low effort — quicker, cheaper, good for moderately hard problems
- High effort — longer thinking, better answers on the hardest problems, but noticeably slower and more expensive
o4-mini on low effort is often a good sweet spot — much smarter than a standard model at a fraction of o3's cost. Start there and upgrade only if needed.
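In practice, the effort level or budget is usually just one extra field in the API request. The sketch below builds example request payloads; the field names (`reasoning_effort` for OpenAI's o-series, `thinking.budget_tokens` for Anthropic's extended thinking) follow the providers' public docs at the time of writing, and the model IDs are examples — check current documentation before relying on either.

```python
# Example request payloads for adjustable thinking. Field names and
# model IDs follow public provider docs at the time of writing — verify
# against current documentation before use.

def openai_payload(prompt, effort="low"):
    # o-series models accept reasoning_effort: "low" | "medium" | "high"
    return {
        "model": "o4-mini",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

def anthropic_payload(prompt, budget_tokens=4096):
    # Extended thinking is enabled explicitly, with a token budget;
    # max_tokens must be larger than the thinking budget.
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": budget_tokens + 2048,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Note the asymmetry: OpenAI exposes a coarse three-level dial, while Anthropic takes an exact token budget, so "try low effort first" translates to a small `budget_tokens` value on the Anthropic side.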
DeepSeek-R1: Open-Weight Reasoning
When DeepSeek-R1 launched in January 2025, it shocked the AI world: it matched frontier reasoning model performance while being open-weight — meaning you can download and run it yourself for free.
Key facts about DeepSeek-R1:
- Free to use via DeepSeek's website and API (very cheap API pricing)
- Can be downloaded and run locally via Ollama on a capable home GPU
- Shows its full reasoning trace (not hidden) — good for learning and debugging
- Strong on maths and coding; somewhat weaker on long-form English writing
- Context: trained in China, so data privacy and censorship considerations apply for sensitive topics
A Simple Test: Does Thinking Help?
Not sure whether to use a reasoning model? Try this:
- Ask a standard model (e.g. GPT-4o or Claude Sonnet) your question
- If the answer looks right and you can verify it — done
- If the answer seems wrong, inconsistent, or incomplete — try o3, o4-mini, or DeepSeek-R1
- If the reasoning model also struggles, the problem may need a different approach (more context, clearer framing, or a specialist tool)
You'll quickly develop an intuition for which types of problems benefit. Anything that involves counting steps, tracking constraints, or doing calculations is a strong candidate.
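The escalation loop above can be written down as a tiny routing sketch. Here `ask` and `verify` are hypothetical stand-ins for your own API call and checking logic; the stubs in the demo exist only to show the control flow.

```python
# Sketch of the try-cheap-first, escalate-on-failure routine.
# `ask` and `verify` are hypothetical placeholders for a real API
# call and a real answer check.

def route(question, ask, verify):
    """Try a standard model first; escalate to reasoning only if the check fails."""
    for model in ("standard", "reasoning"):
        answer = ask(model, question)
        if verify(question, answer):
            return model, answer
    # Both tiers struggled: rethink the framing or use a specialist tool.
    return None, None

# Demo with stubs: the standard model "fails" on anything not marked easy.
ask = lambda model, q: "42" if (model == "reasoning" or "easy" in q) else "?"
verify = lambda q, a: a == "42"

print(route("easy question", ask, verify))   # → ('standard', '42')
print(route("hard question", ask, verify))   # → ('reasoning', '42')
```

The point of the pattern is that verification is cheap relative to reasoning tokens, so paying for the expensive model only on verified failures keeps average cost close to the standard model's.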
Checklist: Do You Understand This?
- Can you explain in plain words what a reasoning model does differently from a standard model?
- Can you name two reasoning models and who makes them?
- Name three tasks where a reasoning model is the better choice.
- Name three tasks where a standard model is fine and the reasoning model is overkill.
- What does "adjustable thinking budget" mean and why does it matter?
- What is special about DeepSeek-R1 compared to other reasoning models?