Models in ChatGPT
ChatGPT gives you access to a family of models spanning instant responses, deep reasoning, and long-context document analysis. Understanding the differences between them — and which plan unlocks each — lets you choose the right tool for each task rather than defaulting to the most expensive option.
A Note on Parameters
OpenAI does not publicly disclose parameter counts for any model in the GPT-4 or GPT-5 generation. Reported figures in the press are estimates or leaks and should not be treated as authoritative. What OpenAI does characterise is relative capability, speed, and cost — and those are the dimensions that matter for practical use.
The GPT-5 Family
GPT-5 is the current foundation model generation powering ChatGPT. OpenAI offers it in three tiers, each representing a different compute budget:
GPT-5 Instant
The fastest and most cost-efficient variant. Targets a 128K-token context window. Available from the Free and Go tiers upward. Handles everyday tasks (drafting, summarisation, Q&A, light coding) with low latency. Not suited to tasks that require deep multi-step reasoning or exhaustive analysis.
GPT-5 Thinking
The extended reasoning variant. Targets a 196K token context window. Available on Plus and above. Before producing its response, the model performs an internal chain-of-thought process — breaking down problems, exploring approaches, and self-checking conclusions. This produces noticeably better results on complex reasoning, maths, legal analysis, and technical writing, at the cost of higher latency and token consumption.
GPT-5 Pro
The maximum-compute variant, exclusive to the Pro plan. Intended for the hardest research-grade tasks, where accuracy and depth matter more than speed, including the most demanding Deep Research sessions.
o-Series Reasoning Models
The o-series models are architecturally distinct from GPT-5. Rather than generating a response immediately, they allocate additional "thinking time", producing a long internal chain-of-thought before arriving at an answer. This makes them significantly stronger than a fast model such as GPT-5 Instant on formal logic, mathematics, code correctness, and structured analysis, but also slower and more expensive per query.
o3
The full reasoning model. 200K context window. Available on Plus and above. Excels at olympiad-level mathematics, complex code debugging, scientific reasoning, and multi-step logical proofs. Uses chain-of-thought extensively — responses may take 30–120 seconds on hard problems.
o4-mini
A faster, cheaper reasoning model. 100K context window. Available on Plus and above. Powers the lightweight Deep Research queries. Still significantly stronger than GPT-5 Instant on reasoning tasks, but with less thinking depth than full o3. Good balance of speed and capability for most reasoning-heavy tasks.
GPT-4o and GPT-4.1
GPT-4o remains available as OpenAI's foundational multimodal model (text, image, audio, vision) with a 128K context window. In ChatGPT it has largely been superseded by GPT-5 Instant for most tasks, but it is still widely used via the API, where its input price is $2.50 per 1M tokens. Check that figure against OpenAI's current GPT-5 pricing rather than assuming GPT-4o is the cheaper option.
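To make the per-token price concrete, the arithmetic can be sketched as follows. The $2.50 per 1M input tokens figure is the one quoted above; the function name and the example token count are illustrative, and output-token pricing is not covered.

```python
# Price quoted in the text for GPT-4o input tokens (USD per 1M tokens).
GPT_4O_INPUT_USD_PER_MILLION = 2.50


def input_cost_usd(input_tokens: int,
                   usd_per_million: float = GPT_4O_INPUT_USD_PER_MILLION) -> float:
    """Return the input-side cost in USD for a given number of input tokens."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens * usd_per_million / 1_000_000


# Hypothetical workload: a single 40,000-token prompt.
print(f"${input_cost_usd(40_000):.2f}")  # $0.10
```

Passing a different `usd_per_million` value lets the same function compare models once their current prices are known.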
GPT-4.1 is an API-only model that is not surfaced in the ChatGPT interface. Its defining characteristic is a 1M-token context window, which makes it the specialist for long-context tasks: entire codebases, lengthy legal documents, book-length manuscripts. It carries standard pricing with no reasoning overhead, and it is strongest on tasks that require broad document comprehension.
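A rough way to decide whether a document needs the 1M-token window is to estimate its token count and compare it with the context sizes quoted above. The sketch below assumes about four characters per token, a common rule of thumb for English text and not a real tokeniser. It also treats "K" as 1,000 tokens and reserves a hypothetical budget for the reply. All names are illustrative.

```python
# Context windows quoted in the text, in tokens ("K" taken as 1,000 for simplicity).
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "GPT-4.1": 1_000_000,
}

CHARS_PER_TOKEN = 4  # assumption: rough heuristic for English prose


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(text: str, reply_budget: int = 4_000) -> list[str]:
    """List models whose window holds the text plus a reserved reply budget."""
    needed = estimate_tokens(text) + reply_budget
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


manuscript = "x" * 2_000_000  # stand-in for a manuscript of about 2M characters
print(estimate_tokens(manuscript))  # 500000
print(models_that_fit(manuscript))  # ['GPT-4.1']
```

For a real decision, a proper tokeniser gives a more reliable count than the character heuristic, especially for code or non-English text.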
Plan Availability Summary
| Model | Context | Available On | Primary Use |
|---|---|---|---|
| GPT-5 Instant | 128K | Free, Go, Plus, Pro | Everyday tasks, fast responses |
| GPT-5 Thinking | 196K | Plus, Pro, Team, Enterprise | Complex reasoning, detailed analysis |
| GPT-5 Pro | Large | Pro only | Maximum quality, research-grade output |
| o3 | 200K | Plus and above | Formal logic, maths, code correctness |
| o4-mini | 100K | Plus and above | Fast reasoning, lightweight Deep Research |
| GPT-4o | 128K | API (all); ChatGPT (limited) | Multimodal tasks via the API |
| GPT-4.1 | 1M | API only | Long-context document analysis |
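The summary table can also be treated as a small lookup structure. The sketch below copies the ChatGPT-facing rows into Python and filters them by plan and required context. It reads the o-series availability as Plus and Pro only (an assumption, since the table does not enumerate the higher tiers) and leaves GPT-5 Pro's context unset because the table gives no number. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRow:
    name: str
    context_tokens: Optional[int]  # None where the table gives no number
    plans: frozenset


ROWS = [
    ModelRow("GPT-5 Instant", 128_000, frozenset({"Free", "Go", "Plus", "Pro"})),
    ModelRow("GPT-5 Thinking", 196_000, frozenset({"Plus", "Pro", "Team", "Enterprise"})),
    ModelRow("GPT-5 Pro", None, frozenset({"Pro"})),
    ModelRow("o3", 200_000, frozenset({"Plus", "Pro"})),       # assumption: higher tier = Pro
    ModelRow("o4-mini", 100_000, frozenset({"Plus", "Pro"})),  # assumption: higher tier = Pro
]


def available_models(plan: str, min_context: int = 0) -> list[str]:
    """Return models on `plan` whose stated context is at least `min_context`.

    Rows without a stated context are kept only when no minimum is requested.
    """
    result = []
    for row in ROWS:
        if plan not in row.plans:
            continue
        if row.context_tokens is None:
            if min_context == 0:
                result.append(row.name)
        elif row.context_tokens >= min_context:
            result.append(row.name)
    return result


print(available_models("Free"))           # ['GPT-5 Instant']
print(available_models("Plus", 150_000))  # ['GPT-5 Thinking', 'o3']
```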
How Reasoning Models Differ
Non-reasoning models such as GPT-5 Instant respond directly, predicting the most likely next token given the conversation history, which is a fast, fluent process. Reasoning models (o3, o4-mini) insert a deliberate thinking phase before generating output, and GPT-5 Thinking, as described above, does the same. The reasoning text itself is hidden from users, but the number of tokens it consumed is reported in the API response's reasoning_tokens count. During this phase the model may explore multiple solution paths, catch errors in its own reasoning, and revise its approach before producing the final answer. This is why o3 can solve problems that GPT-5 Instant fails on, even though both are "large language models."
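To see how much of a response went to hidden reasoning, the usage block returned by the API can be inspected. The sketch below works on a plain dictionary shaped like the `usage` object of a Chat Completions response, where the count appears under `completion_tokens_details.reasoning_tokens` (the Responses API reports it under `output_tokens_details` instead). The sample payload is invented for illustration, and no network call is made.

```python
def reasoning_share(usage: dict) -> float:
    """Return the fraction of output tokens spent on hidden reasoning.

    Accepts either a Chat Completions style or a Responses style usage dict.
    """
    if "completion_tokens" in usage:
        total = usage["completion_tokens"]
        details = usage.get("completion_tokens_details") or {}
    else:
        total = usage.get("output_tokens", 0)
        details = usage.get("output_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning / total if total else 0.0


# Invented numbers for illustration only.
sample_usage = {
    "prompt_tokens": 120,
    "completion_tokens": 2_000,
    "completion_tokens_details": {"reasoning_tokens": 1_500},
}
print(f"{reasoning_share(sample_usage):.0%} of output tokens were reasoning")
```

A high share on a simple prompt is a hint that a faster, non-reasoning model might have been sufficient for that request.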
The practical implication: use GPT-5 Instant for speed-sensitive tasks (chat, drafting, summarisation), switch to o3 or GPT-5 Thinking when accuracy matters more than latency (debugging, proofs, complex analysis), and reserve GPT-5 Pro for the most demanding research-grade outputs.
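That guidance amounts to a small routing rule. The sketch below is one possible encoding, not an official selection policy: the `Task` fields, the function name, and the order of the checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    latency_sensitive: bool       # e.g. chat, drafting, summarisation
    needs_accuracy: bool          # e.g. debugging, proofs, complex analysis
    research_grade: bool = False  # the most demanding research-grade output
    on_pro_plan: bool = False     # GPT-5 Pro is exclusive to the Pro plan


def candidate_models(task: Task) -> list[str]:
    """Map a task profile to candidate models, following the guidance in the text."""
    if task.research_grade and task.on_pro_plan:
        return ["GPT-5 Pro"]
    if task.needs_accuracy and not task.latency_sensitive:
        return ["o3", "GPT-5 Thinking"]
    return ["GPT-5 Instant"]


print(candidate_models(Task(latency_sensitive=True, needs_accuracy=False)))
print(candidate_models(Task(latency_sensitive=False, needs_accuracy=True)))
```

Returning a list rather than a single name keeps the choice between o3 and GPT-5 Thinking open, since the text treats them as alternatives for accuracy-critical work.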
Checklist
- What distinguishes GPT-5 Thinking from GPT-5 Instant beyond the context window difference?
- How does the o-series reasoning process differ architecturally from GPT-5?
- Which model is available exclusively on the Pro plan and why?
- What makes GPT-4.1 useful despite being superseded by GPT-5 in most tasks?
- When would you choose o4-mini over full o3 for a reasoning task?