The AI Model Landscape
There are hundreds of AI models, and new ones launch every month. This page cuts through the noise and gives you a mental map: who the major players are, what the confusing names mean, and how to pick a model for everyday use.
Model vs Product — Know the Difference
The most common confusion for newcomers: mixing up the model (the AI engine) with the product (the thing you log into).
| Product (what you use) | Underlying Model | Made by |
|---|---|---|
| ChatGPT | GPT-4o, GPT-5, o3, o4-mini | OpenAI |
| Claude.ai | Claude Sonnet, Claude Opus, Claude Haiku | Anthropic |
| Gemini (gemini.google.com) | Gemini 2.5 Pro, Gemini Flash | Google DeepMind |
| Meta AI (on WhatsApp, Instagram) | Llama 4 | Meta |
| Grok (on X/Twitter) | Grok 3, Grok 4 | xAI |
The same underlying model can power multiple products. And the product you use may offer different models on different pricing tiers — for example, ChatGPT Free uses a lighter model while ChatGPT Plus gives you access to more powerful ones.
The Big Four Labs
OpenAI
Makes ChatGPT, the most widely used AI product. Model families: GPT (general purpose) and the o-series (reasoning — see below). Known for broad capability, a huge developer ecosystem, and being the first lab to go mainstream.
Anthropic
Makes the Claude family. Strong on safety, long documents, and coding. The Claude name comes with tiers: Haiku (fast/cheap), Sonnet (balanced), Opus (most powerful). Popular with developers and professionals.
Google DeepMind
Makes the Gemini family. Stands out for multimodal ability (images, audio, video), very long context windows (up to 1 million tokens), and deep integration with Google Search and Workspace. Tiers: Flash (fast), Pro, Ultra.
Meta AI
Makes Llama — the world's most popular open-weight model family. Unlike the others, Meta releases Llama for free download. Anyone can run it locally, fine-tune it, or build products with it. Llama 4 approaches frontier proprietary performance on many tasks.
Other notable players
- Mistral AI (France) — efficient open-weight models, strong in coding and European languages.
- DeepSeek (China) — open-weight reasoning models that compete with frontier closed models at a fraction of the cost.
- Microsoft — the Phi family of small language models (SLMs) designed to run on devices.
- xAI — Grok models, integrated with real-time data from X/Twitter.
Frontier vs Open-Weight Models
This is the most important split in the model landscape right now:
Frontier (Proprietary) Models
- You access them via API or web interface
- The weights are never released — you can't download them
- Examples: GPT-5, Claude Opus, Gemini Ultra
- Best performance on hard tasks
- Your data is sent to the company's servers
- Pay per use (tokens) or subscription
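Pay-per-use pricing is usually quoted per million tokens, with separate rates for input (your prompt) and output (the reply). A minimal sketch of the arithmetic — the `$3` / `$15` rates below are illustrative placeholders, not any provider's real prices:

```python
# Estimate the cost of one API call from token counts.
# Prices are per MILLION tokens, quoted separately for input and output.
# The rates used below are made-up examples -- check each provider's
# pricing page for current numbers.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_million: float,
                  price_out_per_million: float) -> float:
    """Return the dollar cost of a single call."""
    return (input_tokens / 1_000_000) * price_in_per_million \
         + (output_tokens / 1_000_000) * price_out_per_million

# A 2,000-token prompt with a 500-token reply at $3 in / $15 out:
cost = estimate_cost(2_000, 500, 3.00, 15.00)
print(round(cost, 4))  # 0.0135
```

Output tokens are often several times more expensive than input tokens, which is why long replies dominate the bill.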
Open-Weight Models
- You download the model and run it yourself
- No data leaves your computer or server
- Examples: Llama 4, Mistral, DeepSeek-R1, Phi-3
- Free to run (you pay for hardware/cloud compute)
- Can be customised and fine-tuned
- Usually slightly behind frontier models on hardest tasks
Note: "Open-weight" is not the same as "open-source." Open-weight means the trained model weights are released. Open-source would mean the training code and data are also released. Most open models (including Llama) only release the weights.
Model Tiers: Speed vs Capability
Every major AI lab offers multiple model tiers — smaller/faster/cheaper and larger/smarter/more expensive. Knowing the tier naming conventions saves a lot of confusion:
| Tier | OpenAI | Anthropic (Claude) | Google (Gemini) | Best for |
|---|---|---|---|---|
| Fast / Cheap | GPT-4o mini, o4-mini | Claude Haiku | Gemini Flash | High-volume, simple tasks, cost-sensitive apps |
| Balanced | GPT-4o | Claude Sonnet | Gemini Pro | Most everyday tasks — best value per token |
| Most Capable | GPT-5, o3 | Claude Opus | Gemini Ultra | Hard problems, complex reasoning, highest quality |
For most tasks, the balanced tier is the right default. Use the fast tier for bulk/simple work, and the top tier only when the balanced tier fails.
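The "start balanced, escalate only on failure" advice can be wired into code as a simple fallback loop. This is a pattern sketch, not a real client: `call_model` is a stand-in for whatever SDK you use, and the tier names are placeholders, not actual model identifiers.

```python
# Sketch: try the cheaper tier first, escalate only when it fails.
# `call_model` is a hypothetical stand-in for your API client;
# the tier names are placeholders, not real model IDs.

def answer_with_fallback(prompt, call_model,
                         tiers=("balanced", "most-capable")):
    """Try each tier in order; escalate when a call errors or returns nothing."""
    for tier in tiers:
        try:
            reply = call_model(prompt, tier)
            if reply:                  # empty reply -> treat as failure
                return tier, reply
        except Exception:
            continue                   # escalate to the next tier
    raise RuntimeError("all tiers failed")

# Toy stand-in: pretend the balanced tier can't handle this prompt.
def fake_call(prompt, tier):
    return "deep answer" if tier == "most-capable" else ""

print(answer_with_fallback("hard question", fake_call))
# ('most-capable', 'deep answer')
```

In practice you would also cap retries and log which tier answered, so you can see how often the expensive tier is actually needed.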
Reasoning Models: A New Type
Since late 2024, a new category of model has emerged: reasoning models. Instead of immediately generating a response, they spend extra time "thinking" — producing a long internal chain-of-thought — before giving you an answer.
These are dramatically better at maths, logic puzzles, complex coding, and planning, but they're slower and more expensive. Key reasoning models:
- OpenAI o3 / o4-mini — OpenAI's reasoning models (separate from GPT)
- DeepSeek-R1 — Open-weight reasoning model matching frontier performance
- Claude with extended thinking — Anthropic's implementation
See the page Reasoning Models Explained in this section for the full picture.
Multimodal Models
Modern frontier models are multimodal — they can process more than text. Today's leading models understand:
- Images — analyse photos, diagrams, charts, screenshots
- Audio — transcribe speech, answer questions about audio recordings
- Documents/PDFs — read and reason over uploaded files
- Video — some models (Gemini) can process video frames
GPT-4o, Claude (Sonnet and Opus), and Gemini Pro/Ultra all handle text and images. Audio and video support varies — check the provider's current docs.
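Under the hood, a multimodal request typically bundles text and image parts into one message. The sketch below follows that common pattern; the exact field names and schema differ between providers, so treat these keys as illustrative rather than a real API contract:

```python
# Sketch of a multimodal chat message mixing a text question with an image.
# Field names follow a pattern common to several chat APIs, but every
# provider's schema differs -- check the official API reference.

def build_image_question(question: str, image_url: str) -> dict:
    """Bundle a text question and an image reference into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_image_question("What does this chart show?",
                           "https://example.com/chart.png")
print(msg["content"][0]["type"], msg["content"][1]["type"])  # text image_url
```

The key idea: "multimodal" does not mean a separate image endpoint — the image rides along inside an ordinary chat message, and the model reasons over both parts together.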
Model Versions: Why the Name Keeps Changing
You'll see version numbers like GPT-4, GPT-4o, GPT-4o mini, GPT-5, o1, o3 — from just one company. Here's how to read them:
- GPT-4 → GPT-4o — The "o" stands for "omni" (multimodal). A major capability upgrade using the same generation number.
- GPT-5 — A generational jump. Substantially more capable than GPT-4.
- o1, o3, o4 — A separate reasoning-focused model series. The numbers don't follow normal versioning (there is no o2 — skipped to avoid confusion with a UK telecom brand).
- Claude 3 Sonnet → Claude 3.5 Sonnet → Claude 4.5 Sonnet — Minor and major version bumps. Always check the release date if you need the latest.
Practical tip
Model names get stale fast. When choosing a model, always check current benchmarks rather than relying on name recognition alone. A "new" model can be significantly better than a well-known older one.
A Simple Decision Guide
For everyday tasks, this covers most situations:
| Task | Good starting point |
|---|---|
| Writing, editing, summarising | Claude Sonnet or GPT-4o |
| Coding help, debugging | Claude Sonnet or GPT-4o — try o3 for hard bugs |
| Hard maths, logic, planning | o3, o4-mini, or DeepSeek-R1 |
| Long documents (100+ pages) | Gemini 2.5 Pro (1M token context) |
| Image analysis | GPT-4o, Claude Sonnet, Gemini Pro |
| Privacy-sensitive / offline | Llama 4 via Ollama (runs locally) |
| High-volume / cost-sensitive | Claude Haiku or Gemini Flash |
Where to Compare Models
- Chatbot Arena (lmarena.ai) — humans rate blind head-to-head model battles. The leaderboard most grounded in human preference; great for judging general usefulness.
- Artificial Analysis (artificialanalysis.ai) — tracks speed, price, and quality across all major models. Best for choosing based on value per token.
- Hugging Face Open LLM Leaderboard — focuses on open-weight models and academic benchmarks. Good for comparing local/self-hosted options.
- Provider documentation — always check the latest model pages from OpenAI, Anthropic, and Google as benchmarks go out of date fast.
Checklist: Do You Understand This?
- Can you name one model and one product for each of the Big Four labs?
- What is the difference between a model and a product?
- What makes open-weight models different from frontier proprietary models?
- What does "reasoning model" mean in plain English?
- For a privacy-sensitive task, which type of model would you choose and why?
- Where would you go to compare current model performance?