The AI Model Landscape
There are hundreds of AI models, and new ones launch every month. This page cuts through the noise and gives you a mental map: who the major players are, what the confusing names mean, and how to pick a model for everyday use.
Model vs Product — Know the Difference
The most common confusion for newcomers: mixing up the model (the AI engine) with the product (the thing you log into).
| Product (what you use) | Underlying Model | Made by |
|---|---|---|
| ChatGPT | GPT-4o, GPT-5, o3, o4-mini | OpenAI |
| Claude.ai | Claude Sonnet, Claude Opus, Claude Haiku | Anthropic |
| Gemini (gemini.google.com) | Gemini 2.5 Pro, Gemini Flash | Google DeepMind |
| Meta AI (on WhatsApp, Instagram) | Llama 4 | Meta |
| Grok (on X/Twitter) | Grok 3, Grok 4 | xAI |
The same underlying model can power multiple products. And the product you use may offer different models on different pricing tiers — for example, ChatGPT Free uses a lighter model while ChatGPT Plus gives you access to more powerful ones.
The Big Four Labs
OpenAI
Makes ChatGPT, the most widely used AI product. Model families: GPT (general purpose) and the o-series (reasoning — see below). Known for broad capability, a huge developer ecosystem, and being the first lab to go mainstream.
Anthropic
Makes the Claude family. Strong on safety, long documents, and coding. The Claude name comes with tiers: Haiku (fast/cheap), Sonnet (balanced), Opus (most powerful). Popular with developers and professionals.
Google DeepMind
Makes the Gemini family. Stands out for multimodal ability (images, audio, video), very long context windows (up to 1 million tokens), and deep integration with Google Search and Workspace. Tiers: Flash (fast), Pro, Ultra.
Meta AI
Makes Llama — the world's most popular open-weight model family. Unlike the others, Meta releases Llama for free download. Anyone can run it locally, fine-tune it, or build products with it. Llama 4 approaches frontier proprietary performance on many tasks.
Other notable players
- Mistral AI (France) — efficient open-weight models, strong in coding and European languages.
- DeepSeek (China) — open-weight reasoning models that compete with frontier closed models at a fraction of the cost.
- Microsoft — the Phi family of small language models (SLMs) designed to run on devices.
- xAI — Grok models, integrated with real-time data from X/Twitter.
Frontier vs Open-Weight Models
This is the most important split in the model landscape right now:
Frontier (Proprietary) Models
- You access them via API or web interface
- The weights are never released — you can't download them
- Examples: GPT-5, Claude Opus, Gemini Ultra
- Best performance on hard tasks
- Your data is sent to the company's servers
- Pay per use (tokens) or subscription
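Pay-per-use pricing is usually quoted per million tokens, with separate rates for input (your prompt) and output (the reply). A minimal sketch of the arithmetic — the `$3` / `$15` rates below are illustrative placeholders, not any provider's real prices:

```python
# Estimate the cost of one API call from token counts.
# Prices are per MILLION tokens, quoted separately for input and output.
# The rates used below are made-up examples -- check each provider's
# pricing page for current numbers.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_million: float,
                  price_out_per_million: float) -> float:
    """Return the dollar cost of a single call."""
    return (input_tokens / 1_000_000) * price_in_per_million \
         + (output_tokens / 1_000_000) * price_out_per_million

# A 2,000-token prompt with a 500-token reply at $3 in / $15 out:
cost = estimate_cost(2_000, 500, 3.00, 15.00)
print(round(cost, 4))  # 0.0135
```

Output tokens are often several times more expensive than input tokens, which is why long replies dominate the bill.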
Open-Weight Models
- You download the model and run it yourself
- No data leaves your computer or server
- Examples: Llama 4, Mistral, DeepSeek-R1, Phi-3
- Free to run (you pay for hardware/cloud compute)
- Can be customised and fine-tuned
- Usually slightly behind frontier models on hardest tasks
Note: "Open-weight" is not the same as "open-source." Open-weight means the trained model weights are released. Open-source would mean the training code and data are also released. Most open models (including Llama) only release the weights.
Model Tiers: Speed vs Capability
Every major AI lab offers multiple model tiers — smaller/faster/cheaper and larger/smarter/more expensive. Knowing the tier naming conventions saves a lot of confusion:
| Tier | OpenAI | Anthropic (Claude) | Google (Gemini) | Best for |
|---|---|---|---|---|
| Fast / Cheap | GPT-4o mini, o4-mini | Claude Haiku | Gemini Flash | High-volume, simple tasks, cost-sensitive apps |
| Balanced | GPT-4o | Claude Sonnet | Gemini Pro | Most everyday tasks — best value per token |
| Most Capable | GPT-5, o3 | Claude Opus | Gemini Ultra | Hard problems, complex reasoning, highest quality |
For most tasks, the balanced tier is the right default. Use the fast tier for bulk/simple work, and the top tier only when the balanced tier fails.
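The "start balanced, escalate only on failure" advice can be wired into code as a simple fallback loop. This is a pattern sketch, not a real client: `call_model` is a stand-in for whatever SDK you use, and the tier names are placeholders, not actual model identifiers.

```python
# Sketch: try the cheaper tier first, escalate only when it fails.
# `call_model` is a hypothetical stand-in for your API client;
# the tier names are placeholders, not real model IDs.

def answer_with_fallback(prompt, call_model,
                         tiers=("balanced", "most-capable")):
    """Try each tier in order; escalate when a call errors or returns nothing."""
    for tier in tiers:
        try:
            reply = call_model(prompt, tier)
            if reply:                  # empty reply -> treat as failure
                return tier, reply
        except Exception:
            continue                   # escalate to the next tier
    raise RuntimeError("all tiers failed")

# Toy stand-in: pretend the balanced tier can't handle this prompt.
def fake_call(prompt, tier):
    return "deep answer" if tier == "most-capable" else ""

print(answer_with_fallback("hard question", fake_call))
# ('most-capable', 'deep answer')
```

In practice you would also cap retries and log which tier answered, so you can see how often the expensive tier is actually needed.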
Reasoning Models: A New Type
Since late 2024, a new category of model has emerged: reasoning models. Instead of immediately generating a response, they spend extra time "thinking" — producing a long internal chain-of-thought — before giving you an answer.
These are dramatically better at maths, logic puzzles, complex coding, and planning, but they're slower and more expensive. Key reasoning models:
- OpenAI o3 / o4-mini — OpenAI's reasoning models (separate from GPT)
- DeepSeek-R1 — Open-weight reasoning model matching frontier performance
- Claude with extended thinking — Anthropic's implementation
See the page Reasoning Models Explained in this section for the full picture.
Multimodal Models
Modern frontier models are multimodal — they can process more than text. Today's leading models understand:
- Images — analyse photos, diagrams, charts, screenshots
- Audio — transcribe speech, answer questions about audio recordings
- Documents/PDFs — read and reason over uploaded files
- Video — some models (Gemini) can process video frames
GPT-4o, Claude (Sonnet and Opus), and Gemini Pro/Ultra all handle text and images. Audio and video support varies — check the provider's current docs.
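Under the hood, a multimodal request typically bundles text and image parts into one message. The sketch below follows that common pattern; the exact field names and schema differ between providers, so treat these keys as illustrative rather than a real API contract:

```python
# Sketch of a multimodal chat message mixing a text question with an image.
# Field names follow a pattern common to several chat APIs, but every
# provider's schema differs -- check the official API reference.

def build_image_question(question: str, image_url: str) -> dict:
    """Bundle a text question and an image reference into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_image_question("What does this chart show?",
                           "https://example.com/chart.png")
print(msg["content"][0]["type"], msg["content"][1]["type"])  # text image_url
```

The key idea: "multimodal" does not mean a separate image endpoint — the image rides along inside an ordinary chat message, and the model reasons over both parts together.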
Model Versions: Why the Name Keeps Changing
You'll see version numbers like GPT-4, GPT-4o, GPT-4o mini, GPT-5, o1, o3 — from just one company. Here's how to read them:
- GPT-4 → GPT-4o — The "o" stands for "omni" (multimodal). A major capability upgrade using the same generation number.
- GPT-5 — A generational jump. Substantially more capable than GPT-4.
- o1, o3, o4 — A separate reasoning-focused model series. The numbers don't follow normal versioning (there is no o2 — skipped to avoid confusion with a UK telecom brand).
- Claude 3 Sonnet → Claude 3.5 Sonnet → Claude 4.5 Sonnet — Minor and major version bumps. Always check the release date if you need the latest.
Practical tip
Model names get stale fast. When choosing a model, always check current benchmarks rather than relying on name recognition alone. A "new" model can be significantly better than a well-known older one.
A Simple Decision Guide
For everyday tasks, this covers most situations:
| Task | Good starting point |
|---|---|
| Writing, editing, summarising | Claude Sonnet or GPT-4o |
| Coding help, debugging | Claude Sonnet or GPT-4o — try o3 for hard bugs |
| Hard maths, logic, planning | o3, o4-mini, or DeepSeek-R1 |
| Long documents (100+ pages) | Gemini 2.5 Pro (1M token context) |
| Image analysis | GPT-4o, Claude Sonnet, Gemini Pro |
| Privacy-sensitive / offline | Llama 4 via Ollama (runs locally) |
| High-volume / cost-sensitive | Claude Haiku or Gemini Flash |
Where to Compare Models
- Chatbot Arena (lmarena.ai) — humans rate blind head-to-head model battles. The leaderboard most grounded in human preference; great for judging general usefulness.
- Artificial Analysis (artificialanalysis.ai) — tracks speed, price, and quality across all major models. Best for choosing based on value per token.
- Hugging Face Open LLM Leaderboard — focuses on open-weight models and academic benchmarks. Good for comparing local/self-hosted options.
- Provider documentation — always check the latest model pages from OpenAI, Anthropic, and Google as benchmarks go out of date fast.
Checklist: Do You Understand This?
- Can you name one model and one product for each of the Big Four labs?
- What is the difference between a model and a product?
- What makes open-weight models different from frontier proprietary models?
- What does "reasoning model" mean in plain English?
- For a privacy-sensitive task, which type of model would you choose and why?
- Where would you go to compare current model performance?