The Major Model Families
The AI model landscape in 2025–2026 is dominated by a handful of major families, each with distinct strengths, licensing models, and use-case sweet spots. This page covers each family in enough depth to make informed selection decisions.
Closed API = pay-per-token, zero ops. Open-weight = fixed infra cost, full control. Most production systems use both.
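The pay-per-token versus fixed-infrastructure tradeoff comes down to monthly volume. A minimal break-even sketch, where both prices are illustrative assumptions rather than quoted rates:

```python
# Back-of-envelope break-even between a pay-per-token API and self-hosted
# open weights. Both figures below are assumed for illustration.

API_COST_PER_1M_TOKENS = 3.00       # assumed blended $/1M tokens
GPU_SERVER_COST_PER_MONTH = 2000.0  # assumed monthly rented-GPU cost

def breakeven_tokens_per_month(api_cost_per_1m: float, infra_cost: float) -> float:
    """Monthly token volume above which fixed infra is cheaper than the API."""
    return infra_cost / api_cost_per_1m * 1_000_000

tokens = breakeven_tokens_per_month(API_COST_PER_1M_TOKENS, GPU_SERVER_COST_PER_MONTH)
print(f"Self-hosting wins above ~{tokens / 1e6:.0f}M tokens/month")
```

Below the break-even volume the API's zero-ops model usually wins; above it, the fixed cost amortises, which is why high-volume pipelines often route bulk traffic to open weights and keep hard queries on a closed API.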
OpenAI — GPT and o-Series
OpenAI maintains two parallel model lines for different needs:
GPT Series
| Model | Context | Strengths | Best for |
|---|---|---|---|
| GPT-4o | 128K | Multimodal, fast, broad capability | Default workhorse for most tasks |
| GPT-4o mini | 128K | Fast, very cheap | High-volume simple tasks |
| GPT-5 | 400K | Highest general capability, reduced hallucination | Professional knowledge work, hardest general tasks |
o-Series (Reasoning)
OpenAI's separate reasoning-focused family. These models spend additional compute "thinking" before answering, making them dramatically better at maths, formal logic, and complex code. See the Reasoning Models section for full detail.
- o3 — Full reasoning with native tool use; best quality on hardest problems
- o4-mini — Cost-efficient reasoning; on benchmarks often matches o3 at 1/9th the cost
OpenAI's core strengths: Largest developer ecosystem, broadest tool integration (Assistants API, function calling, fine-tuning), most mature production infrastructure.
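Function calling, mentioned above, works by passing the model a JSON-schema description of each tool. A sketch of the Chat Completions tool shape; the `get_weather` function and its parameters are hypothetical examples:

```python
# OpenAI-style function-calling tool definition (Chat Completions `tools`
# format). The function itself is a made-up example.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# At request time this is passed as `tools=[get_weather_tool]`; the model
# replies with a tool call whose arguments validate against the schema.
```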
Anthropic — Claude Family
Anthropic's Claude models are known for instruction-following quality, long-context accuracy, coding reliability, and safety-focused behaviour. The naming convention runs Haiku (fast/cheap) → Sonnet (balanced) → Opus (most capable).
| Model | Context | Strengths |
|---|---|---|
| Claude Haiku 4.5 | 200K | Fastest Claude, very cheap, good for routing and classification |
| Claude Sonnet 4.5 | 200K | Best all-round value: coding, analysis, writing, agentic tasks |
| Claude Opus 4.6 | 200K | Most capable Claude; extended thinking mode; research-grade tasks |
Claude's core strengths: Extremely long and accurate context handling (200K native), strong coding reliability (consistently top-rated on SWE-bench), "computer use" for browser/desktop automation, and strong agentic tool-calling behaviour. Claude Code (Anthropic's agentic coding tool) is built on Claude Sonnet/Opus.
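Claude's agentic tool-calling uses a slightly different shape from OpenAI's: the Messages API takes a flat tool object with an `input_schema` field. A sketch, where `search_docs` is a hypothetical tool:

```python
# Anthropic Messages API tool definition. The `search_docs` tool is a
# made-up example; only the overall shape reflects the API.
search_docs_tool = {
    "name": "search_docs",
    "description": "Search internal documentation and return matching passages",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
        },
        "required": ["query"],
    },
}

# Passed as `tools=[search_docs_tool]` in a messages.create() request; the
# model emits a tool_use block whose input matches the schema.
```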
Google — Gemini Family
Google's Gemini family is defined by its multimodal capability and extreme context lengths.
| Model | Context | Strengths |
|---|---|---|
| Gemini Flash 2.5 | 1M | Very fast, 1M context, multimodal, cheap |
| Gemini Pro 2.5 | 1M | Strong coding, reasoning, multimodal; 1M context |
| Gemini Ultra / 3.x | 1M+ | Frontier capability, visual and audio reasoning |
Gemini's core strengths: The 1M token context window is the largest in production; invaluable for whole-codebase analysis, long book summarisation, or processing hundreds of documents at once. Strong multimodal — handles images, audio, and (for some models) video natively. Deep Google Workspace integration.
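A quick way to judge whether a document set fits in a 1M-token window is the common ~4 characters-per-token heuristic. A sketch under that assumption (real tokenizer counts vary by language and content):

```python
# Rough fit check for a 1M-token context window using the ~4 chars/token
# heuristic. Both constants are approximations, not exact figures.

CHARS_PER_TOKEN = 4          # heuristic average for English text
CONTEXT_WINDOW = 1_000_000   # Gemini-class 1M-token window

def fits_in_context(total_chars: int, reserve_for_output: int = 8_000) -> bool:
    """True if the estimated input tokens plus an output budget fit."""
    est_tokens = total_chars / CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

# ~3M characters is roughly 750K tokens: fits with room to spare.
print(fits_in_context(3_000_000))
```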
Meta — LLaMA Family (Open-Weight)
LLaMA is the world's most widely used open-weight model family. Meta releases model weights freely for research and commercial use (check specific version licences).
| Model | Parameters | Key features |
|---|---|---|
| Llama 3.1 8B | 8B | Runs on consumer GPU; good general reasoning |
| Llama 3.1 70B | 70B | Near-frontier quality; runs on 2× consumer GPUs |
| Llama 3.1 405B | 405B | Best open-weight general model; requires data centre GPU |
| Llama 4 Scout/Maverick | MoE ~400B total | Multimodal (vision), up to 10M token context, MoE efficiency |
LLaMA's core strengths: No per-token cost, full data control, fine-tuning freedom. The massive open-source ecosystem (Ollama, llama.cpp, vLLM) makes deployment straightforward. Llama 4 achieves 85–86% MMLU-Pro, matching or approaching proprietary frontier models on many benchmarks.
Mistral — Compact European Models
Mistral AI (Paris) makes high-efficiency open-weight models with a strong commercial presence in Europe:
- Mistral 7B / Nemo (12B) — Punch well above their parameter count; particularly strong on coding and instruction-following
- Mixtral 8x7B / 8x22B — Mixture-of-Experts models; only a subset of experts active per token, making them cost-efficient
- Mistral Large — Closed API model; competitive with GPT-4 tier on European-language tasks
Mistral's niche: European regulatory comfort (French company, EU-hosted options), strong multilingual European languages, and the most efficient open models for their quality tier.
DeepSeek — Chinese Open-Weight Leader
DeepSeek has produced the most impactful open-weight releases of 2024–2025:
- DeepSeek-V3 / V3.2 — Non-reasoning model; 671B MoE, 37B active; strong on coding and general tasks; very cheap API ($0.27/1M input tokens for V3)
- DeepSeek-R1 — Open-weight reasoning model that matches o1; full reasoning trace visible; distilled versions available for local deployment
DeepSeek's impact: R1's January 2025 release caused a "global cost reset", demonstrating that frontier reasoning capability is achievable without massive closed-source infrastructure. However, data privacy and safety-alignment considerations apply (Chinese training data and governance).
Microsoft — Phi (Small Language Models)
Microsoft's Phi family focuses on efficiency at small scale:
- Phi-3 Mini (3.8B) — Runs on phones; strong reasoning per parameter
- Phi-3 Small (7B) / Medium (14B) — Strong coding; can run on laptop GPU
- Phi-4 — Improved quality; strong on STEM tasks for its size
Phi's niche: Edge deployment (Android, iOS, embedded), offline applications, environments where even a 7B model is too large.
Alibaba — Qwen (Multilingual Open Models)
Qwen (from Alibaba Cloud) is the leading open-weight model family for multilingual tasks, especially Chinese and Asian languages:
- Qwen2.5 7B / 14B / 32B / 72B — Strong instruction-following, coding, and math at each size tier
- Qwen2.5-Coder — Code-specialised variant; competitive with DeepSeek-Coder
- QwQ-32B — Reasoning-focused model comparable to o1 on some tasks
Qwen's niche: Applications serving Chinese or East Asian markets; base models for DeepSeek-R1 distillation (R1's distilled models use Qwen2.5 as the base architecture).
Reading Model Naming Conventions
Common patterns you'll encounter across families:
- Size suffix (7B, 70B, 405B) — Billions of parameters. Larger = more capable but more compute to run
- Instruct / Chat / Base — "Instruct" or "Chat" = fine-tuned to follow instructions. "Base" = raw pre-trained weights (not for end users)
- Q4, Q8 (quantisation) — Weight precision reduced to save memory. Q4 = 4-bit; sacrifices some quality for much smaller file size
- GGUF — File format for local inference via llama.cpp/Ollama
- MoE (Mixture of Experts) — Total params / active params notation (e.g., 8x7B names 8 experts of 7B each; with 2 experts active per token, roughly 14B parameters are active)
Checklist: Do You Understand This?
- What is OpenAI's o-series and how does it differ from the GPT series?
- What is Claude's main differentiator versus GPT-4o for production use cases?
- When would you choose Gemini over Claude or GPT-4o?
- What does "open-weight" mean in the context of Llama 4 or Qwen?
- Why did DeepSeek-R1's release matter beyond just being another model?
- What does "Q4" mean in a model file name?