Foundation Models
Model releases, benchmark results, pricing changes, and open-weight developments. Curated for builders who need to track what is available and what it costs.
DeepSeek Releases R2 — Open-Weight Reasoning Model
DeepSeek R2 achieves competitive reasoning performance with an open-weight license, making advanced reasoning accessible to self-hosted deployments.
Why it matters: Open-weight reasoning models reduce dependency on closed APIs for complex tasks. Important for enterprises with data residency requirements.
GPT-5 Launches — OpenAI Frontier Model with 400K Token Context
GPT-5 launches as OpenAI's new flagship with a 400K token context window, strong AIME 2025 maths performance, and significantly improved multi-step project execution and autonomous coding capability.
Why it matters: Sets a new capability baseline for closed frontier models. The 400K context window makes whole-codebase and large document reasoning practical via API. Forces pricing and capability recalibration across all competing providers.
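A quick back-of-envelope way to check whether "whole codebase in context" applies to your repository is a characters-per-token heuristic. The ~4 chars/token ratio below is a common rule of thumb for English text and code, not an exact tokenizer count, and the 400K window and output headroom are assumptions to adjust per provider:

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizer counts vary by content


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length (~4 chars/token assumption)."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(paths, context_window=400_000, reserve_for_output=16_000):
    """Check whether the concatenated files plausibly fit in the window,
    leaving headroom for the model's response. Returns (fits, est_tokens)."""
    total = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            total += estimate_tokens(f.read())
    return total <= context_window - reserve_for_output, total
```

For a real deployment you would count tokens with the provider's tokenizer rather than a heuristic, but this is enough to triage which repos are candidates for single-call analysis.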
Anthropic Releases Claude Opus 4 — Most Capable Model Yet
Claude Opus 4 posts new state-of-the-art results across coding, reasoning, and extended-thinking benchmarks, with improved tool use and agentic capabilities.
Why it matters: Represents a significant step in model capability for builders relying on agentic workflows and complex multi-step reasoning.
OpenAI Releases o3 and o4-mini — Reasoning Models with Native Tool Use
o3 and o4-mini combine chain-of-thought reasoning with native tool use, enabling models to search the web, run code, and call APIs mid-reasoning.
Why it matters: Reasoning + tool use in a single model removes the need to orchestrate separate search and reasoning steps, simplifying agentic pipeline design.
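To make the simplification concrete, here is a minimal sketch of the application-side orchestration loop that native tool use moves inside the model. The tool names, stub implementations, and action-dict shape are all hypothetical illustrations, not any real SDK's API:

```python
# Hypothetical sketch of a hand-rolled agentic loop: call the model, inspect
# its output for a tool request, run the tool, feed the result back, repeat.
# With native tool use (as in o3/o4-mini), this loop runs inside the model.

def search_web(query: str) -> str:      # stub standing in for a search backend
    return f"results for: {query}"


def run_code(source: str) -> str:       # stub standing in for a sandboxed runner
    return f"executed: {source}"


TOOLS = {"search_web": search_web, "run_code": run_code}


def orchestrate(model_step, task: str, max_steps: int = 5) -> str:
    """Drive the model until it emits a final answer or the step budget runs out.
    `model_step` returns either {"tool": ..., "arg": ...} or {"final": ...}."""
    context = [task]
    for _ in range(max_steps):
        action = model_step(context)
        if action.get("final") is not None:
            return action["final"]
        context.append(TOOLS[action["tool"]](action["arg"]))
    return "step budget exhausted"
```

With reasoning and tool use fused into one model call, the whole loop collapses to a single request that declares which tools are available — the application no longer schedules the search/reason alternation itself.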
Meta Releases Llama 4 — Natively Multimodal Open-Weight MoE Models
Meta releases Llama 4 Scout (17B active/109B total params, 10M token context, runs on a single H100) and Maverick (17B active/400B total params, 1M context) — the first natively multimodal Llama models, trained on text, image, and video data.
Why it matters: Llama 4 is the new open-weight baseline for self-hosted multimodal deployments. Enterprises with data residency requirements now have a competitive open alternative to closed frontier models at a fraction of the API cost.
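The "17B active" figures come from mixture-of-experts routing: per token, a learned gate scores every expert but executes only the top few, so active parameters sit far below total parameters. A minimal top-k gating sketch — the expert count, k, and scalar experts here are illustrative, not Llama 4's actual configuration:

```python
import math


def top_k_gate(logits, k=2):
    """Given one gate score per expert, keep the top-k indices and
    softmax-renormalize their scores so the routing weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return top, [e / z for e in exps]


def moe_forward(token, logits, experts, k=2):
    """Weighted sum of the selected experts' outputs. Unselected experts are
    skipped entirely — this is why active params are a fraction of the total."""
    idx, weights = top_k_gate(logits, k)
    out = [0.0] * len(token)
    for i, w in zip(idx, weights):
        for j, v in enumerate(experts[i](token)):
            out[j] += w * v
    return out
```

In a real MoE layer each expert is a full feed-forward block and the gate logits come from a learned projection of the hidden state, but the routing arithmetic is exactly this shape.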
Google Gemini 2.5 Pro — 1M Token Context and Thinking Mode Released
Gemini 2.5 Pro adds a thinking mode (extended reasoning) alongside its 1M token context window, topping key benchmarks including coding and maths.
Why it matters: 1M context makes whole-codebase and whole-document analysis practical. Thinking mode brings reasoning capability to Google's ecosystem.