Foundation Models

Model releases, benchmark results, pricing changes, and open-weight developments. Curated for builders who need to track what is available and what it costs.

Qwen3 — Alibaba's New Model Family with Thinking Mode

Alibaba releases Qwen3 family: 0.6B to 235B MoE. All models support a 'thinking mode' toggle (chain-of-thought on/off per request). Qwen3-Coder 480B MoE targets software engineering. Apache 2.0 licensed. Strong in 29 languages.

Why it matters: Thinking-mode toggle is a practical innovation — use fast mode for simple tasks, reasoning mode for complex ones, in the same model. Qwen3 Coder 480B directly challenges frontier coding models at open-weight prices. Extends Chinese AI labs' impact on the open-weight ecosystem.

Alibaba Cloudalibabaqwenopen-weightreasoningmultilingualmoe

Grok 4.3 — xAI's Updated Flagship with 1M Context

xAI releases Grok 4.3 — current flagship at $1.25/$2.50 per 1M tokens with a 1M token context window. Positioned between Claude Sonnet and Opus on price/quality. $150/month in free developer credits available via data-sharing program.

Why it matters: 1M context window at competitive pricing makes Grok 4.3 a viable choice for long-document tasks. The free $150/month credit program is the most generous developer offer from any major AI lab in 2026 — lowering the barrier to experiment with xAI's API significantly.

xAIxaigrokcontext-windowapireasoning

Llama 4 Scout & Maverick — Meta Ships Open-Weight Multimodal MoE

Meta releases Llama 4 Scout (109B MoE, 10M token context) and Maverick (400B MoE, 17B active parameters). Both are natively multimodal, Apache 2.0 licensed, and match or beat GPT-4o on major benchmarks at a fraction of the inference cost.

Why it matters: First open-weight models to seriously challenge frontier closed-source models on quality. The 10M context window on Scout is the largest of any openly available model. MoE architecture means inference cost scales with active parameters (17B), not total (400B). Shifts the open vs closed model debate significantly.

Meta AImetallamaopen-weightmultimodalmoecontext-window

DeepSeek Releases R2 — Open-Weight Reasoning Model

DeepSeek R2 achieves competitive reasoning performance with an open-weight license, making advanced reasoning accessible to self-hosted deployments.

Why it matters: Open-weight reasoning models reduce dependency on closed APIs for complex tasks. Important for enterprises with data residency requirements.

DeepSeek Blogdeepseekreasoningopen-weight

GPT-5 Launches — OpenAI Frontier Model with 400K Token Context

GPT-5 launches as OpenAI's new flagship with a 400K token context window, strong AIME 2025 maths performance, and significantly improved multi-step project execution and autonomous coding capability.

Why it matters: Sets a new capability baseline for closed frontier models. The 400K context window makes whole-codebase and large document reasoning practical via API. Forces pricing and capability recalibration across all competing providers.

OpenAI Blogopenaigpt-5frontierfoundation-models

Anthropic Releases Claude Opus 4 — Most Capable Model Yet

Claude Opus 4 sets new benchmarks across coding, reasoning, and extended thinking tasks, with improved tool use and agentic capabilities.

Why it matters: Represents a significant step in model capability for builders relying on agentic workflows and complex multi-step reasoning.

Anthropic Bloganthropicclaudefoundation-models

OpenAI Releases o3 and o4-mini — Reasoning Models with Native Tool Use

o3 and o4-mini combine chain-of-thought reasoning with native tool use, enabling models to search the web, run code, and call APIs mid-reasoning.

Why it matters: Reasoning + tool use in a single model removes the need to orchestrate separate search and reasoning steps, simplifying agentic pipeline design.

OpenAI Blogopenaireasoningtool-useo-series

Meta Releases Llama 4 — Natively Multimodal Open-Weight MoE Models

Meta releases Llama 4 Scout (17B active params, 10M token context, runs on a single H100) and Maverick (17B active/400B total, 1M context) — the first natively multimodal Llama models trained on text, images, and video data.

Why it matters: Llama 4 is the new open-weight baseline for self-hosted multimodal deployments. Enterprises with data residency requirements now have a competitive open alternative to closed frontier models at a fraction of the API cost.

Meta AI Blogmetallamaopen-weightmultimodalmoe

Google Gemini 2.5 Pro — 1M Token Context and Thinking Mode Released

Gemini 2.5 Pro adds a thinking mode (extended reasoning) alongside its 1M token context window, topping key benchmarks including coding and maths.

Why it matters: 1M context makes whole-codebase and whole-document analysis practical. Thinking mode brings reasoning capability to Google's ecosystem.

Google DeepMind Bloggooglegeminilong-contextreasoning

Page built: 01 Jun 2026