Beginner

What's New in AI

A curated timeline of the milestones that actually changed how AI is built and used — from early 2024 through mid-2026. Not every model release; the ones that shifted the industry or changed what practitioners could do.

Last updated: May 2026

Categories: Model Release | Protocol / Standard | Tool / Platform | Industry Shift

2026 (so far)

May 2026 | Tool / Platform

Devstral Small — agentic coding at $0.07/1M input

Mistral releases Devstral Small, an agentic coding specialist priced at $0.07/$0.28 per 1M input/output tokens. It outperforms Codestral Small on SWE-bench agentic tasks, which makes coding agents economically viable for high-volume loops.
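To make the pricing concrete, here is a rough cost estimate for one agent task at the quoted rates. It assumes the two figures are the input and output prices per 1M tokens; the step count and per-step token counts are made-up illustration values, not measurements.

```python
# Rough cost estimate for an agentic coding loop at the quoted prices.
# Assumption: $0.07 is the input price and $0.28 the output price per 1M tokens.
INPUT_USD_PER_M = 0.07
OUTPUT_USD_PER_M = 0.28


def loop_cost(steps: int, input_tokens_per_step: int, output_tokens_per_step: int) -> float:
    """Return the total USD cost of `steps` model calls."""
    input_cost = steps * input_tokens_per_step * INPUT_USD_PER_M / 1_000_000
    output_cost = steps * output_tokens_per_step * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


# Hypothetical workload: 50 steps, 20k tokens of context in and 1k tokens out per step.
cost = loop_cost(steps=50, input_tokens_per_step=20_000, output_tokens_per_step=1_000)
print(f"${cost:.3f} per task")
```

Under these assumptions the input side dominates the bill, because an agent re-sends its growing context on every step.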

May 2026 | Model Release

Qwen3 — thinking-mode toggle across open-weight models

Alibaba releases the Qwen3 family (0.6B–235B MoE, Apache 2.0). All models support a thinking/non-thinking mode toggle per request. Qwen3 Coder 480B MoE targets software engineering. Multilingual support is strong (119 languages and dialects). The release extends Chinese AI labs' influence on the open-weight ecosystem.

Apr 2026 | Model Release

GPT-5.5 and DeepSeek V4 Preview

GPT-5.5 (April 23) and DeepSeek V4-Pro (April 24, 1.6T total params, MIT license) arrived within 24 hours of each other — emblematic of 2026's pace. V4-Pro's permissive licensing continued the open-weight momentum DeepSeek started in January 2025.

Apr 2026 | Model Release

Claude 4 family: Haiku 4.5, Sonnet 4.6, Opus 4.7

Anthropic's fourth generation. Opus 4.7 launched April 16 — the most capable Claude to date. Sonnet 4.6 became the go-to production model: mid-tier cost, frontier-class reasoning for most tasks. The 4.x family raised the bar on agentic and multi-step tasks.

Apr 2026 | Tool / Platform

Graphify — codebase knowledge graphs go open-source

An MIT-licensed tool that turns any codebase into a queryable knowledge graph using local Tree-sitter parsing (no source code sent to external servers). 71.5× token reduction on real codebases. Crossed 22,000 GitHub stars in under ten days.

Apr 2026 | Tool / Platform

Claude Code Routines — scheduled AI automation

Anthropic added three trigger types to Claude Code: scheduled (cron-like), API-triggered, and GitHub event-triggered. AI-assisted automation moved from manual prompting to persistent background workflows.

Apr 2026 | Model Release

Grok 4.3 — xAI flagship with 1M context window

xAI releases Grok 4.3 at $1.25/$2.50 per 1M tokens with a 1M-token context window, one of the largest of any commercial model. Developers can get $150/month in free credits. Live X/Twitter search integration remains Grok's main differentiator.

Apr 2026 | Model Release

Llama 4 Scout & Maverick — open-weight multimodal MoE

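Meta releases Llama 4 Scout (109B MoE, 17B active, 10M context) and Maverick (400B MoE, 17B active). Both are natively multimodal and distributed under the Llama 4 Community License rather than a standard open-source license such as Apache 2.0, and they match GPT-4o on major benchmarks at a fraction of the inference cost. Scout fits on a single H100 with Int4 quantization. The open vs closed gap narrowed significantly.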

Feb 2026 | Model Release

Gemini 3.1 Pro leads scientific reasoning

94.3% on GPQA Diamond — a PhD-level science benchmark. 77.1% on ARC-AGI-2. Google's Gemini family regained benchmark leadership in the first quarter, intensifying the rotation between labs at the top of the leaderboards.

2025

Aug 2025 | Model Release

GPT-5 integrated into ChatGPT

OpenAI's most capable model at launch, with major improvements in multi-step reasoning, scientific problem-solving, and multimodal understanding. Shipped directly into ChatGPT, making frontier capability accessible to all users, including the free tier.

Jun 2025 | Tool / Platform

LazyGraphRAG — knowledge graphs become affordable

Microsoft's LazyGraphRAG reduced knowledge graph indexing cost from $20–500 to under $5 by deferring community summaries to query time. Graph RAG moved from a research technique to a practical production tool.
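The cost saving comes from a general lazy-evaluation pattern, which the sketch below illustrates. It is not Microsoft's implementation: the class name, the keyword-match relevance test, and the stand-in summariser are all hypothetical, chosen only to show that the expensive step runs at query time and only for the communities a query touches.

```python
from typing import Callable


class LazyCommunityIndex:
    """Stores raw community text at index time; summarises only when queried."""

    def __init__(self, communities: dict[str, list[str]],
                 summarize: Callable[[list[str]], str]):
        self.communities = communities   # cheap to build: no model calls
        self.summarize = summarize       # expensive step, e.g. an LLM call
        self.cache: dict[str, str] = {}
        self.summarize_calls = 0

    def summary(self, community_id: str) -> str:
        # Compute the summary on first use, then reuse the cached value.
        if community_id not in self.cache:
            self.summarize_calls += 1
            self.cache[community_id] = self.summarize(self.communities[community_id])
        return self.cache[community_id]

    def query(self, keyword: str) -> list[str]:
        """Summarise only the communities whose text mentions the keyword."""
        hits = [cid for cid, docs in self.communities.items()
                if any(keyword.lower() in d.lower() for d in docs)]
        return [self.summary(cid) for cid in hits]


# Stand-in summariser (hypothetical): a real system would call a language model here.
def fake_summarize(docs: list[str]) -> str:
    return f"{len(docs)} docs, starting with: {docs[0]}"


index = LazyCommunityIndex(
    {"billing": ["Invoice retries", "Refund policy"],
     "auth": ["Token refresh", "SSO setup"],
     "search": ["Ranking tweaks"]},
    fake_summarize,
)
results = index.query("token")
print(index.summarize_calls, "of", len(index.communities), "communities summarised")
```

An eager index would pay for all three summaries up front; here only the one matching community is summarised, and repeating the query costs nothing extra because of the cache.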

Apr 2025 | Model Release

GPT-4.1 family — OpenAI's mid-range tier

GPT-4.1, Mini, and Nano gave developers a coherent OpenAI lineup matching the tiered structure Anthropic popularised. The cost gap between tiers widened: GPT-4.1 Nano at $0.10 per 1M input tokens vs GPT-4.1 at $2.00 per 1M input tokens.

Apr 2025 | Model Release

Meta Llama 4 — natively multimodal MoE

Llama 4 Scout and Maverick used Mixture-of-Experts architecture (activating only a fraction of parameters per token) to deliver competitive performance at dramatically lower inference cost. Maverick beat GPT-4o on multiple benchmarks. Behemoth (about 2T total params, previewed while still in training rather than released) signalled the raw scale Meta was willing to pursue.
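The "fraction of parameters per token" idea can be sketched with top-k routing, a common way to build Mixture-of-Experts layers. This is a toy illustration, not Llama 4's actual router: the experts are scalar multipliers standing in for feed-forward blocks, and the gate scores are random rather than computed from a token by a learned router.

```python
import math
import random


def top_k_route(gate_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exp = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exp.values())
    return {i: e / total for i, e in exp.items()}


def moe_forward(x: float, experts, gate_logits: list[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; the other experts never run."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy setup: 8 "experts", each a scalar multiplier (hypothetical stand-ins for FFN blocks).
random.seed(0)
experts = [lambda x, m=m: m * x for m in range(1, 9)]
gate_logits = [random.gauss(0.0, 1.0) for _ in experts]  # a real router derives these from the token

weights = top_k_route(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
print(f"active experts: {sorted(weights)} of {len(experts)}")
```

With k=2 of 8 experts, only a quarter of the expert parameters do work for this token, which is where the inference savings come from even though the full model must still be held in memory.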

Mar 2025 | Model Release

Gemini 2.5 Pro leads benchmarks

Google's Gemini 2.5 Pro with 'thinking budget' took top positions on coding and reasoning leaderboards. The 'thinking budget' feature let developers trade cost against reasoning depth — a new knob for production optimization.

Feb 2025 | Model Release

Claude 3.7 Sonnet — hybrid reasoning

Anthropic's first model with an explicit 'extended thinking' mode — letting users toggle the reasoning depth. Topped SWE-bench for autonomous software engineering tasks. The practitioner shift toward reasoning models accelerated.

Jan 2025 | Industry Shift

DeepSeek R1 — the 'DeepSeek shock'

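A Chinese lab released an open-weight reasoning model that matched OpenAI o1 on coding and maths benchmarks. The widely quoted training cost of approximately $6M comes from DeepSeek's report on the final training run of the V3 base model, not the full cost of developing R1. The DeepSeek app became the #1 free app on the US iOS App Store within days. The narrative that frontier AI required billions in training compute was badly shaken.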

2024

Dec 2024 | Model Release

OpenAI o3, Gemini 2.0, DeepSeek V3

A landmark month: o3 achieved near-human scores on ARC-AGI; Gemini 2.0 Flash matched o1 performance at lower cost; DeepSeek V3 quietly shipped as a top-tier open-weight model. The frontier moved dramatically in 30 days.

Nov 2024 | Protocol / Standard

Anthropic open-sources MCP

The Model Context Protocol defined how AI assistants connect to external tools and data sources — files, databases, APIs, services. Within weeks, hundreds of MCP servers appeared. MCP became the de facto standard for AI tool integration in 2025.
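MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and arguments. The sketch below shows that message shape with a toy in-process handler. The `word_count` tool and the `handle` function are hypothetical, and the initialization handshake, transport, and error handling of a real client and server are omitted.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise an MCP-style `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool registry standing in for a real MCP server.
TOOLS = {"word_count": lambda args: str(len(args["text"].split()))}


def handle(raw: str) -> str:
    """Toy server side: look up the tool, run it, wrap the result as text content."""
    req = json.loads(raw)
    params = req["params"]
    text = TOOLS[params["name"]](params["arguments"])
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req["id"],  # the response echoes the request id
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    })


request = make_tool_call(1, "word_count", {"text": "connect models to tools"})
response = json.loads(handle(request))
print(response["result"]["content"][0]["text"])
```

Because every tool speaks the same request and response envelope, a client written once can call any server's tools, which is a large part of why the protocol spread quickly.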

Oct 2024 | Model Release

Llama 3.2 adds vision and mobile

Meta added visual understanding to Llama with 11B and 90B vision models, and released lightweight 1B and 3B text models small enough to run on smartphones. On-device AI with an open-weight model family became practical on consumer hardware.

Sep 2024 | Model Release

OpenAI o1 — reasoning models arrive

o1 spent compute on 'thinking before answering': it generated an internal chain of reasoning whose raw form was hidden from users. OpenAI reported that it exceeded PhD-level expert accuracy on GPQA Diamond, a graduate-level science benchmark, and scored strongly on competition maths. Test-time compute emerged as a new design dimension.

Jun 2024 | Model Release

Claude 3.5 Sonnet surpasses Opus

A mid-tier model beating the flagship at lower cost. This broke the assumption that 'best quality = most expensive.' Cost-performance optimisation became a primary engineering concern from this point forward.

May 2024 | Model Release

GPT-4o released

Faster, cheaper, and natively multimodal: audio, text, and images in a single model. Real-time voice mode with natural interruptions previewed a new tier of human-AI interaction. In the API it was roughly twice as fast as GPT-4 Turbo at half the price.

Apr 2024 | Model Release

Meta releases Llama 3

Llama 3 70B matched proprietary models on most benchmarks. The era of 'open-weight models can compete with closed models' began in earnest. Developers gained a credible alternative to API-only providers.

Mar 2024 | Model Release

Anthropic launches Claude 3 (Haiku, Sonnet, Opus)

The first coherent model family with clear capability/cost/speed tiers. Opus topped GPT-4 on multiple benchmarks. The tiered release became the template every lab copied in 2024–2025.

Feb 2024 | Model Release

OpenAI debuts Sora

Sora produced photorealistic video from text prompts at a quality that shocked the industry. It signalled that generative AI's next frontier was temporal media, not just static images.

Page built: 01 Jun 2026