Open Source AI & Economics
The economics of AI are changing faster than most leaders realise. Inference costs have fallen by over 280× in under two years. The number of open-source inference providers grew from 27 to 90 in a single year. Models that were state-of-the-art in 2023 are now available for $0.07 per million tokens. DeepSeek-R1 matched frontier closed-model performance at a fraction of the training cost. Understanding these economics is not optional for AI leaders — it determines which strategies are viable, what your cost structure will look like, and how long your vendor decisions will hold.
The Inference Cost Collapse
Inference at GPT-3.5-level capability cost $20 per million tokens at ChatGPT's launch in November 2022. By October 2024, comparable capability (Google Gemini 1.5 Flash-8B) was available at $0.07 per million tokens: a cost reduction of more than 280× ($20 / $0.07 ≈ 286) in under two years. This rate of decline is faster than Moore's Law and is driven by three compounding forces:
Hardware Efficiency
GPU performance per dollar improves ~30% annually. Each NVIDIA generation (H100 → H200 → Blackwell) delivers more tokens per second at lower power. Custom silicon (Google TPU, Groq LPU, Cerebras) targets inference specifically, not training.
Algorithmic Efficiency
Quantisation (running models in INT4/INT8 instead of FP16), distillation (smaller models trained to match larger model outputs), speculative decoding, and flash attention have dramatically reduced compute per token. Energy efficiency improves ~40% annually from algorithms alone.
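Quantisation's impact on memory footprint follows from simple arithmetic: weight memory is parameter count times bits per weight. A back-of-envelope sketch (the helper function is illustrative; real deployments also need KV-cache and activation memory on top of weights):

```python
# Rough memory needed to hold a model's weights at different precisions.
# Illustrative only: ignores KV-cache, activations, and runtime overhead.

def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Bytes = params x bits / 8; reported in GB (1e9 bytes)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B params @ {bits}-bit: {weight_memory_gb(70, bits):.0f} GB")
# 140 GB (FP16), 70 GB (INT8), 35 GB (INT4)
```

Halving the bits halves both the memory footprint and the memory bandwidth needed per token, which is why INT8/INT4 serving translates directly into lower cost per token.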
Market Competition
More inference providers competing for the same workloads. Open-source models served by 20+ providers face commodity pricing. Providers like Groq, Cerebras, Fireworks AI, and Together.ai compete aggressively on price and speed.
Strategic implication
Any business case built on current inference costs will look different in 18 months. Use cases that are currently uneconomical (very high volumes, marginal per-call ROI) become viable as costs fall. Build your AI strategy to capture value as costs decline, not just at today's prices.
The Open-Weight Model Movement
"Open-source AI" requires some precision. Truly open-source AI (open weights, open training data, open code) is rare. "Open-weight" models release the trained weights but not always the training data or reproducible training code. The practical difference: open-weight models can be downloaded, run, and fine-tuned by anyone.
Key open-weight model families (2025)
Meta Llama 3.x / 4: 8B, 70B, 405B parameter variants. Commercial use permitted under Meta's licence (with restrictions at large scale). The most widely deployed open-weight family.
Mistral / Mixtral: European open-weight leader. Mixture-of-Experts architecture (Mixtral 8×7B, 8×22B). Apache 2.0 licence — fully permissive.
DeepSeek: Chinese lab whose open-weight R1 reasoning model matched OpenAI o1 at ~5% of the reported training cost. Major market shock in January 2025. MIT licence.
Qwen (Alibaba): Strong multilingual models; Apache 2.0. Competitive at the 7B–72B range.
Why open-weight matters strategically
Open-weight models can be self-hosted, eliminating data egress and API dependency. They can be fine-tuned on proprietary data. Their weights can be inspected (for bias audits, safety checks). They are not subject to vendor price changes, API deprecation, or terms-of-service changes. The tradeoff: you own the operational complexity.
Commoditisation and What It Means
Frontier AI capability is commoditising. GPT-4-level performance, once exclusive to OpenAI, is now available via Llama 3.1 405B, Mistral Large, Gemini Pro, and DeepSeek V3 — all open or cheaply available. This has several strategic consequences:
For Builders
Model quality is no longer a differentiator at the application layer. The advantage shifts to data, evaluation, integration quality, and user experience. A startup cannot build a better GPT-4 — but it can build a better product on top of commoditised models.
For Enterprise Buyers
Vendor negotiating power shifts toward buyers. If OpenAI, Anthropic, Google, and Mistral all offer comparable capability, switching costs fall and pricing competition intensifies. Multi-model strategies (routing different tasks to best-value providers) become more viable.
For Closed-Model Labs
Frontier advantage windows shrink. OpenAI's GPT-4 held a significant lead for ~12 months; GPT-4o's lead was shorter. Labs must continuously push the frontier to maintain premium pricing — the treadmill accelerates.
For Compliance-Sensitive Industries
Open-weight self-hosting becomes the path to regulatory compliance in healthcare, defence, and finance — where data cannot leave controlled environments. This is a structural advantage of open-weight, not just a cost advantage.
The Inference-as-a-Service Ecosystem
A competitive layer of "Inference-as-a-Service" providers now serves open-weight models at commodity, pay-as-you-go prices. The number of providers grew from 27 to 90 between early and late 2025.
Major inference providers (2025)
- Groq: LPU (Language Processing Unit) hardware; extreme speed (hundreds of tokens/second). Focused on latency-sensitive applications.
- Cerebras: Wafer-scale chip; serves Llama 3.1 405B at 969 tokens/second — reportedly faster than any GPU cluster at this model size.
- Together.ai: Broad open-weight model catalogue; fine-tuning service; competitive pricing.
- Fireworks AI: Ultra-fast inference with enterprise SLAs; function calling and JSON mode support.
- OpenRouter, Perplexity: aggregator APIs routing across multiple model providers, enabling multi-model workflows from a single API.
What It Means for Your AI Strategy
Avoid Single-Vendor Lock-in
Build your application layer on abstractions (LangChain, LiteLLM, custom routers) that allow model swapping. As open-weight models improve, route cost-sensitive workloads to cheaper providers while keeping frontier models for quality-sensitive tasks; this split is fast becoming the default architecture.
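The routing idea can be sketched in a few lines. This is a minimal custom router, not LiteLLM's actual API; provider names, model names, and prices below are placeholders, not live quotes:

```python
# Minimal model-routing sketch: send cost-sensitive work to a cheap
# open-weight provider, quality-sensitive work to a frontier model.
# All providers, models, and prices here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Route:
    provider: str
    model: str
    usd_per_million_tokens: float

ROUTES = {
    "bulk": Route("open-weight-host", "llama-3.1-70b", 0.60),
    "frontier": Route("closed-api", "frontier-model", 15.00),
}

def pick_route(quality_sensitive: bool) -> Route:
    """Choose the cheapest route that meets the task's quality bar."""
    return ROUTES["frontier" if quality_sensitive else "bulk"]

print(pick_route(quality_sensitive=False).model)   # llama-3.1-70b
print(pick_route(quality_sensitive=True).provider)  # closed-api
```

The point of the abstraction is that swapping a provider or repricing a route is a one-line config change, not an application rewrite.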
Monitor Cost Curves, Not Just Today's Prices
If a use case is borderline uneconomical today, check whether it becomes viable in 12 months at projected cost declines. Some high-volume use cases that were cost-prohibitive in 2023 are now standard practice in 2025.
Proprietary Data > Proprietary Models
As model capability commoditises, the durable competitive advantage is your data. Proprietary datasets enable fine-tuning to performance levels generic models cannot reach. Invest in data curation, labelling, and governance — these compound over time.
Evaluate Open-Weight for Each Use Case
The default assumption that proprietary APIs are better than open-weight is outdated for many tasks. Run evaluations. At 7B–70B scale, Llama 3.1 and Mistral are competitive for many standard tasks at dramatically lower cost when self-hosted.
Checklist: Do You Understand This?
- What three forces are driving the inference cost collapse, and how fast is cost declining?
- What is the difference between "open-source" and "open-weight" AI, and why does it matter practically?
- Name four major open-weight model families and their licensing terms.
- Why does AI commoditisation shift competitive advantage from models to data and product?
- What is Inference-as-a-Service, and which providers are leading this space?
- How should the falling cost curve change how you evaluate borderline AI use cases today?