
Open-Weight vs Proprietary Models

The open-weight vs proprietary choice is one of the most consequential architectural decisions when building AI systems. It affects cost, data privacy, latency, capability ceiling, compliance, and operational complexity. This page gives you a rigorous framework for making this decision.

Terminology: Open-Weight vs Open-Source

These terms are often confused:

  • Open-source AI — Model weights, training code, and training data are all publicly released. Rare — almost no frontier-quality models are truly open-source.
  • Open-weight — Model weights are publicly released and downloadable. Training code and/or data may not be. This describes most "open" AI models: Llama, Mistral, DeepSeek, Gemma, Phi, Qwen.
  • Proprietary (closed) — Model weights are never released; access only via API or hosted product. GPT-5, Claude, Gemini Ultra.

Licences matter for open-weight models. "Free to download" does not mean "free to use commercially." Llama 4's licence carries commercial restrictions above a usage threshold, and Mistral models ship under varied licences. Always check the specific model's licence before production deployment.

The Core Trade-offs

[Diagram: the deployment spectrum, from fully managed to fully self-hosted — GPT-5 / Claude Opus → mid-tier APIs → managed open-weight (e.g. Groq) → self-hosted (e.g. vLLM). Proprietary (closed API): best quality, zero ops, data leaves your infra. Open-weight (self-hosted): full control, data stays local, ops burden. Most mature teams use both — proprietary for quality-critical tasks, open-weight for cost-sensitive volume.]

Proprietary — key attributes:

  • Frontier quality — GPT-5 and Claude Opus lead
  • Pay-per-token — no upfront infrastructure cost
  • Data exits your org — inference runs on the provider's infrastructure
  • Zero ops — the provider manages everything

Open-weight — key attributes:

  • Nearly frontier — DeepSeek-R1 ≈ o1; Llama 4 ≈ 85% MMLU-Pro
  • Fixed infra cost — cheaper beyond roughly 10M tokens/day
  • Data stays local — no external transmission
  • High ops burden — GPUs, serving, updates, monitoring

Open-weight eliminates data-residency risk; proprietary eliminates operational burden.

| Dimension | Proprietary (Closed API) | Open-Weight (Self-Hosted) |
| --- | --- | --- |
| Quality ceiling | Highest — GPT-5, Claude Opus, Gemini Ultra | Approaching but not yet at the frontier (Llama 4: ~85–86% MMLU-Pro) |
| Cost model | Pay per token; no upfront cost; predictable at low volume | Infrastructure cost (GPU/cloud); no per-token cost; cheaper at scale |
| Data privacy | Data sent to provider's infrastructure | Data never leaves your infrastructure |
| Regulatory compliance | Depends on provider's certifications (SOC 2, HIPAA, GDPR) | Full control; can deploy in air-gapped or regulated environments |
| Customisation | Limited (fine-tuning available on some, but you don't own the base) | Full fine-tuning, quantisation, LoRA; domain adaptation possible |
| Latency | Network round-trip, but highly optimised serving; generally fast | Local inference avoids the network hop; varies widely with hardware and setup |
| Vendor lock-in | High — pricing, availability, and behaviour controlled by provider | None — you own the deployment |
| Ops burden | None — provider manages everything | High — GPU procurement, model serving, updates, monitoring |
| Model updates | Automatic; can break existing prompts unexpectedly | You control when (and whether) to upgrade |

Cost Comparison at Scale

The cost crossover point depends on your query volume and model tier:

| Scenario | Proprietary API | Self-hosted open-weight | Better choice |
| --- | --- | --- | --- |
| Prototype / low volume (<100K tokens/day) | ~$0–5/day | GPU amortised: $5–30+/day | API |
| Medium volume (1M tokens/day) | $25–250/day (tier dependent) | $10–50/day (1–2 GPUs) | Depends on tier |
| High volume (100M+ tokens/day) | $2,500–25,000+/day | $100–1,000/day (optimised cluster) | Self-hosted strongly favoured |

The self-hosted break-even typically falls somewhere between 10M–100M tokens/day depending on model size, GPU type, and whether you're on cloud or bare metal.
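The break-even arithmetic can be sketched directly. The prices below are illustrative placeholders, not quotes from any provider, and the calculation ignores staff time and engineering overhead, which often dominate in practice:

```python
def api_cost_per_day(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Daily API spend at a flat per-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

def self_hosted_cost_per_day(gpu_hourly_usd: float, num_gpus: int) -> float:
    """Daily cost of an always-on GPU deployment (infrastructure only)."""
    return gpu_hourly_usd * num_gpus * 24

def break_even_tokens_per_day(usd_per_million_tokens: float,
                              gpu_hourly_usd: float,
                              num_gpus: int) -> float:
    """Token volume at which self-hosting matches the daily API bill."""
    daily_infra = self_hosted_cost_per_day(gpu_hourly_usd, num_gpus)
    return daily_infra / usd_per_million_tokens * 1_000_000

# Hypothetical inputs: $2.50 per 1M tokens via API; two GPUs at $2/hour each.
print(break_even_tokens_per_day(2.50, 2.0, 2))  # ~38.4M tokens/day
```

With these assumed prices the crossover lands at roughly 38M tokens/day — inside the 10M–100M range quoted above; cheaper GPUs or pricier API tiers pull it lower.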

Data Privacy and Residency

With proprietary APIs, every prompt and completion transits the provider's infrastructure. Even with enterprise agreements (OpenAI Enterprise, Anthropic for Business, Google Vertex AI), key considerations are:

  • Training opt-out — Most enterprise tiers promise not to use your data for model training, but verify this in your contract
  • Log retention — Provider may retain logs for 30–90 days for abuse detection; understand what this means for sensitive data
  • Jurisdiction — Where are servers located? EU GDPR may require data to stay in the EU; some providers offer EU-hosted endpoints
  • HIPAA / SOC2 coverage — Check which tiers include the required compliance certifications for your industry

Self-hosted open-weight models eliminate these concerns entirely — inference runs in your own infrastructure; no data transmitted externally.

The Quality Gap in 2025–2026

For most tasks, the quality gap between frontier proprietary and top open-weight models has narrowed substantially:

  • Coding: Llama 3.1 70B and DeepSeek-V3 are competitive with GPT-4o on many coding benchmarks
  • Reasoning: DeepSeek-R1 matches o1 on AIME and MATH
  • General tasks: Llama 4 Maverick achieves ~85% MMLU-Pro vs GPT-5's ~90%+

However, proprietary frontier models still lead on:

  • The absolute hardest tasks (PhD-level GPQA, frontier reasoning)
  • Multimodal tasks (especially video and audio)
  • Instruction-following consistency and safety alignment
  • Very long-context tasks (1M-token context windows remain rare among open-weight models)

The Hybrid Strategy

Most mature AI teams use both — not one or the other:

Typical hybrid approach

  • Prototype with proprietary API — zero infrastructure, fastest time to value, best quality ceiling
  • Migrate cost-sensitive workloads to open-weight — classification, routing, simple generation, high-volume processing
  • Keep proprietary for hard tasks — complex reasoning, frontier quality requirements, multimodal
  • Route at inference time — classifier sends simple queries to local 7B model, complex queries to GPT-5 or Claude Opus
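The routing step above can be sketched as a simple heuristic. Everything here is illustrative — the model names, thresholds, and keyword list are placeholders, and production routers typically use a small trained classifier rather than keyword matching:

```python
# Hypothetical model identifiers for the two tiers.
LOCAL_MODEL = "local-llama-8b"      # cheap, self-hosted
FRONTIER_MODEL = "frontier-api"     # expensive, proprietary

# Crude signals that a query needs frontier-level reasoning (assumed list).
HARD_TASK_HINTS = ("prove", "derive", "step by step", "legal", "diagnose")

def route(query: str) -> str:
    """Send long or reasoning-heavy queries to the frontier model,
    everything else to the local open-weight model."""
    q = query.lower()
    if len(query) > 2000 or any(hint in q for hint in HARD_TASK_HINTS):
        return FRONTIER_MODEL
    return LOCAL_MODEL

print(route("Translate 'hello' to French"))       # routed to the local model
print(route("Prove that sqrt(2) is irrational"))  # routed to the frontier model
```

The payoff comes from traffic shape: if most queries are simple, the local model absorbs the bulk of the volume while the frontier model handles only the expensive tail.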

Open-Weight Ecosystem

Running open-weight models is far more accessible than it was even two years ago:

  • Ollama — One-command local model running; supports Llama, Mistral, DeepSeek, Phi, Qwen, and 100+ others; OpenAI-compatible API
  • Hugging Face — Central repository for model weights; 700K+ models; Transformers library for Python integration
  • vLLM — Production inference server; continuous batching; high throughput for multi-user serving
  • llama.cpp — Lightweight C/C++ inference engine; runs quantised (GGUF) models on CPUs, laptops, and edge devices
  • LM Studio — Desktop GUI for local model management; good for non-technical users
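Because Ollama exposes an OpenAI-compatible endpoint, switching a workload from a proprietary API to a local model can be as small as changing the base URL. A minimal sketch, assuming Ollama is running locally with a pulled model (the model name and prompt are placeholders):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint (default local port).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """OpenAI-style chat payload, accepted by Ollama's /v1 endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama3.1", "Summarise LoRA in one sentence.")
print(payload["model"])

# Sending requires a running server (`ollama serve` + `ollama pull llama3.1`):
# req = urllib.request.Request(
#     OLLAMA_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```

The same payload shape works against proprietary endpoints, which is what makes the hybrid strategy cheap to implement.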

Checklist: Do You Understand This?

  • What is the difference between "open-weight" and "open-source"?
  • At what approximate token volume does self-hosting become cost-competitive?
  • Name three data privacy advantages of self-hosted open-weight models.
  • On what types of tasks does the quality gap between proprietary and open-weight models remain largest?
  • Describe a hybrid routing strategy and when it makes sense.
  • Why should you check the licence of an open-weight model before production use?