Open-Weight vs Proprietary Models
The open-weight vs proprietary choice is one of the most consequential architectural decisions when building AI systems. It affects cost, data privacy, latency, capability ceiling, compliance, and operational complexity. This page gives you a rigorous framework for making this decision.
Terminology: Open-Weight vs Open-Source
These terms are often confused:
- Open-source AI — Model weights, training code, and training data are all publicly released. Rare — almost no frontier-quality models are truly open-source.
- Open-weight — Model weights are publicly released and downloadable. Training code and/or data may not be. This describes most "open" AI models: Llama, Mistral, DeepSeek, Gemma, Phi, Qwen.
- Proprietary (closed) — Model weights are never released; access only via API or hosted product. GPT-5, Claude, Gemini Ultra.
Licences matter for open-weight models. "Free to download" does not mean "free to use commercially." Llama 4 has commercial restrictions above a usage threshold. Mistral models have varied licences. Always check the specific model's licence before production deployment.
The Core Trade-offs
- Most mature teams use both — proprietary for quality-critical tasks, open-weight for cost-sensitive volume
- Open-weight eliminates data residency risk; proprietary eliminates operational burden
| Dimension | Proprietary (Closed API) | Open-Weight (Self-Hosted) |
|---|---|---|
| Quality ceiling | Highest — GPT-5, Claude Opus, Gemini Ultra | Approaching but not yet at frontier (Llama 4: ~85–86% MMLU-Pro) |
| Cost model | Pay per token; no upfront cost; predictable at low volume | Infrastructure cost (GPU/cloud); no per-token cost; cheaper at scale |
| Data privacy | Data sent to provider's infrastructure | Data never leaves your infrastructure |
| Regulatory compliance | Depends on provider's certifications (SOC2, HIPAA, GDPR) | Full control; can deploy in air-gapped or regulated environments |
| Customisation | Limited (fine-tuning available on some, but you don't own the base) | Full fine-tuning, quantisation, LoRA; domain adaptation possible |
| Latency | Network round-trip; highly optimised servers; generally fast | Local inference: lower latency on fast hardware; varies by setup |
| Vendor lock-in | High — pricing, availability, and behaviour controlled by provider | None — you own the deployment |
| Ops burden | None — provider manages everything | High — GPU procurement, model serving, updates, monitoring |
| Model updates | Automatic; can break existing prompts unexpectedly | You control when (and whether) to upgrade |
Cost Comparison at Scale
The cost crossover point depends on your query volume and model tier:
| Scenario | Proprietary API | Self-hosted open-weight | Better choice |
|---|---|---|---|
| Prototype / low volume (<100K tokens/day) | ~$0–5/day | GPU amortised: $5–30+/day | API |
| Medium volume (1M tokens/day) | $25–250/day (tier dependent) | $10–50/day (1–2 GPUs) | Depends on tier |
| High volume (100M+ tokens/day) | $2,500–25,000+/day | $100–1,000/day (optimised cluster) | Self-hosted strongly favoured |
The self-hosted break-even typically falls somewhere between 10M–100M tokens/day depending on model size, GPU type, and whether you're on cloud or bare metal.
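The arithmetic behind that break-even range can be sketched directly. The dollar figures below are illustrative placeholders, not real quotes — substitute your actual blended API rate and amortised GPU cost:

```python
def daily_api_cost(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Daily spend on a pay-per-token API at a blended $/1M-token rate."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

def break_even_tokens_per_day(gpu_usd_per_day: float, usd_per_million_tokens: float) -> float:
    """Daily token volume at which a fixed-cost GPU matches the API bill."""
    return gpu_usd_per_day / usd_per_million_tokens * 1_000_000

# Illustrative numbers only: $2.50 blended per 1M tokens, one $25/day GPU node.
volume = break_even_tokens_per_day(gpu_usd_per_day=25.0, usd_per_million_tokens=2.50)
print(f"break-even ≈ {volume:,.0f} tokens/day")  # break-even ≈ 10,000,000 tokens/day
```

Note that the GPU side is only "fixed cost" if the hardware is fully utilised; idle capacity pushes the real break-even point higher, which is one reason the range in practice spans an order of magnitude.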
Data Privacy and Residency
With proprietary APIs, every prompt and completion transits the provider's infrastructure. Even with enterprise agreements (OpenAI Enterprise, Anthropic for Business, Google Vertex AI), key considerations are:
- Training opt-out — Most enterprise tiers promise not to use your data for model training, but verify this in your contract
- Log retention — Provider may retain logs for 30–90 days for abuse detection; understand what this means for sensitive data
- Jurisdiction — Where are servers located? EU GDPR may require data to stay in the EU; some providers offer EU-hosted endpoints
- HIPAA / SOC2 coverage — Check which tiers include the required compliance certifications for your industry
Self-hosted open-weight models remove this class of risk: inference runs in your own infrastructure, and no prompt or completion data is transmitted externally. (Securing that infrastructure then becomes your responsibility.)
The Quality Gap in 2025–2026
For most tasks, the quality gap between frontier proprietary and top open-weight models has narrowed substantially:
- Coding: Llama 3.1 70B and DeepSeek-V3 are competitive with GPT-4o on many coding benchmarks
- Reasoning: DeepSeek-R1 matches o1 on AIME and MATH
- General tasks: Llama 4 Maverick achieves ~85% MMLU-Pro vs GPT-5's ~90%+
However, proprietary frontier models still lead on:
- The absolute hardest tasks (PhD-level GPQA, frontier reasoning)
- Multimodal tasks (especially video and audio)
- Instruction-following consistency and safety alignment
- Very long-context tasks (1M tokens is rare in open-weight)
The Hybrid Strategy
Most mature AI teams use both — not one or the other:
Typical hybrid approach
- Prototype with proprietary API — zero infrastructure, fastest time to value, best quality ceiling
- Migrate cost-sensitive workloads to open-weight — classification, routing, simple generation, high-volume processing
- Keep proprietary for hard tasks — complex reasoning, frontier quality requirements, multimodal
- Route at inference time — classifier sends simple queries to local 7B model, complex queries to GPT-5 or Claude Opus
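A minimal version of that routing step might look like the sketch below. The model names and the keyword heuristic are invented placeholders — production routers typically use a small trained classifier rather than keyword matching:

```python
from dataclasses import dataclass

# Hypothetical model identifiers — substitute your own deployment details.
LOCAL_MODEL = "llama-3.1-8b"     # served in-house, e.g. via vLLM or Ollama
FRONTIER_MODEL = "frontier-api"  # proprietary API model

@dataclass
class Route:
    model: str
    reason: str

def route(query: str, max_simple_words: int = 40) -> Route:
    """Crude heuristic router: short queries without reasoning keywords go to
    the cheap local model; everything else goes to the frontier API."""
    hard_markers = ("prove", "derive", "step by step", "analyze", "compare")
    text = query.lower()
    if len(query.split()) <= max_simple_words and not any(m in text for m in hard_markers):
        return Route(LOCAL_MODEL, "simple: short, no reasoning keywords")
    return Route(FRONTIER_MODEL, "complex: long or reasoning-heavy")

print(route("What is the capital of France?").model)     # llama-3.1-8b
print(route("Prove that sqrt(2) is irrational.").model)  # frontier-api
```

The design point worth noting: misrouting a hard query to the small model degrades quality, while misrouting an easy query to the frontier model only wastes money — so routers are usually tuned to err toward the expensive path.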
Open-Weight Ecosystem
Running open-weight models is significantly more accessible than it was even two years ago:
- Ollama — One-command local model running; supports Llama, Mistral, DeepSeek, Phi, Qwen, and 100+ others; OpenAI-compatible API
- Hugging Face — Central repository for model weights; 700K+ models; Transformers library for Python integration
- vLLM — Production inference server; continuous batching; high throughput for multi-user serving
- llama.cpp — CPU-based inference; runs quantised models on laptops and edge devices
- LM Studio — Desktop GUI for local model management; good for non-technical users
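To make the "OpenAI-compatible API" point concrete: Ollama serves an OpenAI-style chat endpoint at `http://localhost:11434/v1`, so a request is a plain JSON POST. The sketch below only builds the request body; the actual call (commented out) assumes `ollama serve` is running and the model has been pulled:

```python
import json

# Ollama's OpenAI-compatible chat endpoint (assumes a local `ollama serve`).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

payload = {
    "model": "llama3.1",  # any model you have pulled, e.g. `ollama pull llama3.1`
    "messages": [{"role": "user", "content": "Summarise this ticket in one line."}],
    "temperature": 0.2,
}
body = json.dumps(payload).encode()

# Uncomment to send against a running server:
# import urllib.request
# req = urllib.request.Request(OLLAMA_URL, data=body,
#                              headers={"Content-Type": "application/json"})
# reply = json.load(urllib.request.urlopen(req))
# print(reply["choices"][0]["message"]["content"])
```

Because the request shape matches the OpenAI API, the same client code can be pointed at a proprietary endpoint or a local one by changing only the base URL and model name — which is what makes the hybrid strategy above cheap to implement.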
Checklist: Do You Understand This?
- What is the difference between "open-weight" and "open-source"?
- At what approximate token volume does self-hosting become cost-competitive?
- Name three data privacy advantages of self-hosted open-weight models.
- On what types of tasks does the quality gap between proprietary and open-weight models remain largest?
- Describe a hybrid routing strategy and when it makes sense.
- Why should you check the licence of an open-weight model before production use?