
Open-Weight vs Proprietary Models

The open-weight vs proprietary choice is one of the most consequential architectural decisions when building AI systems. It affects cost, data privacy, latency, capability ceiling, compliance, and operational complexity. This page gives you a rigorous framework for making this decision.

Terminology: Open-Weight vs Open-Source

These terms are often confused:

  • Open-source AI — Model weights, training code, and training data are all publicly released. Rare — almost no frontier-quality models are truly open-source.
  • Open-weight — Model weights are publicly released and downloadable. Training code and/or data may not be. This describes most "open" AI models: Llama, Mistral, DeepSeek, Gemma, Phi, Qwen.
  • Proprietary (closed) — Model weights are never released; access only via API or hosted product. GPT-5, Claude, Gemini Ultra.

Licences matter for open-weight models. "Free to download" does not mean "free to use commercially." Llama 4's licence carries commercial restrictions above a usage threshold, and Mistral models ship under varied licences. Always check the specific model's licence before production deployment.

The Core Trade-offs

[Diagram: the deployment spectrum, from fully managed to fully self-hosted — GPT-5 / Claude Opus → mid-tier APIs → managed open-weight (e.g. Groq) → self-hosted (e.g. vLLM). Proprietary (closed API): best quality, zero ops, data leaves your infra. Open-weight (self-hosted): full control, data stays local, ops burden. Most mature teams use both — proprietary for quality-critical tasks, open-weight for cost-sensitive volume.]

Proprietary — key attributes:

  • Frontier quality — GPT-5 and Claude Opus lead
  • Pay-per-token — no upfront infrastructure cost
  • Data exits your org — inference runs on the provider's infrastructure
  • Zero ops — the provider manages everything

Open-weight — key attributes:

  • Nearly frontier — DeepSeek-R1 ≈ o1; Llama 4 ≈ 85% MMLU-Pro
  • Fixed infra cost — cheaper beyond roughly 10M tokens/day
  • Data stays local — no external transmission
  • High ops burden — GPUs, serving, updates, monitoring

Open-weight eliminates data-residency risk; proprietary eliminates operational burden.

| Dimension | Proprietary (Closed API) | Open-Weight (Self-Hosted) |
| --- | --- | --- |
| Quality ceiling | Highest — GPT-5, Claude Opus, Gemini Ultra | Approaching but not yet at the frontier (Llama 4: ~85–86% MMLU-Pro) |
| Cost model | Pay per token; no upfront cost; predictable at low volume | Infrastructure cost (GPU/cloud); no per-token cost; cheaper at scale |
| Data privacy | Data sent to provider's infrastructure | Data never leaves your infrastructure |
| Regulatory compliance | Depends on provider's certifications (SOC 2, HIPAA, GDPR) | Full control; can deploy in air-gapped or regulated environments |
| Customisation | Limited (fine-tuning available on some, but you don't own the base) | Full fine-tuning, quantisation, LoRA; domain adaptation possible |
| Latency | Network round-trip, but highly optimised serving; generally fast | Local inference avoids the network hop; varies widely with hardware and setup |
| Vendor lock-in | High — pricing, availability, and behaviour controlled by provider | None — you own the deployment |
| Ops burden | None — provider manages everything | High — GPU procurement, model serving, updates, monitoring |
| Model updates | Automatic; can break existing prompts unexpectedly | You control when (and whether) to upgrade |

Cost Comparison at Scale

The cost crossover point depends on your query volume and model tier:

| Scenario | Proprietary API | Self-hosted open-weight | Better choice |
| --- | --- | --- | --- |
| Prototype / low volume (<100K tokens/day) | ~$0–5/day | GPU amortised: $5–30+/day | API |
| Medium volume (1M tokens/day) | $25–250/day (tier dependent) | $10–50/day (1–2 GPUs) | Depends on tier |
| High volume (100M+ tokens/day) | $2,500–25,000+/day | $100–1,000/day (optimised cluster) | Self-hosted strongly favoured |

The self-hosted break-even typically falls somewhere between 10M–100M tokens/day depending on model size, GPU type, and whether you're on cloud or bare metal.
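The break-even arithmetic can be sketched directly. The prices below are illustrative placeholders, not quotes from any provider, and the calculation ignores staff time and engineering overhead, which often dominate in practice:

```python
def api_cost_per_day(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Daily API spend at a flat per-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

def self_hosted_cost_per_day(gpu_hourly_usd: float, num_gpus: int) -> float:
    """Daily cost of an always-on GPU deployment (infrastructure only)."""
    return gpu_hourly_usd * num_gpus * 24

def break_even_tokens_per_day(usd_per_million_tokens: float,
                              gpu_hourly_usd: float,
                              num_gpus: int) -> float:
    """Token volume at which self-hosting matches the daily API bill."""
    daily_infra = self_hosted_cost_per_day(gpu_hourly_usd, num_gpus)
    return daily_infra / usd_per_million_tokens * 1_000_000

# Hypothetical inputs: $2.50 per 1M tokens via API; two GPUs at $2/hour each.
print(break_even_tokens_per_day(2.50, 2.0, 2))  # ~38.4M tokens/day
```

With these assumed prices the crossover lands at roughly 38M tokens/day — inside the 10M–100M range quoted above; cheaper GPUs or pricier API tiers pull it lower.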

Data Privacy and Residency

With proprietary APIs, every prompt and completion transits the provider's infrastructure. Even with enterprise agreements (OpenAI Enterprise, Anthropic for Business, Google Vertex AI), key considerations are:

  • Training opt-out — Most enterprise tiers promise not to use your data for model training, but verify this in your contract
  • Log retention — Provider may retain logs for 30–90 days for abuse detection; understand what this means for sensitive data
  • Jurisdiction — Where are servers located? EU GDPR may require data to stay in the EU; some providers offer EU-hosted endpoints
  • HIPAA / SOC2 coverage — Check which tiers include the required compliance certifications for your industry

Self-hosted open-weight models eliminate these concerns entirely — inference runs in your own infrastructure; no data transmitted externally.

The Quality Gap in 2025–2026

For most tasks, the quality gap between frontier proprietary and top open-weight models has narrowed substantially:

  • Coding: Llama 3.1 70B and DeepSeek-V3 are competitive with GPT-4o on many coding benchmarks
  • Reasoning: DeepSeek-R1 matches o1 on AIME and MATH
  • General tasks: Llama 4 Maverick achieves ~85% MMLU-Pro vs GPT-5's ~90%+

However, proprietary frontier models still lead on:

  • The absolute hardest tasks (PhD-level GPQA, frontier reasoning)
  • Multimodal tasks (especially video and audio)
  • Instruction-following consistency and safety alignment
  • Very long-context tasks (1M-token context windows remain rare among open-weight models)

The Hybrid Strategy

Most mature AI teams use both — not one or the other:

Typical hybrid approach

  • Prototype with proprietary API — zero infrastructure, fastest time to value, best quality ceiling
  • Migrate cost-sensitive workloads to open-weight — classification, routing, simple generation, high-volume processing
  • Keep proprietary for hard tasks — complex reasoning, frontier quality requirements, multimodal
  • Route at inference time — classifier sends simple queries to local 7B model, complex queries to GPT-5 or Claude Opus
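The routing step above can be sketched as a simple heuristic. Everything here is illustrative — the model names, thresholds, and keyword list are placeholders, and production routers typically use a small trained classifier rather than keyword matching:

```python
# Hypothetical model identifiers for the two tiers.
LOCAL_MODEL = "local-llama-8b"      # cheap, self-hosted
FRONTIER_MODEL = "frontier-api"     # expensive, proprietary

# Crude signals that a query needs frontier-level reasoning (assumed list).
HARD_TASK_HINTS = ("prove", "derive", "step by step", "legal", "diagnose")

def route(query: str) -> str:
    """Send long or reasoning-heavy queries to the frontier model,
    everything else to the local open-weight model."""
    q = query.lower()
    if len(query) > 2000 or any(hint in q for hint in HARD_TASK_HINTS):
        return FRONTIER_MODEL
    return LOCAL_MODEL

print(route("Translate 'hello' to French"))       # routed to the local model
print(route("Prove that sqrt(2) is irrational"))  # routed to the frontier model
```

The payoff comes from traffic shape: if most queries are simple, the local model absorbs the bulk of the volume while the frontier model handles only the expensive tail.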

Open-Weight Ecosystem

Running open-weight models is far more accessible than it was even two years ago:

  • Ollama — One-command local model running; supports Llama, Mistral, DeepSeek, Phi, Qwen, and 100+ others; OpenAI-compatible API
  • Hugging Face — Central repository for model weights; 700K+ models; Transformers library for Python integration
  • vLLM — Production inference server; continuous batching; high throughput for multi-user serving
  • llama.cpp — Lightweight C/C++ inference engine; runs quantised (GGUF) models on CPUs, laptops, and edge devices
  • LM Studio — Desktop GUI for local model management; good for non-technical users
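Because Ollama exposes an OpenAI-compatible endpoint, switching a workload from a proprietary API to a local model can be as small as changing the base URL. A minimal sketch, assuming Ollama is running locally with a pulled model (the model name and prompt are placeholders):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint (default local port).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """OpenAI-style chat payload, accepted by Ollama's /v1 endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama3.1", "Summarise LoRA in one sentence.")
print(payload["model"])

# Sending requires a running server (`ollama serve` + `ollama pull llama3.1`):
# req = urllib.request.Request(
#     OLLAMA_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```

The same payload shape works against proprietary endpoints, which is what makes the hybrid strategy cheap to implement.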

Checklist: Do You Understand This?

  • What is the difference between "open-weight" and "open-source"?
  • At what approximate token volume does self-hosting become cost-competitive?
  • Name three data privacy advantages of self-hosted open-weight models.
  • On what types of tasks does the quality gap between proprietary and open-weight models remain largest?
  • Describe a hybrid routing strategy and when it makes sense.
  • Why should you check the licence of an open-weight model before production use?