DeepSeek-R1 & Open Reasoning
When DeepSeek released R1 in January 2025, it triggered a global reassessment of AI costs and accessibility: a fully open-weight reasoning model competitive with OpenAI's o1, available for free download. This page explains how it works, how it was trained, and what it means for builders.
Why DeepSeek-R1 Was a Shock
Before January 2025, the implicit assumption was that frontier reasoning capability required:
- Massive proprietary training runs (>$100M compute)
- Closed model weights (never released)
- API-only access with per-token pricing
DeepSeek-R1 broke all three assumptions simultaneously. It matched or exceeded o1 on major benchmarks, was released as open-weight (download freely), and was available via a public API at a fraction of OpenAI's pricing ($0.55/$2.19 per million input/output tokens vs o1's $15/$60).
The release triggered what analysts called a "global AI cost reset" — demonstrating that reasoning capability was not an exclusive property of the largest US labs.
Architecture: MoE with 671B Total / 37B Active
DeepSeek-R1 is built on DeepSeek-V3's architecture — a Mixture of Experts (MoE) model with:
- 671B total parameters across all expert networks
- 37B active parameters per forward pass — only a subset of experts activates for each token
This MoE design is why DeepSeek can achieve high capability at lower compute cost: the model is large in total parameters (giving it breadth of knowledge) but only activates a fraction of those parameters per token (keeping inference cost low).
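The routing idea can be sketched in a few lines. Below is a toy top-k router in Python (the scores are random for illustration; DeepSeek-V3's actual router is a learned gating network over 256 routed experts with 8 active per token, plus shared experts):

```python
import math
import random

def route_topk(scores, k):
    """Keep only the k highest-scoring experts; softmax their scores into weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(256)]  # one token's gating scores
weights = route_topk(scores, k=8)
print(len(weights))                      # 8 of 256 experts compute this token
print(round(sum(weights.values()), 6))   # weights sum to 1.0
```

Every other expert contributes nothing to this token's forward pass, which is how total parameter count and per-token compute decouple.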
How It Was Trained: RL Without Supervised CoT Data
The training approach is what made DeepSeek-R1 scientifically significant. Most reasoning models at the time were fine-tuned on human-written chain-of-thought examples. DeepSeek took a different path:
DeepSeek-R1-Zero: Pure Reinforcement Learning
R1-Zero was trained using only reinforcement learning (RL), with no supervised fine-tuning on reasoning examples. The reward signal was simple: is the final answer correct?
Remarkably, the model taught itself to reason — generating verification steps, backtracking when it went wrong, and exploring alternative approaches — purely because these behaviours improved its final answer accuracy. On AIME 2024, R1-Zero improved from 15.6% to 71% through RL alone.
This validated a key hypothesis: reasoning capability can emerge from reinforcement learning on outcome signals, without explicit instruction on how to reason.
DeepSeek-R1: Cold Start + RL + Refinement
The full R1 model extended this approach with additional stages:
1. Cold-start fine-tuning — a small set of human-written examples to bootstrap readable, well-formatted reasoning traces (R1-Zero's outputs were occasionally hard to read, sometimes mixing languages)
2. RL training (GRPO) — Group Relative Policy Optimization, a more compute-efficient RL algorithm than PPO, with rewards for correct final answers and well-formed output
3. Rejection sampling — generate many candidate responses, filter out low-quality ones, and fine-tune on the best reasoning traces
4. Human preference alignment — a final stage incorporating human feedback to improve helpfulness and safety
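The core of GRPO can be sketched briefly: instead of a learned value critic (as PPO uses), each prompt gets a group of sampled answers, and each answer's advantage is its reward normalized against that group's own mean and standard deviation. A minimal sketch (the rewards and group size are illustrative):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled answer's reward
    against the mean/std of its own group -- no learned value critic needed."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid div-by-zero for uniform groups
    return [(r - mu) / sigma for r in rewards]

# 4 sampled answers to one maths prompt; reward 1.0 if the final answer is correct
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers are pushed up relative to their own group's wrong answers, which is what lets a simple "is the final answer right?" signal train reasoning behaviour.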
Benchmarks: R1 vs o1
| Benchmark | DeepSeek-R1 | OpenAI o1 |
|---|---|---|
| AIME 2024 (maths) | 79.8% | 79.2% |
| MATH-500 | 97.3% | 96.4% |
| Codeforces rating | 2,029 | 1,891 |
| GPQA Diamond (science) | 71.5% | 77.3% |
Distilled Models: Reasoning in Smaller Packages
DeepSeek also released six distilled models — smaller models trained to reproduce R1's reasoning patterns at much lower compute cost:
- 1.5B, 7B, 8B, 14B, 32B, 70B parameter sizes
- Based on Qwen2.5 and Llama3 architectures (not DeepSeek's MoE)
- The 7B distill outperforms GPT-4o on maths benchmarks
- The 70B distill rivals o1 on several benchmarks
These distilled models can run on consumer hardware — a 7B model runs on a modern gaming GPU (8 GB+ VRAM), and 14B models run on higher-end consumer setups.
Running DeepSeek-R1 Locally
The full 671B model requires substantial hardware (multiple A100 80GB GPUs or equivalent). But the distilled models run locally:
| Model | Min VRAM | Recommended hardware | Quality level |
|---|---|---|---|
| deepseek-r1:7b | 8 GB | RTX 3080 / M1 Pro | Strong maths, moderate general reasoning |
| deepseek-r1:14b | 16 GB | RTX 4080 / M2 Max | Good general reasoning, strong coding |
| deepseek-r1:32b | 32 GB | RTX 4090 / M3 Ultra | Near o1 quality on many tasks |
| deepseek-r1:70b | 48 GB+ | 2× RTX 4090 / Mac Studio M3 Ultra | Close to full R1 for most use cases |
With Ollama, running the 7B model is simple:

```shell
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b
```

Visible Reasoning Trace
Unlike OpenAI's o-series, which hides or summarises the thinking, DeepSeek-R1 outputs the full reasoning trace between <think> and </think> tags before the answer. This is valuable for:
- Debugging: understanding why the model reached a conclusion
- Learning: seeing how systematic reasoning approaches complex problems
- Auditing: verifying reasoning for high-stakes use cases
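Because the trace is plain text between the tags, separating it from the final answer is straightforward. A minimal sketch (the sample response string is invented):

```python
import re

def split_reasoning(response):
    """Split an R1-style response into (reasoning trace, final answer).
    Assumes the model emits <think>...</think> before the answer."""
    m = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if m is None:
        return "", response.strip()
    return m.group(1).strip(), response[m.end():].strip()

sample = "<think>9.11 < 9.9 because 0.11 < 0.90</think>9.9 is larger."
trace, answer = split_reasoning(sample)
print(answer)  # 9.9 is larger.
```

Logging the trace alongside the answer gives you an audit record for every response.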
Data Residency Advantage
Running DeepSeek-R1 locally (or on your own cloud infrastructure) means reasoning happens without sending data to external APIs. This matters for:
- Source code that cannot leave your environment
- Customer data subject to GDPR or healthcare privacy regulations
- Internal business strategy or unreleased financial information
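A self-hosted setup can be exercised entirely over localhost. The sketch below targets Ollama's /api/generate endpoint (request fields follow Ollama's HTTP API; the model name and prompt are examples):

```python
import json
import urllib.request

def build_payload(prompt, model="deepseek-r1:7b"):
    """Request body for Ollama's /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def local_generate(prompt, host="http://localhost:11434"):
    """Send the prompt to a local Ollama server -- nothing leaves the machine."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_payload("Summarise this internal memo: ...")
print(payload["model"], payload["stream"])
```

Swap `host` for an internal server address and the same code runs against shared on-premises infrastructure.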
Limitations and Considerations
- Knowledge cutoff: DeepSeek-R1 has a 2024 training cutoff. It doesn't know about post-cutoff events without retrieval augmentation.
- Safety alignment differences: As a Chinese lab's model, R1's safety training differs from US models. It may decline politically sensitive Chinese topics more readily, and US-specific safety categories may behave differently. Evaluate for your use case.
- English writing quality: Strong on maths and coding; slightly weaker than GPT/Claude on nuanced long-form English prose.
- No native tool use: DeepSeek-R1 does not have built-in tool calling during reasoning (unlike o3/o4-mini). Tool use must be added at the orchestration layer.
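Orchestration-layer tool use means your own code, not the model, executes actions: detect a tool request in the output, run it, and feed the result back as context. A minimal sketch (the JSON tool-call convention here is invented for illustration, not an R1 feature):

```python
import json
import re

def run_with_tools(model_call, tools, prompt, max_turns=3):
    """Loop: ask the model; if it emits a JSON tool call like
    {"tool": "add", "args": {...}}, execute it and append the result."""
    context = prompt
    out = ""
    for _ in range(max_turns):
        out = model_call(context)
        m = re.search(r'\{"tool".*\}', out)
        if m is None:
            return out  # plain answer, no tool requested
        call = json.loads(m.group(0))
        result = tools[call["tool"]](**call["args"])
        context += f"\n[tool {call['tool']} returned: {result}]"
    return out

# Stand-in for a real model client: requests a tool, then answers from its result
def fake_model(ctx):
    if "[tool" not in ctx:
        return '{"tool": "add", "args": {"a": 2, "b": 3}}'
    return "The sum is 5."

print(run_with_tools(fake_model, {"add": lambda a, b: a + b}, "What is 2+3?"))
# The sum is 5.
```

In production the prompt would also describe the available tools, and you would validate tool arguments before executing them.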
Checklist: Do You Understand This?
- Why did DeepSeek-R1's January 2025 release cause a "global AI cost reset"?
- What is the key difference in R1's training compared to most reasoning models?
- What does MoE (Mixture of Experts) mean, and why does it matter for R1's cost?
- What are distilled models, and how do they differ from the full R1?
- What hardware do you need to run deepseek-r1:7b locally?
- What data residency advantage does self-hosting R1 provide?