Reasoning Internals
How modern AI systems reason: chain-of-thought prompting, test-time compute scaling, and what is publicly known about how o1/o3 and DeepSeek-R1 were trained.
In This Section
Chain-of-Thought — Why It Works
Few-shot CoT, zero-shot CoT, self-consistency, and the faithfulness debate.
Test-Time Compute & Tree Search
Best-of-N, beam search, MCTS, and process reward models.
How o1/o3 Reason — What We Know
Extended chain-of-thought with RL, benchmark results, and what 'reasoning' means.
DeepSeek-R1 Training Approach
GRPO (Group Relative Policy Optimization), cold-start fine-tuning, rejection sampling, and distillation to smaller models.