Reasoning Internals

How modern AI systems reason: chain-of-thought, test-time compute scaling, and what we know about how o1/o3 and DeepSeek-R1 were trained.

In This Section

Chain-of-Thought — Why It Works

Few-shot CoT, zero-shot CoT, self-consistency, and the faithfulness debate.

Test-Time Compute & Tree Search

Best-of-N sampling, beam search, Monte Carlo tree search (MCTS), and process reward models.

How o1/o3 Reasons — What We Know

Extended chain-of-thought with RL, benchmark results, and what 'reasoning' means.

DeepSeek-R1 Training Approach

Group Relative Policy Optimization (GRPO), cold-start data, rejection sampling, and distillation to smaller models.