🧠 All Things AI, by Subhojit Dey

Reasoning Internals

How modern AI systems reason — chain-of-thought, test-time compute scaling, and what we know about how o1 and DeepSeek-R1 were trained.

In This Section

Chain-of-Thought — Why It Works

Few-shot CoT, zero-shot CoT, self-consistency, and the faithfulness debate.

Test-Time Compute & Tree Search

Best-of-N, beam search, MCTS, and process reward models.

How o1/o3 Reasons — What We Know

Extended chain-of-thought with RL, benchmark results, and what 'reasoning' means.

DeepSeek-R1 Training Approach

GRPO, cold-start, rejection sampling, and distillation to smaller models.


Page built: 01 Jun 2026