Advanced
Researcher Path
For people who want to understand how AI systems actually work — from the math up through transformers, training, alignment, and hardware. This is the technical foundation path.
Steps 12
Est. time ~4–5 hours
Prerequisites Maths + programming comfort
The Path
1
Reasoning Models and Test-Time Compute
Start with the frontier: understanding how modern reasoning models (o3, Gemini 2.5, DeepSeek R1) differ from standard LLMs sets the research context.
12 min
2
Linear Algebra for AI
Vectors, matrices, dot products, projections — the language that transformers are written in. You need this before attention makes sense.
20 min
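As a quick illustration of the operations named in this step, here is a minimal sketch in plain Python. The helper names (`dot`, `project`, `cosine_similarity`) are chosen for this example only and are not taken from the lesson itself.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(u, v):
    """Orthogonal projection of u onto the direction of v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]

def cosine_similarity(u, v):
    """Dot product normalised by both vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

u = [3.0, 4.0]
v = [1.0, 0.0]
print(dot(u, v))                # 3.0
print(project(u, v))            # [3.0, 0.0]
print(cosine_similarity(u, v))  # 0.6
```

Attention scores are dot products between vectors, so a large dot product here corresponds to a strong match later on.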
3
Probability and Statistics for AI
Distributions, Bayes' theorem, cross-entropy loss — every training objective and evaluation metric is built on this.
20 min
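To make cross-entropy loss concrete, the sketch below computes it for a single prediction over a small set of classes. The function names and the example logits are illustrative assumptions, not material from the lesson.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

logits = [2.0, 1.0, 0.1]  # hypothetical model scores for 3 classes
probs = softmax(logits)
print(cross_entropy(logits, 0))  # low loss: class 0 has the highest score
print(cross_entropy(logits, 2))  # higher loss: class 2 has the lowest score
```

With uniform scores over four classes the loss equals log(4), which is the loss of a model that has learned nothing yet.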
4
Feedforward Neural Networks
Before transformers, understand the building blocks: neurons, layers, activations, backpropagation. The foundation for everything above it.
20 min
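A minimal sketch of those building blocks, assuming the smallest possible network: one input, one hidden ReLU neuron, one output, and a squared-error loss. The gradients are derived by hand with the chain rule and then checked against finite differences. All names and parameter values are hypothetical.

```python
def forward(params, x):
    w1, b1, w2, b2 = params
    z = w1 * x + b1      # hidden pre-activation
    h = max(0.0, z)      # ReLU activation
    y = w2 * h + b2      # output layer
    return z, h, y

def loss_and_grads(params, x, target):
    """Squared-error loss and its gradients via backpropagation."""
    w1, b1, w2, b2 = params
    z, h, y = forward(params, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                      # dLoss/dy
    dw2, db2 = dy * h, dy                # output-layer gradients
    dh = dy * w2                         # propagate back to the hidden unit
    dz = dh * (1.0 if z > 0 else 0.0)    # ReLU passes gradient only when active
    dw1, db1 = dz * x, dz                # hidden-layer gradients
    return loss, [dw1, db1, dw2, db2]

def numerical_grads(params, x, target, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss_and_grads(up, x, target)[0]
                      - loss_and_grads(down, x, target)[0]) / (2 * eps))
    return grads

params = [0.5, 0.1, -1.5, 0.2]
loss, grads = loss_and_grads(params, x=2.0, target=1.0)
approx = numerical_grads(params, x=2.0, target=1.0)
print(grads)
print(approx)
```

The two printed lists should agree closely, which is the standard sanity check for a hand-written backward pass.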
5
The Attention Mechanism
Self-attention is the core innovation of the transformer. Understand queries, keys, values, and scaled dot-product attention from first principles.
25 min
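Scaled dot-product attention can be sketched in a few lines of plain Python. This is a single-head version without masking or learned projections, and the tiny `Q`, `K`, `V` matrices are made-up values chosen so the result is easy to read.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

Q = [[10.0, 0.0]]              # one query that strongly matches the first key
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = attention(Q, K, V)
print(weights[0])   # almost all weight on the first key
print(outputs[0])   # therefore close to the first value vector [1.0, 2.0]
```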
6
The Transformer Block
Multi-head attention + feedforward + layer norm + residuals — how they fit together, why each component is there.
20 min
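The sketch below shows two of those components, layer norm and the residual connection, wrapped around a stand-in sublayer. It assumes a pre-norm arrangement (normalise, apply the sublayer, add the input back) and omits the learnable gain and bias. Actual models differ on these choices, and `toy_feedforward` is a hypothetical placeholder rather than a real feedforward layer.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalise a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def residual(x, sublayer):
    """Pre-norm residual connection: x + sublayer(layer_norm(x))."""
    return [xi + si for xi, si in zip(x, sublayer(layer_norm(x)))]

def toy_feedforward(h):
    # Hypothetical stand-in for the position-wise feedforward sublayer.
    return [0.1 * max(0.0, hi) for hi in h]

x = [1.0, 2.0, 3.0, 4.0]
normed = layer_norm(x)
y = residual(x, toy_feedforward)
print(normed)
print(y)
```

Because the input is added back unchanged, a sublayer that outputs zeros leaves `x` untouched, which is one reason residual connections make deep stacks easier to train.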
7
Pre-training Objectives
Next-token prediction and masked language modeling: how language models actually learn from data. RLHF, a post-training step rather than a pre-training objective, is covered in step 9.
20 min
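To show what the two self-supervised objectives ask the model to predict, here is a small sketch that builds training examples from a toy token list. The function names and the `[MASK]` placeholder string are assumptions made for illustration. Masked positions are passed in explicitly to keep the example deterministic, whereas in practice they are sampled at random.

```python
def next_token_examples(tokens):
    """Causal LM: predict each token from everything before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_example(tokens, positions, mask_token="[MASK]"):
    """Masked LM: hide chosen positions and predict the originals."""
    masked = [mask_token if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

tokens = ["the", "cat", "sat", "down"]
causal = next_token_examples(tokens)
masked, targets = masked_lm_example(tokens, positions={1})
print(causal[-1])   # (['the', 'cat', 'sat'], 'down')
print(masked)       # ['the', '[MASK]', 'sat', 'down']
print(targets)      # {1: 'cat'}
```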
8
Scaling Laws
The Kaplan and Chinchilla scaling laws describe how loss falls predictably as parameters, data, and compute grow, how to split a fixed compute budget between model size and training tokens, and where the returns diminish.
20 min
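A back-of-the-envelope sketch of compute-optimal allocation. It relies on two widely quoted approximations that are assumptions here, not exact laws: training compute C ≈ 6 · N · D FLOPs for N parameters and D tokens, and a Chinchilla-style ratio of roughly 20 training tokens per parameter.

```python
import math

def compute_optimal(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget into parameters N and tokens D.

    Assumes C = 6 * N * D and D = tokens_per_param * N (rules of thumb).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal(5.88e23)
print(f"{n:.2e} parameters, {d:.2e} tokens")  # 7.00e+10 parameters, 1.40e+12 tokens
```

Under these assumptions a budget of about 5.88e23 FLOPs lands near 70 billion parameters and 1.4 trillion tokens, which matches the published Chinchilla configuration.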
9
RLHF and Alignment
Reinforcement learning from human feedback is how raw pre-trained models become aligned assistants. It is one of the most widely used techniques in applied alignment work.
20 min
10
Chain-of-Thought and Reasoning
How prompting strategies and training objectives create emergent multi-step reasoning. The link between standard LLMs and modern reasoning models.
20 min
11
Mixture of Experts Architectures
MoE is a dominant architecture for frontier models in 2025 (Mixtral, Llama 4, DeepSeek, and reportedly GPT-4). Understand sparse activation and routing.
20 min
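Sparse activation and routing can be sketched as follows: a router scores every expert, only the top-k experts run, and their outputs are mixed using renormalised gate weights. Renormalising over the selected experts is one common choice and is an assumption here. The experts are hypothetical toy functions rather than real feedforward layers.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)  # experts not in `gates` are never evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates

# Hypothetical experts: each one just scales its input by a different factor.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token
out, gates = moe_forward([1.0, 1.0], experts, router_logits, k=2)
print(gates)  # experts 1 and 3 are selected
print(out)
```

Only two of the four experts run for this token, so the compute per token stays well below what the total parameter count would suggest.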
12
GPU Architecture and AI Hardware
FLOP budgets, memory bandwidth, tensor cores, NVLink — why hardware constraints shape model design decisions.
20 min
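One way to see why memory bandwidth matters is a rough lower bound on decode latency: generating each token requires reading every weight from memory at least once. The sketch below assumes batch size 1 and ignores the KV cache, compute time, and any overlap of the two. The model size and bandwidth figures are hypothetical round numbers, not specifications of a real device.

```python
def decode_latency_lower_bound(n_params, bytes_per_param, bandwidth_bytes_per_s):
    """Seconds per token if weight reads were the only cost."""
    return n_params * bytes_per_param / bandwidth_bytes_per_s

n_params = 7e9       # hypothetical 7B-parameter dense model
bandwidth = 1e12     # hypothetical 1 TB/s memory bandwidth

fp16 = decode_latency_lower_bound(n_params, 2, bandwidth)   # 2 bytes per weight
int4 = decode_latency_lower_bound(n_params, 0.5, bandwidth) # 4 bits per weight
print(f"fp16: {fp16 * 1e3:.1f} ms/token")  # fp16: 14.0 ms/token
print(f"int4: {int4 * 1e3:.1f} ms/token")  # int4: 3.5 ms/token
```

Under this estimate, shrinking the bytes read per token (through quantization, or through sparse activation as in MoE) lowers latency directly, which is one way hardware limits feed back into model design.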
After This Path
You now have a coherent bottom-up model of how modern AI systems are designed and trained. Good follow-on areas:
- Full Deep Dive section — remaining subsections: interpretability, open-source movement, AI research skills
- Alignment subsection — Constitutional AI, DPO, red teaming, interpretability
- Fine-tuning — LoRA, QLoRA, PEFT, full fine-tuning vs RAG
- Reasoning models section — revisit o3, DeepSeek R1, and Gemini 2.5 Pro in depth with your new foundation in place
Checklist: Do You Understand This?
- Can you explain self-attention from first principles using queries, keys, and values?
- Do you know why the transformer architecture was a breakthrough over RNNs?
- Can you describe what happens during pre-training and what objective is being optimized?
- Do you understand the Chinchilla scaling laws and what they say about optimal compute allocation?
- Can you explain RLHF and why it matters for making pre-trained models useful?
- Do you understand why Mixture of Experts architectures are dominant at the frontier in 2025?
- Can you connect memory bandwidth to inference latency to model design choices?