Advanced

Researcher Path

For people who want to understand how AI systems actually work — from the math up through transformers, training, alignment, and hardware. This is the technical foundation path.

Steps: 12
Est. time: ~4–5 hours
Prerequisites: comfort with maths and programming

The Path

1
Reasoning Models and Test-Time Compute
Start with the frontier: understanding how modern reasoning models (o3, Gemini 2.5, DeepSeek R1) differ from standard LLMs sets the research context.
12 min
2
Linear Algebra for AI
Vectors, matrices, dot products, projections — the language that transformers are written in. You need this before attention makes sense.
20 min
3
Probability and Statistics for AI
Distributions, Bayes' theorem, cross-entropy loss — every training objective and evaluation metric is built on this.
20 min
4
Feedforward Neural Networks
Before transformers, understand the building blocks: neurons, layers, activations, backpropagation. Every later step builds on these.
20 min
5
The Attention Mechanism
Self-attention is the core innovation of the transformer. Understand queries, keys, values, and scaled dot-product attention from first principles.
25 min
6
The Transformer Block
Multi-head attention + feedforward + layer norm + residuals — how they fit together, why each component is there.
20 min
7
Pre-training Objectives
Next-token prediction, masked language modeling, and related self-supervised objectives: how language models actually learn from raw data. (RLHF is a post-training step, covered in step 9.)
20 min
8
Scaling Laws
The Kaplan and Chinchilla scaling laws describe how loss falls predictably as parameters, data, and compute grow, how to split a fixed compute budget between model size and training tokens, and where the returns diminish.
20 min
9
RLHF and Alignment
Reinforcement learning from human feedback is how raw pre-trained models become aligned assistants. It is a key technique in modern AI safety.
20 min
10
Chain-of-Thought and Reasoning
How prompting strategies and training objectives create emergent multi-step reasoning. The link between standard LLMs and modern reasoning models.
20 min
11
Mixture of Experts Architectures
MoE is widely used in frontier models as of 2025 (Mixtral, Llama 4, DeepSeek, and reportedly GPT-4). Understand sparse activation and routing.
20 min
12
GPU Architecture and AI Hardware
FLOP budgets, memory bandwidth, tensor cores, NVLink — why hardware constraints shape model design decisions.
20 min
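
To make the last step concrete, here is a minimal back-of-the-envelope sketch of why memory bandwidth, rather than FLOPs, often bounds single-stream inference. It assumes every weight is read once per generated token and that each parameter costs roughly 2 FLOPs. The function name `decode_latency_bounds` and all hardware numbers are illustrative round figures, not any specific GPU's specification.

```python
def decode_latency_bounds(n_params, bytes_per_param, mem_bandwidth_bytes_s, peak_flops):
    """Rough lower bounds on per-token decode time at batch size 1.

    Assumptions: each weight is read from memory once per token, and each
    parameter costs ~2 FLOPs (one multiply, one add). KV-cache traffic and
    kernel launch overheads are ignored.
    """
    weight_bytes = n_params * bytes_per_param
    t_memory = weight_bytes / mem_bandwidth_bytes_s  # time to stream the weights
    t_compute = 2 * n_params / peak_flops            # time to do the arithmetic
    return t_memory, t_compute


# Hypothetical setup: 7B parameters in 16-bit, 1 TB/s bandwidth, 100 TFLOP/s peak.
t_mem, t_comp = decode_latency_bounds(
    n_params=7e9,
    bytes_per_param=2,
    mem_bandwidth_bytes_s=1e12,
    peak_flops=1e14,
)
bound = "memory" if t_mem > t_comp else "compute"
print(f"memory: {t_mem * 1e3:.2f} ms, compute: {t_comp * 1e3:.2f} ms -> {bound}-bound")
```

Under these assumptions the weight-streaming time is about 100 times the arithmetic time, which is one reason design choices such as quantization, batching, and sparse activation can matter so much for latency.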

After This Path

You now have a coherent bottom-up model of how modern AI systems are designed and trained. Good follow-on areas:

Checklist: Do You Understand This?

  • Can you explain self-attention from first principles using queries, keys, and values?
  • Do you know why the transformer architecture was a breakthrough over RNNs?
  • Can you describe what happens during pre-training and what objective is being optimized?
  • Do you understand the Chinchilla scaling laws and what they say about optimal compute allocation?
  • Can you explain RLHF and why it matters for making pre-trained models useful?
  • Do you understand why Mixture of Experts architectures are so widely used at the frontier in 2025?
  • Can you connect memory bandwidth to inference latency, and inference latency to model design choices?
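
As a self-check for the first item above, here is a minimal sketch of single-head, unmasked scaled dot-product self-attention in plain Python. It is written with lists for readability; real implementations would use an array library, multiple heads, and usually a causal mask. The helper names (`softmax`, `matmul`, `self_attention`) and the toy inputs are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Return (outputs, attention weights) for token embeddings x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    scores = matmul(q, k_t)               # query-key dot products
    # Scale by sqrt(d_k), then normalise each query's scores into weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights    # weighted sum of values


# Toy example: three 2-d tokens, identity projections so q = k = v = x.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1, and with these toy inputs the third token puts its largest weight on itself because its query has the largest dot product with its own key.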

Page built: 01 Jun 2026