Level: Advanced

Researcher Path

For people who want to understand how AI systems actually work — from the math up through transformers, training, alignment, and hardware. This is the technical foundation path.

Steps: 12

Est. time: ~4–5 hours

Prerequisites: comfort with maths and programming

The Path

Reasoning Models and Test-Time Compute

Start with the frontier: seeing how modern reasoning models (o3, Gemini 2.5, DeepSeek R1) differ from standard LLMs, chiefly by spending more compute at inference time, sets the research context for the rest of the path.

12 min
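One simple, well-known way to trade extra inference compute for accuracy is self-consistency: sample several independent answers and take a majority vote. The sketch below only illustrates that trade-off; it is not a description of how o3, Gemini 2.5, or DeepSeek R1 allocate their compute. `sample_answer` is a hypothetical stand-in for a model call, simulated with a seeded random generator that is right 60% of the time and otherwise scatters across several wrong answers.

```python
import random
from collections import Counter

def sample_answer(rng: random.Random, p_correct: float = 0.6) -> str:
    """Hypothetical stand-in for one sampled model answer (not a real API).

    Returns the right answer "42" with probability p_correct, otherwise one of
    several distinct wrong answers, so wrong samples rarely agree with each other."""
    if rng.random() < p_correct:
        return "42"
    return rng.choice(["41", "43", "24", "7"])

def majority_vote(answers: list[str]) -> str:
    """Most common final answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which voting over n_samples answers is correct."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        answers = [sample_answer(rng) for _ in range(n_samples)]
        correct += majority_vote(answers) == "42"
    return correct / trials

for n in (1, 5, 25):
    print(f"samples={n:2d}  accuracy={accuracy(n):.3f}")
```

More samples cost proportionally more compute per question, which is the basic shape of the test-time compute trade-off.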

Linear Algebra for AI

Vectors, matrices, dot products, projections — the language that transformers are written in. You need this before attention makes sense.

20 min
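A minimal sketch of the operations named above, in plain Python on lists (a real implementation would use an array library). The dot product is what attention uses to score a query against a key, and a matrix-vector product is just one dot product per row.

```python
import math

def dot(u, v):
    """Sum of elementwise products."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of u."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine of the angle between u and v (1 = same direction, 0 = orthogonal)."""
    return dot(u, v) / (norm(u) * norm(v))

def project(u, v):
    """Projection of u onto the direction of v: (u.v / v.v) * v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * x for x in v]

def matvec(M, x):
    """Matrix-vector product: each output entry is (row of M) . x."""
    return [dot(row, x) for row in M]

u, v = [3.0, 4.0], [1.0, 0.0]
print(dot(u, v))                          # 3.0
print(cosine(u, v))                       # 0.6
print(project(u, v))                      # [3.0, 0.0]
print(matvec([[1, 2], [3, 4]], [1, 1]))   # [3, 7]
```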

Probability and Statistics for AI

Distributions, Bayes' theorem, cross-entropy loss: most training objectives and evaluation metrics are built on these ideas.

20 min
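A small sketch of two of these ideas. Cross-entropy for a classifier is the negative log-probability the softmax assigns to the correct class, so a model that is uniform over 4 classes pays log 4. The Bayes example uses made-up numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) purely for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """-log p(target) under the softmax distribution."""
    return -math.log(softmax(logits)[target])

def bayes(prior, likelihood, evidence):
    """Posterior P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

print(cross_entropy([0.0, 0.0, 0.0, 0.0], 0))  # log 4, about 1.386
print(cross_entropy([5.0, 0.0, 0.0, 0.0], 0))  # small: confident and correct

# Illustrative numbers: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
p_e = 0.99 * 0.01 + 0.05 * 0.99
print(bayes(0.01, 0.99, p_e))                  # about 0.167
```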

Feedforward Neural Networks

Before transformers, understand the building blocks: neurons, layers, activations, backpropagation. The foundation for everything above it.

20 min
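A minimal sketch of backpropagation for a network with one tanh hidden layer and a squared-error loss, written out by hand so each chain-rule step is visible. The gradient of one weight is checked against a central finite difference, which is a standard way to test a backprop implementation; the layer sizes, inputs, and learning rate are arbitrary choices for the example.

```python
import math
import random

def forward(params, x):
    """One hidden layer: h = tanh(W1 x + b1), y = w2 . h + b2."""
    W1, b1, w2, b2 = params
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [math.tanh(zi) for zi in z]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y

def loss_and_grads(params, x, t):
    """Loss 0.5 * (y - t)^2 and its gradients, propagated backwards layer by layer."""
    W1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - t                                               # dL/dy
    dw2 = [dy * hi for hi in h]                              # dL/dw2
    db2 = dy                                                 # dL/db2
    dz = [dy * w * (1.0 - hi * hi) for w, hi in zip(w2, h)]  # through tanh' = 1 - tanh^2
    dW1 = [[dzi * xi for xi in x] for dzi in dz]             # dL/dW1
    db1 = dz                                                 # dL/db1
    return 0.5 * dy * dy, (dW1, db1, dw2, db2)

def sgd_step(params, grads, lr=0.1):
    """Return new parameters after one plain gradient-descent step."""
    W1, b1, w2, b2 = params
    dW1, db1, dw2, db2 = grads
    return (
        [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)],
        [b - lr * g for b, g in zip(b1, db1)],
        [w - lr * g for w, g in zip(w2, dw2)],
        b2 - lr * db2,
    )

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
params = (W1, [0.0] * 3, [rng.uniform(-1, 1) for _ in range(3)], 0.0)
x, t = [0.5, -1.0], 1.0

loss, grads = loss_and_grads(params, x, t)

# Check one backprop gradient against a central finite difference.
eps = 1e-6
W1[0][0] += eps
loss_plus, _ = loss_and_grads(params, x, t)
W1[0][0] -= 2 * eps
loss_minus, _ = loss_and_grads(params, x, t)
W1[0][0] += eps
numeric = (loss_plus - loss_minus) / (2 * eps)
print(grads[0][0][0], numeric)  # the two values should agree closely

new_loss, _ = loss_and_grads(sgd_step(params, grads), x, t)
print(loss, new_loss)  # a small step against the gradient lowers the loss
```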

The Attention Mechanism

Self-attention is the core innovation of the transformer. Understand queries, keys, values, and scaled dot-product attention from first principles.

25 min
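A minimal single-head sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python. In self-attention Q, K, and V are all learned linear projections of the same token vectors; here they are passed in directly to keep the example short, and masking is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors.

    Each query scores every key by dot product, the scores are scaled by
    1/sqrt(d_k) and normalized with softmax, and the output for that query is
    the resulting weighted average of the value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# The query points along the first key, so the output is close to the first value row.
print(attention([[1.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

The 1/sqrt(d_k) factor keeps dot products from growing with dimension, which would otherwise push the softmax toward a near one-hot distribution with very small gradients.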

The Transformer Block

Multi-head attention + feedforward + layer norm + residuals — how they fit together, why each component is there.

20 min
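A toy sketch of how the pieces are wired in one block. It assumes the pre-LN ordering common in modern models (layer norm before each sublayer, then a residual add); the original 2017 Transformer applied layer norm after each residual add instead. It also simplifies to a single attention head with identity Q/K/V projections and omits the learned layer-norm gain and bias.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (learned gain/bias omitted)."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [(xi - mu) / math.sqrt(var + eps) for xi in x]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention(X):
    """Single head with identity Q/K/V projections (a simplification)."""
    d = len(X[0])
    out = []
    for q in X:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X])
        out.append([sum(wi * x[j] for wi, x in zip(w, X)) for j in range(d)])
    return out

def ffn(x, W1, W2):
    """Position-wise feedforward: W2 . relu(W1 . x), applied to each token separately."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

def transformer_block(X, W1, W2):
    """Pre-LN block: X + Attn(LN(X)), then each position gets + FFN(LN(x))."""
    attn = self_attention([layer_norm(x) for x in X])
    X = [[a + b for a, b in zip(x, y)] for x, y in zip(X, attn)]              # residual 1
    return [[a + b for a, b in zip(x, ffn(layer_norm(x), W1, W2))] for x in X]  # residual 2

X = [[1.0, 2.0], [3.0, -1.0]]                               # two tokens, model dim 2
W1 = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]     # expand 2 -> 4
W2 = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]           # project 4 -> 2
print(transformer_block(X, W1, W2))                          # same shape as X
```

Because each sublayer only adds to its input, the residual stream carries the original token vectors through unchanged when a sublayer outputs zero, which is part of why deep stacks of these blocks train stably.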

Pre-training Objectives

Next-token prediction (causal language modeling) and masked language modeling: how language models actually learn from raw text. RLHF is a later post-training stage, covered separately below.

20 min
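A minimal sketch of the next-token objective: shift the sequence by one to get targets, then average -log P(next token) over positions. The "model" here is a hypothetical toy bigram table with add-one smoothing that conditions only on the previous token; a real LLM conditions on the whole prefix, but the loss is computed the same way. Masked language modeling instead hides some tokens and predicts them from context on both sides, which is not shown.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Toy stand-in for a model: P(next | current) from bigram counts, add-one smoothed."""
    vocab = sorted(set(tokens))
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1

    def prob(cur, nxt):
        return (counts[cur][nxt] + 1) / (sum(counts[cur].values()) + len(vocab))

    return prob

def next_token_loss(prob, tokens):
    """Mean negative log-likelihood of each token given the one before it."""
    pairs = list(zip(tokens, tokens[1:]))  # inputs x_t paired with targets x_{t+1}
    return -sum(math.log(prob(c, n)) for c, n in pairs) / len(pairs)

text = "the cat sat on the mat the cat sat".split()
prob = train_bigram(text)
loss = next_token_loss(prob, text)
print(loss, math.exp(loss))  # average loss in nats, and the corresponding perplexity
```

A model that knew nothing would assign 1/|vocab| to every token and pay log |vocab| per position; training pushes the loss below that.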

Scaling Laws

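The Kaplan and Chinchilla scaling laws show that loss falls predictably, roughly as a power law, as parameters, training data, and compute grow. Chinchilla also showed that many earlier large models were undertrained for their compute budget, with roughly 20 training tokens per parameter being compute-optimal. Learn what the laws predict and where their limits are.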

20 min
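A sketch of the Chinchilla-style trade-off under stated assumptions: the parametric fit L(N, D) = E + A/N^alpha + B/D^beta, with constants taken from the fit reported in the Chinchilla paper (E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, alpha ≈ 0.34, beta ≈ 0.28; treat them as illustrative), and training compute approximated as C ≈ 6 N D FLOPs. For a fixed budget, a grid search over model size N finds the split that minimizes predicted loss.

```python
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted loss for N parameters trained on D tokens (constants assumed, see lead-in)."""
    return E + A / N**alpha + B / D**beta

def best_split(C, candidates):
    """For a compute budget C ~ 6 * N * D FLOPs, pick the N with the lowest predicted loss."""
    return min(candidates, key=lambda N: chinchilla_loss(N, C / (6 * N)))

C = 1e23                                        # training FLOPs (illustrative budget)
Ns = [10 ** (8 + i / 10) for i in range(41)]    # candidate sizes from 1e8 to 1e12 params
N_opt = best_split(C, Ns)
D_opt = C / (6 * N_opt)
print(f"N = {N_opt:.2e} params, D = {D_opt:.2e} tokens, D/N = {D_opt / N_opt:.0f}")
```

Too small a model saturates on its capacity term; too large a model sees too few tokens for the same budget. The optimum sits in between, and the exact tokens-per-parameter ratio depends on which fit's constants you use.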

RLHF and Alignment

Reinforcement learning from human feedback is a key step in turning raw pre-trained models into helpful, aligned assistants, and one of the central techniques in current AI alignment work.

20 min
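A sketch of the two formulas at the heart of the common RLHF recipe, assuming the widely used setup: a reward model trained on human preference pairs with a pairwise (Bradley-Terry) loss, then a policy optimized (often with PPO, not shown) to maximize reward minus a KL penalty that keeps it close to the pre-trained reference model. Function names and the beta value are illustrative.

```python
import math

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise loss -log sigmoid(r_chosen - r_rejected), computed stably."""
    d = r_chosen - r_rejected
    return max(0.0, -d) + math.log1p(math.exp(-abs(d)))

def kl_shaped_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Per-sample objective: reward - beta * (log pi(y|x) - log pi_ref(y|x))."""
    return reward - beta * (logp_policy - logp_ref)

print(reward_model_loss(2.0, 0.0))   # low: the preferred answer already scores higher
print(reward_model_loss(0.0, 2.0))   # high: the ranking is wrong
print(kl_shaped_reward(1.0, -1.0, -2.0))  # reward reduced because the policy drifted from the reference
```

The KL term is what stops the policy from exploiting quirks of the learned reward model by drifting into text the reference model would find very unlikely.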

Chain-of-Thought and Reasoning

How prompting strategies and training objectives create emergent multi-step reasoning. The link between standard LLMs and modern reasoning models.

20 min

Mixture of Experts Architectures

MoE is a dominant architecture for frontier models in 2025 (Mixtral, Llama 4, DeepSeek, and reportedly GPT-4). Understand sparse activation and routing.

20 min
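A minimal sketch of top-k routing in one common formulation: a linear router scores every expert, only the k best are run, and their outputs are mixed with softmax weights renormalized over the selected experts. The experts here are hypothetical scaling functions standing in for feedforward networks, and the load-balancing losses real MoE models add are omitted.

```python
import math

def top_k_route(router_logits, k):
    """Select the k highest-scoring experts; softmax their scores into gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_layer(x, router_W, experts, k=2):
    """Sparse MoE for one token: only k of len(experts) experts are evaluated."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_W]
    gates = top_k_route(logits, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, sorted(gates)

def make_expert(scale):
    """Hypothetical expert: scales its input (stands in for a small FFN)."""
    return lambda x: [scale * xi for xi in x]

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
out, used = moe_layer([1.0, 0.5], router_W, experts, k=2)
print(used, out)  # only experts 0 and 1 ran for this token
```

Total parameters grow with the number of experts, but per-token compute grows only with k, which is the sparse-activation trade-off.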

GPU Architecture and AI Hardware

FLOP budgets, memory bandwidth, tensor cores, NVLink — why hardware constraints shape model design decisions.

20 min
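A back-of-the-envelope sketch linking memory bandwidth to decoding latency for a dense model. Assumptions: each decoding step reads every weight once (shared across the batch), costs about 2 FLOPs per parameter per token, ignores KV-cache traffic and multi-GPU communication, and uses round illustrative hardware numbers rather than any specific product's spec.

```python
def decode_time_bounds(n_params, bytes_per_param, batch, mem_bw, peak_flops):
    """Lower bounds (seconds) on one decoding step: (memory time, compute time)."""
    t_mem = n_params * bytes_per_param / mem_bw       # stream all weights once
    t_compute = 2 * n_params * batch / peak_flops     # multiply + add per weight per token
    return t_mem, t_compute

# Assumed, illustrative hardware numbers (not a specific product's spec):
MEM_BW = 3e12   # bytes/s of memory bandwidth
PEAK = 1e15     # FLOP/s of dense low-precision throughput

for batch in (1, 64, 512):
    t_mem, t_comp = decode_time_bounds(70e9, 2, batch, MEM_BW, PEAK)  # 70B params, 2-byte weights
    bound = "memory" if t_mem > t_comp else "compute"
    print(f"batch={batch:4d}  mem={t_mem * 1e3:.1f} ms  compute={t_comp * 1e3:.2f} ms  {bound}-bound")
```

At small batch sizes, step time is set by how fast weights stream from memory, not by peak FLOPs. That is why inference design leans on batching, lower-precision weights, and architectures like MoE that touch fewer parameters per token.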

After This Path

You now have a coherent bottom-up model of how modern AI systems are designed and trained. Good follow-on areas:

Full Deep Dive section — remaining subsections: interpretability, open-source movement, AI research skills
Alignment subsection — Constitutional AI, DPO, red teaming, interpretability
Fine-tuning — LoRA, QLoRA, PEFT, full fine-tuning vs RAG
Reasoning models section: o3, DeepSeek R1, and Gemini 2.5 Pro in depth, revisited with your new foundation in place

Checklist: Do You Understand This?

Can you explain self-attention from first principles using queries, keys, and values?
Do you know why the transformer architecture was a breakthrough over RNNs?
Can you describe what happens during pre-training and what objective is being optimized?
Do you understand the Chinchilla scaling laws and what they say about optimal compute allocation?
Can you explain RLHF and why it matters for making pre-trained models useful?
Do you understand why Mixture of Experts architectures are dominant at the frontier in 2025?
Can you connect memory bandwidth to inference latency to model design choices?