Scaling
The laws governing how model capability grows with compute, data, and parameters — and what happens at the frontier.
In This Section
Scaling Laws — Compute, Data, Parameters
Kaplan et al. power laws, FLOP budgeting, and predicting performance before training.
Chinchilla & Optimal Training
Compute-optimal training, the rule of thumb of roughly 20 training tokens per parameter, and deliberate overtraining.
Emergent Abilities & Phase Transitions
Capabilities that appear abruptly as models scale, and why they are hard to predict.