Hardware & Compute
The physical substrate of AI — GPU architecture, specialized accelerators, memory bottlenecks, and how large models are distributed across thousands of chips.
In This Section
GPU Architecture — CUDA, Cores, Memory Hierarchy
Streaming multiprocessors (SMs), Tensor Cores, high-bandwidth memory (HBM), and why GPUs dominate AI workloads.
TPU vs GPU vs Custom Silicon
Google TPU, Groq LPU, Cerebras, Tenstorrent, and Apple Neural Engine.
Memory Bandwidth — The Real Bottleneck
The roofline model, arithmetic intensity, KV cache, and Flash Attention.
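As a taste of what that page covers, here is a minimal roofline sketch. The peak numbers are illustrative A100-like figures (assumptions for the example, not exact specs), and the byte count ignores caching and fusion:

```python
def attainable_flops(intensity, peak_flops, peak_bw):
    """Roofline model: achievable FLOP/s is capped by either compute or memory."""
    return min(peak_flops, intensity * peak_bw)

# Illustrative A100-like figures (assumptions, not exact specs):
PEAK_FLOPS = 312e12   # BF16 tensor-core FLOP/s
PEAK_BW = 1.6e12      # HBM bytes/s

def matmul_intensity(m, n, k, bytes_per_elem=2):
    """Arithmetic intensity (FLOPs per byte) of an m*k @ k*n matmul in fp16."""
    flops = 2 * m * n * k                               # multiply-accumulates
    bytes_moved = bytes_per_elem * (m*k + k*n + m*n)    # read A, B; write C
    return flops / bytes_moved

ridge = PEAK_FLOPS / PEAK_BW  # ~195 FLOP/byte: below this, memory-bound
```

A batch-1 decode matvec (`matmul_intensity(1, 4096, 4096)` ≈ 1 FLOP/byte) sits far below the ridge point, which is why inference decoding is bandwidth-bound and why KV-cache tricks and Flash Attention matter.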
FLOPS, MFU & Compute Efficiency
FLOP counting, model FLOP utilization, and how to measure training efficiency.
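The core calculation on that page fits in a few lines. This sketch uses the standard ~6 FLOPs per parameter per token approximation for training; the model size, throughput, and per-GPU peak below are made-up illustrative numbers:

```python
def training_flops(n_params, n_tokens):
    # Common approximation: forward + backward costs ~6 FLOPs per param per token
    return 6 * n_params * n_tokens

def mfu(n_params, tokens_per_sec, peak_flops_per_sec):
    """Model FLOP utilization: useful FLOP/s achieved divided by hardware peak."""
    return training_flops(n_params, tokens_per_sec) / peak_flops_per_sec

# e.g. a 7e9-param model at 12,000 tokens/s on 8 GPUs of 312 TFLOPS each
# (illustrative figures) lands around 20% MFU:
u = mfu(7e9, 12_000, 8 * 312e12)  # ~0.20
```

Note that MFU counts only the FLOPs the model *needs*, so activation recomputation and other overheads lower it even when the GPUs are busy.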
Distributed Training
Data, tensor, and pipeline parallelism, ZeRO, and 3D parallelism.
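The simplest of those strategies, data parallelism, can be sketched in pure Python: each worker computes gradients on its own shard of the batch, then an all-reduce averages them so every replica takes an identical optimizer step. This is a toy simulation of the collective, not a real communication library:

```python
def all_reduce_mean(grads_per_worker):
    """Average per-parameter gradients across workers (simulated all-reduce)."""
    n = len(grads_per_worker)
    return [sum(g) / n for g in zip(*grads_per_worker)]

# Gradients for a 3-parameter model computed on 2 workers' data shards:
g0 = [0.25, -0.5, 1.0]
g1 = [0.75,  0.5, 3.0]
avg = all_reduce_mean([g0, g1])  # [0.5, 0.0, 2.0], applied by both replicas
```

Tensor and pipeline parallelism instead split the model itself across devices, and ZeRO shards optimizer state, so in practice large runs combine all three ("3D parallelism").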