🧠 All Things AI, by Subhojit Dey

Hardware & Compute

The physical substrate of AI — GPU architecture, specialized accelerators, memory bottlenecks, and how large models are distributed across thousands of chips.

In This Section

GPU Architecture — CUDA, Cores, Memory Hierarchy

SMs, Tensor Cores, HBM, and why GPUs dominate AI workloads.

TPU vs GPU vs Custom Silicon

Google TPU, Groq LPU, Cerebras, Tenstorrent, and Apple Neural Engine.

Memory Bandwidth — The Real Bottleneck

The roofline model, arithmetic intensity, KV cache, and Flash Attention.

FLOPS, MFU & Compute Efficiency

FLOP counting, model FLOP utilization, and how to measure training efficiency.

Distributed Training

Data, tensor, and pipeline parallelism, ZeRO, and 3D parallelism.


Page built: 01 Jun 2026