🧠 All Things AI, by Subhojit Dey

Model Architectures Deep Dive

Inside the architectures of today's frontier models: how the GPT series, Llama 3, Mistral and Mixtral, and DeepSeek are designed, and how Mixture of Experts (MoE) layers work.

In This Section

GPT Series — Architecture Evolution

GPT-1 through GPT-4: what changed, what scaled, and what the series established.

Llama 3 — Architecture & Design Choices

GQA, RoPE, training details, and why Llama became the open-model baseline.

Mixture of Experts (MoE) — How It Works

Routing, sparsity, load balancing, and the compute-vs-memory tradeoff.

Mistral & Mixtral Internals

Sliding window attention, GQA, and Mixtral 8x7B vs dense models.

DeepSeek Architecture & Training

MLA, DeepSeekMoE, FP8 training, and the frontier model with a reported training cost of roughly $6M.


Page built: 01 Jun 2026