🧠 All Things AI, by Subhojit Dey

Alignment Techniques

Methods for training AI systems to follow instructions and behave in alignment with human preferences — from RLHF to direct preference optimization.

In This Section

RLHF — Mechanics & Pipeline

The SFT → reward model → PPO pipeline, KL penalty, and InstructGPT.

DPO — Direct Preference Optimization

How DPO replaces PPO with a simpler loss function derived from the same objective.

Constitutional AI & Self-Critique

Training models to critique and revise outputs against a written constitution.

Reward Modeling

How reward models are trained, Goodhart's Law, and process vs outcome reward models.
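As a rough illustration of the DPO entry above, the "simpler loss function" can be sketched for a single preference pair. This is a minimal sketch, not the implementation used on the linked page: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration. Each argument is the summed log-probability of a whole response under either the policy being trained or a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    beta plays the role of the KL-penalty strength: larger values keep
    the policy closer to the reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Implicit reward margin between the preferred and rejected response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written as softplus(-margin) for stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy matches the reference on both responses the margin is zero and the loss equals log 2. It falls as the policy shifts probability toward the preferred response relative to the reference, which is why no separate reward model or PPO loop is needed in this formulation.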


Page built: 01 Jun 2026