🧠 All Things AI
Beginner

AI vs ML vs Deep Learning vs GenAI

These four terms get thrown around interchangeably, but they are not the same thing. They form a nested hierarchy — each one is a subset of the one above it. Understanding this hierarchy is the single most important mental model for everything else on this site.

The Hierarchy at a Glance

Think of it as concentric circles — each inner ring is a specialised subset of the one containing it:

  • Artificial Intelligence — any system mimicking human intelligence: rules, search, or learning
  • Machine Learning — AI that learns patterns from data instead of following hand-written rules
  • Deep Learning — ML that uses multi-layer neural networks to learn hierarchical representations
  • Generative AI — deep learning that creates new content: text, images, audio, code, video

Every generative AI system is a deep learning system. Every deep learning system is a machine learning system. And every machine learning system is an AI system. But the reverse is not true — there are AI systems, such as rule-based expert systems, that have nothing to do with machine learning.

Artificial Intelligence (AI)

Artificial Intelligence is the broadest term. It refers to any system that performs tasks that would normally require human intelligence — understanding language, recognizing images, making decisions, planning, or reasoning.

AI includes many approaches, spanning seven decades of research:

  • 1950s–70s — Rule-based systems (if-then-else logic, expert systems); search algorithms (chess engines, planning)
  • 1980s–90s — Expert systems (medicine and engineering knowledge bases); early neural nets (perceptrons, backpropagation discovered)
  • 2000s–10s — Classic ML (SVMs, random forests, gradient boosting); deep learning (CNNs, the 2012 ImageNet breakthrough)
  • 2017–now — Transformers ("Attention Is All You Need"); generative AI (LLMs, diffusion models, multimodal systems)

AI approaches through history — each era built on the last

The key insight: AI is a goal (make machines intelligent), not a specific technology. The technology used to achieve that goal has changed dramatically over 70 years.

Machine Learning (ML)

Machine Learning is the subset of AI where systems learn from data rather than following hand-written rules. Instead of a programmer writing "if the email contains X, mark as spam," you give the system thousands of examples of spam and not-spam emails, and it figures out the patterns itself.
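To make "learning from examples" concrete, here is a toy spam scorer in plain Python. The emails, the word counts, and the `is_spam` helper are all invented for illustration; a real filter would train a proper model (such as naive Bayes) on thousands of messages:

```python
# A minimal sketch of "learning from examples" instead of hand-written rules:
# count how often each word appears in spam vs. ham training emails, then
# score new emails by which class their words were seen in more often.
from collections import Counter

spam = ["win free money now", "free prize claim now", "win money fast"]
ham  = ["meeting at noon", "lunch tomorrow", "project status meeting"]

spam_counts = Counter(w for msg in spam for w in msg.split())
ham_counts  = Counter(w for msg in ham for w in msg.split())

def score(msg):
    """Positive score -> looks like spam; negative -> looks like ham."""
    return sum(spam_counts[w] - ham_counts[w] for w in msg.split())

def is_spam(msg):
    return score(msg) > 0

print(is_spam("claim your free money"))    # True: "free" and "money" were seen in spam
print(is_spam("project status meeting"))   # False: these words were seen in ham
```

Nobody wrote a rule mentioning "free" or "money" — the classifier found those signals in the training examples itself, which is the essence of ML.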

The Three Types of ML

  • Supervised — learns from labeled examples (input → correct output). Use cases: email spam, price prediction, diagnosis
  • Unsupervised — no labels; finds hidden structure in data. Use cases: customer segments, anomaly detection
  • Reinforcement — trial and error; rewards and penalties shape behaviour. Use cases: AlphaGo, robotics, RLHF for LLMs

The three ML paradigms — supervised is the most common in production
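For a taste of the unsupervised paradigm, here is a deliberately tiny 1-D k-means sketch in Python. The spend figures, the two-cluster setup, and the `kmeans_1d` helper are made up for illustration (the min/max initialisation only makes sense for k=2):

```python
# Toy unsupervised example: 1-D k-means finds two customer-spend clusters
# without any labels -- the "hidden structure" the table refers to.
def kmeans_1d(points, k=2, iters=20):
    centers = [min(points), max(points)]   # crude initialisation, fine for k=2
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                   # assign each point to its nearest center
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) for g in groups if g]  # recompute centers
    return centers

spend = [10, 12, 11, 95, 102, 99]   # two obvious groups of customers
print(sorted(kmeans_1d(spend)))      # roughly [11.0, 98.7]
```

No one told the algorithm which customers are "big spenders" — it discovered the two segments from the data alone.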

Classic ML Algorithms

Before deep learning dominated, these algorithms powered most ML applications:

  • Linear/logistic regression — Simple, interpretable, still widely used
  • Decision trees & random forests — Great for tabular data
  • Support vector machines (SVMs) — Effective for classification with clear margins
  • k-nearest neighbors — Simple distance-based classification
  • Gradient boosting (XGBoost, LightGBM) — Still the gold standard for structured/tabular data in 2025

Important: for tabular data (spreadsheets, databases, CSV files), classic ML often still outperforms deep learning. Not everything needs a neural network.
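As a flavour of how simple some classic algorithms are, here is k-nearest neighbors from the list above in plain Python. The tabular dataset, the labels, and the `knn_predict` helper are invented for illustration:

```python
# k-nearest neighbors: classify a query point by majority vote among
# the k closest training examples (Euclidean distance).
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: feature tuple."""
    by_distance = sorted(train, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Tiny tabular dataset: (height_cm, weight_kg) -> class label
train = [((150, 50), "A"), ((155, 55), "A"), ((152, 53), "A"),
         ((180, 90), "B"), ((185, 95), "B"), ((178, 88), "B")]

print(knn_predict(train, (154, 52)))   # "A": nearest neighbours are all class A
print(knn_predict(train, (182, 91)))   # "B": nearest neighbours are all class B
```

No training step at all — the "model" is just the stored examples plus a distance function, which is why kNN is a good first algorithm to implement by hand.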

Deep Learning (DL)

Deep Learning is the subset of ML that uses neural networks with many layers (hence "deep"). These layers learn increasingly abstract representations of the input data — from raw pixels to edges to shapes to objects, or from raw text to syntax to semantics to meaning.

Why "Deep" Works

A single-layer neural network (a perceptron) can only learn simple linear patterns. Stacking many layers lets the network learn complex, hierarchical patterns that would be impossible to hand-code. The "depth" refers to the number of layers — modern models stack dozens to hundreds of them.
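One way to see why depth needs a nonlinearity between layers: stacking purely linear layers collapses into a single linear layer, so stacking alone buys nothing. A sketch in Python (the weights are arbitrary, chosen only for illustration):

```python
# Two stacked *linear* layers are equivalent to one linear layer,
# so depth without nonlinearity adds no expressive power.
def linear(w, b):
    return lambda x: w * x + b

relu = lambda x: max(0.0, x)   # the nonlinearity between layers

f1, f2 = linear(2.0, 1.0), linear(-3.0, 0.5)

# f2(f1(x)) = -3*(2x + 1) + 0.5 = -6x - 2.5: still just one linear map.
stacked   = lambda x: f2(f1(x))
collapsed = linear(2.0 * -3.0, -3.0 * 1.0 + 0.5)
print(stacked(4.0) == collapsed(4.0))   # True, and true for every x

# With ReLU in between, the composite is no longer linear:
deep = lambda x: f2(relu(f1(x)))
print(deep(-1.0), collapsed(-1.0))      # 0.5 vs 3.5: the functions differ
```

This is why every real network interleaves nonlinear activations with its layers — that interleaving is what makes "deep" meaningful.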

What Made Deep Learning Take Off

Deep learning had existed since the 1980s, but it only became practical around 2012 thanks to four converging factors:

  • Data — ImageNet and Common Crawl: massive labeled datasets harvested from the internet
  • Compute — GPUs (built for gaming) turned out to be perfect for neural-network math
  • Algorithms — dropout, batch normalization, and better optimizers made training stable
  • Breakthrough — AlexNet (2012) won ImageNet with a 10.8-percentage-point error gap over the runner-up

All four factors converged around 2012 — removing any one would have delayed the revolution

Key Deep Learning Architectures

  • Vision — CNNs (2012–2020: image classification, object detection); ViT (Vision Transformer, 2020+)
  • Language — RNNs/LSTMs (sequential; struggled with long-range dependencies); transformers (2017: attention processes all tokens at once)
  • Generation — GANs (generator vs discriminator; image synthesis); diffusion models (Stable Diffusion, DALL-E, Midjourney); autoregressive models (GPT-style token-by-token text generation)

Architecture families by domain — transformers now dominate all three
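The autoregressive idea in that last row can be sketched in a few lines: predict the next token from the sequence so far, append it, repeat. In this toy version the "model" is a hard-coded bigram lookup standing in for a transformer, and the vocabulary is invented for illustration:

```python
# Autoregressive generation in miniature: the loop is real, the model is not.
# Real LLMs replace this lookup table with a transformer that scores
# every possible next token given the whole context.
bigram = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def generate(prompt, steps):
    tokens = prompt.split()
    for _ in range(steps):
        nxt = bigram.get(tokens[-1])   # "predict" the next token from the last one
        if nxt is None:                # stop if there is no continuation
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the", 4))   # "the cat sat on the"
```

Every GPT-style model runs exactly this loop at inference time — one token out, appended to the context, fed back in — just with a vastly better next-token predictor.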

Generative AI (GenAI)

Generative AI is the subset of deep learning focused on creating new content — text, images, audio, video, code, music, 3D models. Unlike earlier AI that classified or predicted, generative AI produces novel outputs that didn't exist before.

Generative AI by Modality

  • Text — LLMs (GPT-4, Claude, Gemini, LLaMA, Mistral); code assistants (Claude Code, Copilot, Cursor)
  • Images — diffusion models (Stable Diffusion, DALL-E 3, Flux); Midjourney (proprietary image generation)
  • Audio — text-to-speech (ElevenLabs, OpenAI TTS, Piper); music (Suno, Udio)
  • Video — Sora (OpenAI, text-to-video); Runway Gen-3 (video generation and editing); Kling (Kuaishou's video generator)

GenAI modalities in 2025 — text is most mature; video is the frontier

The Transformer Revolution

The 2017 paper "Attention Is All You Need" introduced the transformer architecture, which became the foundation of modern generative AI. Key innovations:

  • Self-attention — Each part of the input can attend to every other part, capturing long-range relationships
  • Parallelization — Unlike RNNs, transformers process entire sequences at once, making training much faster on GPUs
  • Scaling — Transformers improve predictably as you add more data, more parameters, and more compute (scaling laws)
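A stripped-down version of self-attention can be written in plain Python. This sketch drops the learned query/key/value projections (it sets Q = K = V = the raw token vectors), so it shows only the attend-to-everything mechanic, not a full transformer layer:

```python
# Bare-bones self-attention, single head, no learned weights:
# each token's output is a softmax-weighted average of ALL token vectors,
# which is how every position can attend to every other position.
import math

def softmax(xs):
    m = max(xs)                            # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens):
    """tokens: list of equal-length vectors; Q = K = V = tokens for simplicity."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [dot(q, k) / math.sqrt(d) for k in tokens]  # scaled dot products
        weights = softmax(scores)                            # attention weights sum to 1
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

result = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(len(result), len(result[0]))   # 3 2: three tokens in, three mixed vectors out
```

Note that nothing in the loop depends on token order or distance — token 1 weighs token 3 exactly as easily as its neighbour, which is the long-range advantage over RNNs, and each token's output can be computed independently, which is the parallelization advantage.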

This scalability is why the AI industry is investing billions in compute infrastructure — bigger transformers, trained on more data, consistently produce smarter models.

Common Points of Confusion

"Is ChatGPT artificial intelligence?"

Yes — it's all four at once. ChatGPT is an AI system, built with machine learning, specifically deep learning, specifically generative AI (an LLM). It sits in the innermost circle.

"Is a spam filter AI?"

Yes, if it uses ML (most modern ones do). A rule-based spam filter is also AI in the broadest sense, but it's not ML or deep learning.

"Is all AI generative?"

No. Most AI in production today is not generative — fraud detection, recommendation engines, search ranking, self-driving perception. Generative AI is the newest and most visible category, but it's a small slice of the total AI landscape.

"Do I need to understand ML math to use GenAI?"

No. You can be highly effective using LLMs, building RAG systems, and deploying AI products without understanding backpropagation or gradient descent. But knowing the conceptual hierarchy helps you make better decisions about what tools to use and when.

Practical Takeaway

When someone says "AI," ask yourself which layer they mean:

  • If they mean any intelligent automation → they mean AI broadly
  • If they mean learning from data → they mean ML
  • If they mean neural networks → they mean deep learning
  • If they mean creating text, images, or code → they mean generative AI

This distinction matters because it determines what skills you need, what tools to use, and what limitations to expect. A generative AI solution has very different failure modes than a classic ML classification model.

Checklist: Do You Understand This?

  • Can you explain the four terms to someone in 30 seconds?
  • Can you give an example of AI that is not machine learning?
  • Can you name the architecture behind modern LLMs?
  • Can you explain why deep learning took off around 2012?
  • Can you name three types of generative AI besides text?