🧠 All Things AI
Beginner

What is an LLM?

A Large Language Model (LLM) is the engine behind ChatGPT, Claude, Gemini, and every major AI assistant. Understanding what it is — and what it is not — is fundamental to using AI effectively.

The One-Sentence Explanation

An LLM is a neural network trained on enormous amounts of text that predicts the most likely next word (token) given everything that came before it.

That's it. Every response from ChatGPT, every code suggestion from Copilot, every summary from Claude — it all comes down to "what word should come next?" repeated thousands of times.
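The idea can be made concrete with a toy model. This is not how a real LLM works internally — it uses a neural network over billions of documents rather than word counts — but the core task, "predict the most likely next word," is the same:

```python
# Toy illustration (NOT a real LLM): predict the most likely next word
# using bigram counts over a tiny corpus. A real LLM does the same kind
# of prediction with a neural network trained on billions of documents.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" ("cat" follows "the" twice in the corpus)
```

An LLM replaces the count table with learned parameters, but the output is still a ranking of candidate next tokens.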

How It Works (No Math Required)

1. Pre-training — Reading the Internet

The model reads billions of pages of text — books, websites, Wikipedia, code, papers, forums. It learns to predict what comes next, absorbing language patterns, world knowledge, reasoning patterns, code syntax, and format conventions.

2. Fine-tuning — Learning to Be Helpful

A pre-trained model predicts text but doesn't follow instructions. Fine-tuning bridges this gap.

  • Supervised fine-tuning (SFT) — Thousands of examples of good instruction-following: "user asks X, a good assistant responds with Y"
  • RLHF — Human raters rank multiple responses; the model learns to prefer what humans rate higher
  • Constitutional AI — Claude uses principles to self-evaluate outputs, reducing need for human raters
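
The SFT stage above is easiest to picture as a dataset of prompt/response pairs. The field names below are illustrative, not any specific framework's schema:

```python
# A sketch of what supervised fine-tuning (SFT) data looks like.
# Field names ("prompt", "response") vary by framework; this shape is
# illustrative only.
sft_examples = [
    {
        "prompt": "Summarize: The meeting moved to 3pm on Friday.",
        "response": "The meeting is now Friday at 3pm.",
    },
    {
        "prompt": "Translate to French: Good morning",
        "response": "Bonjour",
    },
]

# During SFT, the model is trained to produce `response` given `prompt` --
# typically by computing next-token prediction loss only on the response.
for ex in sft_examples:
    training_text = ex["prompt"] + "\n" + ex["response"]
    print(len(training_text), "characters of training text")
```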

3. Inference — Generating Responses

When you type a message: your text → tokens → model processes all tokens through layers → probability distribution over next token → sample a token → append → repeat until response complete. This is why text streams one token at a time.

Every LLM you use today has gone through all three stages.

Key insight: The model does not "understand" in the human sense. It has learned extremely sophisticated statistical patterns about how language works. Whether this constitutes "understanding" is a philosophical debate — but for practical purposes, the distinction rarely matters.

The Token Generation Loop

1. Your message — raw text input
2. Tokenize — split the text into ~word-sized pieces (tokens)
3. Process layers — attention + feed-forward networks (FFN)
4. Next-token probabilities — every token in the vocabulary gets a score
5. Sample a token — temperature controls randomness
6. Repeat — until an end-of-sequence token is produced

The same loop repeats for every token in the response — from 1 to thousands.
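
The loop above can be sketched in a few lines. The "model" here is a stand-in function returning fake scores; a real LLM would run the tokens through its layers to produce them:

```python
# A minimal sketch of the token generation loop. `fake_logits` is a
# stand-in for the neural network: it just favors the next item in a
# tiny hand-made vocabulary.
import math, random

vocab = ["Hello", ",", " world", "!", "<eos>"]

def fake_logits(tokens):
    # Stand-in for the model: score the token after the last one highly.
    i = vocab.index(tokens[-1])
    return [3.0 if j == i + 1 else 0.0 for j in range(len(vocab))]

def sample(logits, temperature=1.0):
    # Temperature rescales the logits: lower -> more deterministic output.
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(vocab, weights=probs)[0]

tokens = ["Hello"]
while tokens[-1] != "<eos>" and len(tokens) < 10:
    tokens.append(sample(fake_logits(tokens), temperature=0.1))

print("".join(t for t in tokens if t != "<eos>"))  # effectively always "Hello, world!"
```

At temperature 0.1 the highest-scoring token wins almost every time; raise the temperature and the output starts to vary between runs — exactly the randomness you see in chat interfaces.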

Parameters, Weights, and Model Sizes

When you hear "GPT-4 has over a trillion parameters" or "LLaMA 70B," what does that mean?

  • Parameters are the numbers the model has learned during training. They encode the model's "knowledge" — the patterns, facts, and relationships it has absorbed from training data.
  • Weights are the same thing — the terms are used interchangeably. Each weight is a floating-point number (often 16-bit) stored in the model file.
  • Model size (e.g., 7B, 70B, 405B) — The "B" stands for billion parameters. Larger models generally perform better but require more memory and compute to run.
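
Parameter count translates directly into memory. A rough back-of-envelope calculation (weights only — actual runtime usage is higher because of activations and the KV cache):

```python
# Back-of-envelope: memory needed just to hold a model's weights.
# 1 billion parameters at 2 bytes each (fp16) = 2 GB.
def weight_memory_gb(params_billions, bytes_per_param):
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, b in [("7B", 7), ("70B", 70)]:
    print(f"{name}: {weight_memory_gb(b, 2):.0f} GB at fp16, "
          f"{weight_memory_gb(b, 0.5):.1f} GB at 4-bit quantization")
# 7B:  14 GB at fp16,  3.5 GB at 4-bit
# 70B: 140 GB at fp16, 35.0 GB at 4-bit
```

This is why a 7B model fits on a consumer GPU (especially quantized) while a 70B model needs high-end hardware.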

Size vs. Performance Tradeoffs

| Size Category | Parameters | Examples | Can Run Locally? | Typical Use |
|---|---|---|---|---|
| Small (SLM) | 1-7B | Phi-3 Mini, Gemma 2B, LLaMA 3.2 3B | Yes, even on phones | Simple tasks, on-device, edge |
| Medium | 7-30B | LLaMA 3 8B, Mistral 7B, DeepSeek 7B | Yes, with a good GPU | General tasks, coding, chat |
| Large | 30-100B | LLaMA 3 70B, Mixtral 8x22B | Needs high-end GPU(s) | Complex reasoning, professional use |
| Frontier | 100B+ | GPT-4, Claude Opus, Gemini Ultra | Cloud only | Hardest tasks, state of the art |

Open vs. Closed Models

This is one of the most important distinctions in the current AI landscape:

Open-Weight (download the weights, run them yourself, full control): LLaMA 3 405B, Mistral / DeepSeek, Gemma / Phi
Closed / Proprietary (API or chat only; the company controls the weights): Gemini Flash, GPT-4o / Claude

Open-weight models are catching up — DeepSeek R1 matches frontier closed models at a fraction of the cost.

Open-Weight Models

  • Download and run on your own hardware
  • Data never leaves your environment
  • No per-token cost (just compute)
  • Full control — fine-tune, modify, quantize
  • Examples: LLaMA 3, Mistral, DeepSeek, Gemma, Phi

Closed (Proprietary) Models

  • Access via API or chat interface only
  • Highest peak capability today
  • Zero infrastructure — just API keys
  • Regular updates without your effort
  • Examples: GPT-4o, Claude, Gemini Ultra, Grok

The Practical Choice

Most teams use closed models for prototyping and hard tasks (best quality, zero setup) and open models for production at scale (predictable cost, data control). Many use both — routing easy queries to small open models and hard queries to frontier closed models.
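The routing pattern described above can be sketched as follows. `call_local` and `call_frontier` are hypothetical placeholders, not real client libraries, and the difficulty heuristic is deliberately crude:

```python
# Sketch of query routing: cheap local model for easy queries, frontier
# API for hard ones. The heuristic below is illustrative only -- real
# routers often use a small classifier model instead.
def is_hard(query: str) -> bool:
    return len(query) > 200 or any(
        kw in query.lower() for kw in ("prove", "refactor", "multi-step")
    )

def route(query: str) -> str:
    if is_hard(query):
        return "frontier"   # e.g. call_frontier(query) -- hypothetical
    return "local"          # e.g. call_local(query) -- hypothetical

print(route("What is the capital of France?"))   # → local
print(route("Refactor this 500-line module"))    # → frontier
```

The payoff is cost control: if most traffic is easy, most tokens are generated at local-model prices while quality on hard queries stays at frontier level.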

What an LLM is NOT

Common misconceptions:

  • It is not a search engine. It does not look things up in real-time. Its knowledge comes from training data (with a cutoff date). It can be connected to search, but that's an add-on (RAG), not a built-in feature.
  • It is not a database. It cannot reliably recall specific facts. It may "know" that Paris is the capital of France but confuse the population of a small city.
  • It is not deterministic. The same prompt can produce different outputs due to sampling randomness (temperature). This is a feature, not a bug — but it means you cannot rely on exact reproducibility.
  • It does not think step-by-step (unless asked). By default, it generates the most probable continuation. Asking it to "think step by step" or using chain-of-thought prompting can dramatically improve reasoning quality.
  • It does not have memory across conversations (by default). Each conversation starts fresh unless the system is built to persist memory.
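
The non-determinism point is worth seeing concretely. The final step of generation draws from a probability distribution; greedy (argmax) decoding always picks the top token, while sampling can pick any of them:

```python
# Why identical prompts can yield different outputs: generation ends in a
# draw from a probability distribution. The distribution below is made up
# for illustration.
import random

probs = {"Paris": 0.7, "France's capital": 0.2, "the City of Light": 0.1}

def greedy(p):
    return max(p, key=p.get)  # always the top-probability answer

def sampled(p):
    words = list(p)
    return random.choices(words, weights=[p[w] for w in words])[0]

print(greedy(probs))                          # "Paris", every single run
print({sampled(probs) for _ in range(50)})    # usually several distinct answers
```

Setting temperature to 0 in an API is essentially asking for greedy decoding — more repeatable, though many providers still don't guarantee bit-exact reproducibility.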

The Major LLMs in 2025

| Model Family | Company | Type | Strengths |
|---|---|---|---|
| GPT-4o / o3 | OpenAI | Closed | Broad capability, multimodal, massive ecosystem, strong reasoning |
| Claude (Opus, Sonnet, Haiku) | Anthropic | Closed | Long context, coding, safety, instruction-following |
| Gemini 2.5 Pro / Flash | Google | Closed | 1M+ token context, multimodal, thinking mode, Google integration |
| LLaMA 3 (8B, 70B, 405B) | Meta | Open-weight | Best open-weight general model, huge community |
| Mistral / Mixtral | Mistral AI | Open-weight | Efficient, strong coding, mixture-of-experts |
| DeepSeek (V3, R1) | DeepSeek | Open-weight | Strong reasoning, competitive with closed models at a fraction of the cost |
| Grok | xAI | Closed | Real-time knowledge (X/Twitter), less content filtering |

The Mental Model to Keep

Think of an LLM as an extremely well-read collaborator who has read billions of documents but has no ability to look anything up, no persistent memory, and a tendency to be confidently wrong about details. It's brilliant at:

  • Drafting, rewriting, summarizing, and transforming text
  • Explaining concepts at any level of detail
  • Writing and debugging code
  • Brainstorming, structuring ideas, and finding patterns
  • Following complex multi-step instructions

It's unreliable at:

  • Precise factual recall (especially numbers, dates, URLs)
  • Consistent behavior across runs
  • Knowing what it doesn't know
  • Real-time or current information

Checklist: Do You Understand This?

  • Can you explain how an LLM generates text (next-token prediction)?
  • Can you describe the difference between pre-training and fine-tuning?
  • What does "70B parameters" mean?
  • Can you name two open-weight and two closed models?
  • Can you list three things an LLM is NOT?