🧠 All Things AI
Beginner

What is an LLM?

A Large Language Model (LLM) is the engine behind ChatGPT, Claude, Gemini, and every major AI assistant. Understanding what it is — and what it is not — is fundamental to using AI effectively.

The One-Sentence Explanation

An LLM is a neural network trained on enormous amounts of text that predicts the most likely next word (token) given everything that came before it.

That's it. Every response from ChatGPT, every code suggestion from Copilot, every summary from Claude — it all comes down to "what word should come next?" repeated thousands of times.
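The idea can be made concrete with a toy model. This is not how a real LLM works internally — it uses a neural network over billions of documents rather than word counts — but the core task, "predict the most likely next word," is the same:

```python
# Toy illustration (NOT a real LLM): predict the most likely next word
# using bigram counts over a tiny corpus. A real LLM does the same kind
# of prediction with a neural network trained on billions of documents.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" ("cat" follows "the" twice in the corpus)
```

An LLM replaces the count table with learned parameters, but the output is still a ranking of candidate next tokens.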

How It Works (No Math Required)

1. Pre-training — Reading the Internet

The model reads billions of pages of text — books, websites, Wikipedia, code, papers, forums. It learns to predict what comes next, absorbing language patterns, world knowledge, reasoning patterns, code syntax, and format conventions.

2. Fine-tuning — Learning to Be Helpful

A pre-trained model predicts text but doesn't follow instructions. Fine-tuning bridges this gap.

  • Supervised fine-tuning (SFT) — Thousands of examples of good instruction-following: "user asks X, a good assistant responds with Y"
  • RLHF — Human raters rank multiple responses; the model learns to prefer what humans rate higher
  • Constitutional AI — Claude uses principles to self-evaluate outputs, reducing need for human raters
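
The SFT stage above is easiest to picture as a dataset of prompt/response pairs. The field names below are illustrative, not any specific framework's schema:

```python
# A sketch of what supervised fine-tuning (SFT) data looks like.
# Field names ("prompt", "response") vary by framework; this shape is
# illustrative only.
sft_examples = [
    {
        "prompt": "Summarize: The meeting moved to 3pm on Friday.",
        "response": "The meeting is now Friday at 3pm.",
    },
    {
        "prompt": "Translate to French: Good morning",
        "response": "Bonjour",
    },
]

# During SFT, the model is trained to produce `response` given `prompt` --
# typically by computing next-token prediction loss only on the response.
for ex in sft_examples:
    training_text = ex["prompt"] + "\n" + ex["response"]
    print(len(training_text), "characters of training text")
```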

3. Inference — Generating Responses

When you type a message: your text → tokens → model processes all tokens through layers → probability distribution over next token → sample a token → append → repeat until response complete. This is why text streams one token at a time.

Every LLM you use today has gone through all three stages.

Key insight: The model does not "understand" in the human sense. It has learned extremely sophisticated statistical patterns about how language works. Whether this constitutes "understanding" is a philosophical debate — but for practical purposes, the distinction rarely matters.

The Token Generation Loop

1. Your message — raw text input
2. Tokenize — split the text into ~word-sized pieces (tokens)
3. Process layers — attention + feed-forward networks (FFN)
4. Next-token probabilities — every token in the vocabulary gets a score
5. Sample a token — temperature controls randomness
6. Repeat — until an end-of-sequence token is produced

The same loop repeats for every token in the response — from 1 to thousands.
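
The loop above can be sketched in a few lines. The "model" here is a stand-in function returning fake scores; a real LLM would run the tokens through its layers to produce them:

```python
# A minimal sketch of the token generation loop. `fake_logits` is a
# stand-in for the neural network: it just favors the next item in a
# tiny hand-made vocabulary.
import math, random

vocab = ["Hello", ",", " world", "!", "<eos>"]

def fake_logits(tokens):
    # Stand-in for the model: score the token after the last one highly.
    i = vocab.index(tokens[-1])
    return [3.0 if j == i + 1 else 0.0 for j in range(len(vocab))]

def sample(logits, temperature=1.0):
    # Temperature rescales the logits: lower -> more deterministic output.
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(vocab, weights=probs)[0]

tokens = ["Hello"]
while tokens[-1] != "<eos>" and len(tokens) < 10:
    tokens.append(sample(fake_logits(tokens), temperature=0.1))

print("".join(t for t in tokens if t != "<eos>"))  # effectively always "Hello, world!"
```

At temperature 0.1 the highest-scoring token wins almost every time; raise the temperature and the output starts to vary between runs — exactly the randomness you see in chat interfaces.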

Parameters, Weights, and Model Sizes

When you hear "GPT-4 has over a trillion parameters" or "LLaMA 70B," what does that mean?

  • Parameters are the numbers the model has learned during training. They encode the model's "knowledge" — the patterns, facts, and relationships it has absorbed from training data.
  • Weights are the same thing — the terms are used interchangeably. Each weight is a floating-point number (often 16-bit) stored in the model file.
  • Model size (e.g., 7B, 70B, 405B) — The "B" stands for billion parameters. Larger models generally perform better but require more memory and compute to run.
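
Parameter count translates directly into memory. A rough back-of-envelope calculation (weights only — actual runtime usage is higher because of activations and the KV cache):

```python
# Back-of-envelope: memory needed just to hold a model's weights.
# 1 billion parameters at 2 bytes each (fp16) = 2 GB.
def weight_memory_gb(params_billions, bytes_per_param):
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, b in [("7B", 7), ("70B", 70)]:
    print(f"{name}: {weight_memory_gb(b, 2):.0f} GB at fp16, "
          f"{weight_memory_gb(b, 0.5):.1f} GB at 4-bit quantization")
# 7B:  14 GB at fp16,  3.5 GB at 4-bit
# 70B: 140 GB at fp16, 35.0 GB at 4-bit
```

This is why a 7B model fits on a consumer GPU (especially quantized) while a 70B model needs high-end hardware.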

Size vs. Performance Tradeoffs

| Size Category | Parameters | Examples | Can Run Locally? | Typical Use |
|---|---|---|---|---|
| Small (SLM) | 1-7B | Phi-3 Mini, Gemma 2B, LLaMA 3.2 3B | Yes, even on phones | Simple tasks, on-device, edge |
| Medium | 7-30B | LLaMA 3 8B, Mistral 7B, DeepSeek 7B | Yes, with a good GPU | General tasks, coding, chat |
| Large | 30-100B | LLaMA 3 70B, Mixtral 8x22B | Needs high-end GPU(s) | Complex reasoning, professional use |
| Frontier | 100B+ | GPT-4, Claude Opus, Gemini Ultra | Cloud only | Hardest tasks, state of the art |

Open vs. Closed Models

This is one of the most important distinctions in the current AI landscape:

Open-Weight (download the weights, run them yourself, full control): LLaMA 3 405B, Mistral / DeepSeek, Gemma / Phi
Closed / Proprietary (API or chat only; the company controls the weights): Gemini Flash, GPT-4o / Claude

Open-weight models are catching up — DeepSeek R1 matches frontier closed models at a fraction of the cost.

Open-Weight Models

  • Download and run on your own hardware
  • Data never leaves your environment
  • No per-token cost (just compute)
  • Full control — fine-tune, modify, quantize
  • Examples: LLaMA 3, Mistral, DeepSeek, Gemma, Phi

Closed (Proprietary) Models

  • Access via API or chat interface only
  • Highest peak capability today
  • Zero infrastructure — just API keys
  • Regular updates without your effort
  • Examples: GPT-4o, Claude, Gemini Ultra, Grok

The Practical Choice

Most teams use closed models for prototyping and hard tasks (best quality, zero setup) and open models for production at scale (predictable cost, data control). Many use both — routing easy queries to small open models and hard queries to frontier closed models.
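The routing pattern described above can be sketched as follows. `call_local` and `call_frontier` are hypothetical placeholders, not real client libraries, and the difficulty heuristic is deliberately crude:

```python
# Sketch of query routing: cheap local model for easy queries, frontier
# API for hard ones. The heuristic below is illustrative only -- real
# routers often use a small classifier model instead.
def is_hard(query: str) -> bool:
    return len(query) > 200 or any(
        kw in query.lower() for kw in ("prove", "refactor", "multi-step")
    )

def route(query: str) -> str:
    if is_hard(query):
        return "frontier"   # e.g. call_frontier(query) -- hypothetical
    return "local"          # e.g. call_local(query) -- hypothetical

print(route("What is the capital of France?"))   # → local
print(route("Refactor this 500-line module"))    # → frontier
```

The payoff is cost control: if most traffic is easy, most tokens are generated at local-model prices while quality on hard queries stays at frontier level.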

What an LLM is NOT

Common misconceptions:

  • It is not a search engine. It does not look things up in real-time. Its knowledge comes from training data (with a cutoff date). It can be connected to search, but that's an add-on (RAG), not a built-in feature.
  • It is not a database. It cannot reliably recall specific facts. It may "know" that Paris is the capital of France but confuse the population of a small city.
  • It is not deterministic. The same prompt can produce different outputs due to sampling randomness (temperature). This is a feature, not a bug — but it means you cannot rely on exact reproducibility.
  • It does not think step-by-step (unless asked). By default, it generates the most probable continuation. Asking it to "think step by step" or using chain-of-thought prompting can dramatically improve reasoning quality.
  • It does not have memory across conversations (by default). Each conversation starts fresh unless the system is built to persist memory.
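
The non-determinism point is worth seeing concretely. The final step of generation draws from a probability distribution; greedy (argmax) decoding always picks the top token, while sampling can pick any of them:

```python
# Why identical prompts can yield different outputs: generation ends in a
# draw from a probability distribution. The distribution below is made up
# for illustration.
import random

probs = {"Paris": 0.7, "France's capital": 0.2, "the City of Light": 0.1}

def greedy(p):
    return max(p, key=p.get)  # always the top-probability answer

def sampled(p):
    words = list(p)
    return random.choices(words, weights=[p[w] for w in words])[0]

print(greedy(probs))                          # "Paris", every single run
print({sampled(probs) for _ in range(50)})    # usually several distinct answers
```

Setting temperature to 0 in an API is essentially asking for greedy decoding — more repeatable, though many providers still don't guarantee bit-exact reproducibility.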

The Major LLMs in 2025

| Model Family | Company | Type | Strengths |
|---|---|---|---|
| GPT-4o / o3 | OpenAI | Closed | Broad capability, multimodal, massive ecosystem, strong reasoning |
| Claude (Opus, Sonnet, Haiku) | Anthropic | Closed | Long context, coding, safety, instruction-following |
| Gemini 2.5 Pro / Flash | Google | Closed | 1M+ token context, multimodal, thinking mode, Google integration |
| LLaMA 3 (8B, 70B, 405B) | Meta | Open-weight | Best open-weight general model, huge community |
| Mistral / Mixtral | Mistral AI | Open-weight | Efficient, strong coding, mixture-of-experts |
| DeepSeek (V3, R1) | DeepSeek | Open-weight | Strong reasoning, competitive with closed models at a fraction of the cost |
| Grok | xAI | Closed | Real-time knowledge (X/Twitter), less content filtering |

The Mental Model to Keep

Think of an LLM as an extremely well-read collaborator who has read billions of documents but has no ability to look anything up, no persistent memory, and a tendency to be confidently wrong about details. It's brilliant at:

  • Drafting, rewriting, summarizing, and transforming text
  • Explaining concepts at any level of detail
  • Writing and debugging code
  • Brainstorming, structuring ideas, and finding patterns
  • Following complex multi-step instructions

It's unreliable at:

  • Precise factual recall (especially numbers, dates, URLs)
  • Consistent behavior across runs
  • Knowing what it doesn't know
  • Real-time or current information

Checklist: Do You Understand This?

  • Can you explain how an LLM generates text (next-token prediction)?
  • Can you describe the difference between pre-training and fine-tuning?
  • What does "70B parameters" mean?
  • Can you name two open-weight and two closed models?
  • Can you list three things an LLM is NOT?