🧠 All Things AI
Beginner

Hallucinations & Failure Modes

LLMs will confidently tell you things that are completely wrong. They'll cite papers that don't exist, invent statistics, and fabricate plausible-sounding but false details. This is not a bug that will be fixed — it's a fundamental property of how these models work. Understanding failure modes is the most important skill for using AI safely.

What Is a Hallucination?

A hallucination is when an LLM generates output that is factually incorrect, fabricated, or nonsensical — but presents it with the same confidence as correct information.

The term is imperfect (the model isn't "seeing things"), but it has stuck as the standard industry term. What's actually happening:

  • The model predicts tokens based on statistical patterns, not factual lookup
  • It has no internal concept of "truth" — only "what text would typically follow this text?"
  • When the training data doesn't clearly determine the answer, the model fills in plausible-sounding text
  • It cannot distinguish between what it "knows" confidently and what it's guessing
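The points above can be sketched in a few lines. This toy example uses invented logits (raw scores) for a single next-token choice; no real model is involved. It shows the core mechanism: generation picks the statistically most likely continuation, with no truth check anywhere in the loop.

```python
import math

# Toy illustration with invented numbers: hypothetical logits a model
# might assign to possible next tokens for the prompt
# "The capital of Australia is". Not taken from any real model.
logits = {"Sydney": 3.1, "Canberra": 2.8, "Melbourne": 1.2}

# Softmax turns logits into a probability distribution over next tokens.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Generation selects the most probable token. Nothing in this step
# consults a fact database, so a statistically common association
# ("Sydney") can beat the correct answer ("Canberra").
prediction = max(probs, key=probs.get)
print(prediction)
```

If training text mentions Sydney alongside Australia more often than Canberra, the wrong answer simply has the higher probability, and the model emits it with full fluency.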

Critical point: The model does not know when it's hallucinating. It does not experience uncertainty the way humans do. A hallucinated answer looks identical to a correct one — same confidence, same fluency, same formatting.

Types of Hallucinations

Factual Hallucinations

The model states something that is simply false:

  • Inventing historical events that never happened
  • Stating wrong numbers, dates, or statistics
  • Attributing quotes to the wrong people
  • Describing features of products that don't exist

Citation Hallucinations

The model fabricates references:

  • Inventing academic paper titles, authors, and journals
  • Creating plausible-sounding URLs that lead nowhere
  • Citing real authors with fake paper titles (or vice versa)
  • Generating DOIs that don't exist

Example

"According to Smith et al. (2023) in their paper 'Attention Mechanisms in Transformer-based Language Models' published in Nature Machine Intelligence..."

This paper, these authors, and this specific publication may all be fabricated. The model constructed something that looks like a real citation.
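One practical response is to mechanically extract every DOI-like string from model output so each can be checked by hand (for example, by resolving it at doi.org). This is a rough sketch: the regex is a loose approximation of the common DOI shape, and a syntactically valid DOI can still be fabricated, so extraction is only step one.

```python
import re

def extract_dois(text: str) -> list[str]:
    """Pull DOI-like strings out of model output for manual checking.
    Loose pattern for the common '10.<registrant>/<suffix>' DOI shape;
    matching the pattern does NOT mean the DOI exists."""
    return re.findall(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+", text)

# Hypothetical model output containing a fabricated-looking DOI:
claim = "See doi:10.1234/fake.2023.001 for details."
print(extract_dois(claim))
```

The same idea extends to author names, journal titles, and URLs: pull them out into a checklist, then verify each one against the actual source before trusting the citation.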

Logical Hallucinations

The model's reasoning is flawed even when individual facts are correct:

  • Drawing invalid conclusions from valid premises
  • Making math errors while showing confident step-by-step work
  • Contradicting itself within the same response
  • Applying rules incorrectly (especially in legal, medical, or financial contexts)
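Math errors in "confident step-by-step work" are among the easiest failures to catch mechanically: recompute every arithmetic claim the model states. This sketch handles only simple integer `a op b = c` claims, but the principle (re-derive, don't trust) generalizes.

```python
import re

def check_arithmetic(text: str) -> list[tuple[str, bool]]:
    """Scan text for simple 'a op b = c' claims and recompute each one.
    Returns (claim, is_correct) pairs. Only handles +, -, * on integers;
    a real checker would cover more operators and number formats."""
    results = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", text):
        a, b, c = int(a), int(b), int(c)
        actual = {"+": a + b, "-": a - b, "*": a * b}[op]
        results.append((f"{a} {op} {b} = {c}", actual == c))
    return results

# Hypothetical model output: step 1 is wrong, step 2 is internally consistent.
output = "Step 1: 17 * 23 = 381. Step 2: 381 + 9 = 390."
checks = check_arithmetic(output)
print(checks)
```

Note the second claim checks out even though it builds on a wrong first step: local verification catches individual errors but not a flawed chain, which is why reviewing the overall logic still matters.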

Instruction Hallucinations

The model does something different from what you asked:

  • Including items you explicitly told it to exclude
  • Generating more or fewer items than requested
  • Using a format you didn't ask for
  • Answering a slightly different question than the one asked
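Two of these failures (wrong item count, included exclusions) are cheap to verify in code before a human ever reads the output. A minimal compliance check, using a hypothetical request for "3 fruits, no citrus":

```python
def check_compliance(items: list[str], expected_count: int,
                     banned_terms: list[str]) -> list[str]:
    """Mechanically verify two easy-to-check instructions: item count
    and excluded terms. Returns a list of problems (empty = compliant)."""
    problems = []
    if len(items) != expected_count:
        problems.append(f"asked for {expected_count} items, got {len(items)}")
    for term in banned_terms:
        hits = [i for i in items if term.lower() in i.lower()]
        if hits:
            problems.append(f"banned term '{term}' appears in {len(hits)} item(s)")
    return problems

# Hypothetical model output for "list exactly 3 fruits, no citrus":
model_items = ["apple", "orange", "banana", "grape"]
issues = check_compliance(model_items, expected_count=3,
                          banned_terms=["orange", "lemon"])
print(issues)
```

Format and phrasing drift are harder to check automatically, but count and exclusion checks like this catch a surprising share of instruction failures.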

Other Failure Modes (Beyond Hallucination)

Sycophancy

The model agrees with you even when you're wrong. If you say "2+2=5, right?" some models will agree, explain why you're correct, and elaborate on your incorrect premise. This happens because models are trained to be helpful and agreeable.

Over-refusal

The model refuses legitimate requests because it misidentifies them as harmful. Asking about chemistry for a school project might trigger safety filters. This is the opposite problem from hallucination — the model is being too cautious.

Knowledge Cutoff Blindness

Models have a training data cutoff date. They may confidently discuss events after their cutoff as if they know about them — blending real older information with fabricated recent details.

Anchoring & Priming

The model's output is heavily influenced by what's in the prompt. If your prompt contains incorrect information, the model tends to incorporate and reinforce those errors rather than correct them.

Repetition & Loops

Models sometimes get stuck repeating phrases, lists, or patterns. This is more common at low temperatures, with long outputs, or when the model is uncertain.

Format Compliance Failures

When asked to output JSON, the model might produce almost-valid JSON with trailing commas or unescaped characters. Code generation may produce code that looks right but has subtle syntax errors.
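A common defensive pattern is to parse strictly first, then attempt a narrow repair for the single most common model mistake (trailing commas) before giving up. This is a sketch of that pattern, not a general-purpose JSON fixer; truly malformed output will still raise an error, which is the correct behavior.

```python
import json
import re

def parse_lenient(raw: str):
    """Try strict JSON first; on failure, strip trailing commas before
    '}' or ']' and retry. A narrow band-aid for one common model error,
    not a guarantee: other malformations still raise JSONDecodeError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = re.sub(r",\s*([}\]])", r"\1", raw)
        return json.loads(repaired)

# Almost-valid JSON of the kind models often emit (trailing commas):
almost_json = '{"name": "widget", "tags": ["a", "b",],}'
parsed = parse_lenient(almost_json)
print(parsed)
```

In production, a failed parse should trigger a retry of the model call (often with the error message included in the prompt) rather than silently accepting broken output.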

Why Do Models Hallucinate?

  • Training objective — Models are trained to generate plausible text, not verified facts. Plausible and true are not the same thing.
  • No retrieval mechanism — The model cannot look things up. It can only generate from learned patterns. If it wasn't trained on the answer, it guesses.
  • Compression of knowledge — Billions of facts are compressed into model weights. Some facts get "blurred" or mixed with similar facts during compression.
  • No confidence calibration — The model has no reliable way to say "I'm not sure about this." It can be prompted to express uncertainty, but this is performed, not felt.
  • Training on incorrect data — The internet contains errors. Models learn those errors along with correct information.

Detecting Hallucinations

  • Verify specific claims — Check names, dates, numbers, URLs, and citations against authoritative sources. Never trust a model-generated citation without looking it up.
  • Ask for sources — Then check if those sources actually exist and say what the model claims they say.
  • Ask the same question differently — If the model gives different answers to the same question rephrased, at least one answer is likely wrong.
  • Look for excessive confidence — If the model gives very precise numbers or very specific claims about obscure topics, be suspicious.
  • Check for internal contradictions — Read the full response. Does it contradict itself between paragraphs?
  • Test with known answers — Before trusting the model on questions you don't know, test it on questions you do know. This calibrates your trust.
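The "ask the same question differently" check can be automated: collect the answers from several rephrasings, normalize them, and measure agreement. Low agreement is a strong hallucination signal; high agreement is reassuring but not proof, since a model can be consistently wrong. A minimal sketch (the answers here are hypothetical):

```python
from collections import Counter

def consistency_check(answers: list[str]) -> tuple[str, float]:
    """Given answers to the same question asked several ways, return
    the majority answer and the agreement rate (fraction of answers
    that match the majority). Normalization is deliberately simple."""
    normalized = [a.strip().lower() for a in answers]
    best, count = Counter(normalized).most_common(1)[0]
    return best, count / len(normalized)

# Hypothetical answers from three rephrasings of one factual question:
answers = ["1969", "1969", "1971"]
best, agreement = consistency_check(answers)
print(best, agreement)
```

An agreement rate below some threshold (say, unanimity for high-stakes facts) should route the question to manual verification rather than being answered automatically.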

Mitigating Hallucinations

Prompt Strategies

  • "If you're not sure, say so" — Explicitly instruct the model to express uncertainty
  • "Only use information from the provided context" — Constrain the model to your data (RAG pattern)
  • "Cite specific sources for each claim" — Forces the model to ground claims (though citations may still be fabricated)
  • "Think step by step" — Reduces logical errors by forcing explicit reasoning
  • Provide reference material — Give the model the facts it needs rather than relying on its training
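These strategies combine naturally into a single prompt template. The wording below is illustrative, not magic: it constrains the model to provided context, gives it explicit permission to say "I don't know", and asks for step-by-step reasoning.

```python
def grounded_prompt(question: str, context: str) -> str:
    """Combine the anti-hallucination prompt strategies above into one
    template. Illustrative wording; tune it for your own model and task."""
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say \"I don't know\".\n"
        "Think step by step before giving your final answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = grounded_prompt("What year was the product launched?",
                         "The product launched in 2019 in Berlin.")
print(prompt)
```

Building prompts programmatically like this (rather than hand-writing them each time) also makes the instructions consistent and testable across an application.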

System Strategies

  • RAG (Retrieval-Augmented Generation) — Ground the model's responses in retrieved documents, reducing reliance on parametric memory
  • Multi-model verification — Ask multiple models and compare answers. Disagreements flag potential hallucinations.
  • Human-in-the-loop — For high-stakes decisions, always have a human verify before acting
  • Confidence scoring — Some systems can estimate how likely the model is to be correct (though this is imperfect)
  • Low temperature — Reduces creative fabrication but doesn't eliminate factual errors
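The multi-model verification strategy reduces to a simple comparison once the answers are in hand. This sketch assumes you have already collected answers from several models (the model names are placeholders, not real API calls) and just flags disagreement; agreement does not prove correctness, but disagreement demands human verification.

```python
def cross_model_check(model_answers: dict[str, str]) -> bool:
    """Compare answers from several models after simple normalization.
    Returns True when the models disagree (i.e. at least one is wrong,
    so the claim needs manual verification). Model names are placeholders."""
    normalized = {a.strip().lower() for a in model_answers.values()}
    return len(normalized) > 1

# Hypothetical answers to "What is the capital of Australia?":
answers = {"model_a": "Canberra", "model_b": "canberra", "model_c": "Sydney"}
print(cross_model_check(answers))
```

Real pipelines need smarter answer normalization (e.g. semantic rather than string comparison for free-form text), but the routing logic is the same: agreement passes through, disagreement escalates to a human.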

When to Trust, When to Verify

Task | Trust Level | Verification Needed
Brainstorming ideas | High | Minimal — ideas don't need to be "true"
Drafting text / rewriting | High | Light review for tone and accuracy
Explaining well-known concepts | Medium-High | Spot-check unfamiliar claims
Writing code | Medium | Always test the code; review logic
Specific facts, numbers, dates | Low | Always verify against primary sources
Legal/medical/financial advice | Very Low | Always consult a professional
Citations and references | Very Low | Always look up every citation

Will Hallucinations Be "Fixed"?

Hallucination rates are decreasing with each model generation, but they will likely never reach zero. Here's why:

  • The fundamental architecture (next-token prediction) doesn't distinguish fact from fiction
  • Perfect factual accuracy would require perfect training data, which doesn't exist
  • The boundary between "creative generation" and "hallucination" is context-dependent

The industry approach is not to eliminate hallucinations but to build systems around LLMs that detect, constrain, and mitigate them — RAG, guardrails, verification loops, and human oversight.

Checklist: Do You Understand This?

  • Can you explain why LLMs hallucinate (in one sentence)?
  • Can you name four types of hallucination?
  • What is sycophancy and why does it happen?
  • Can you list three strategies to reduce hallucinations?
  • For which tasks should you always verify model output?