Hallucinations & Failure Modes
LLMs will confidently tell you things that are completely wrong. They'll cite papers that don't exist, invent statistics, and fabricate plausible-sounding but false details. This is not a bug awaiting a patch; it is a fundamental property of how these models work. Understanding these failure modes is the most important skill for using AI safely.
What Is a Hallucination?
A hallucination is when an LLM generates output that is factually incorrect, fabricated, or nonsensical — but presents it with the same confidence as correct information.
The term is imperfect (the model isn't "seeing things"), but it's the standard industry term. What's actually happening:
- The model predicts tokens based on statistical patterns, not factual lookup
- It has no internal concept of "truth" — only "what text would typically follow this text?"
- When the training data doesn't clearly determine the answer, the model fills in plausible-sounding text
- It cannot distinguish between what it "knows" confidently and what it's guessing
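The points above can be illustrated with a toy sketch of next-token prediction. Everything here is invented for illustration (a real model learns billions of statistical associations, not a lookup table); the key behavior is that the model ranks continuations by probability and emits something fluent whether or not a true answer exists.

```python
import random

# Toy "language model": maps a context to a probability distribution
# over possible next tokens. The numbers are invented for illustration.
NEXT_TOKEN_PROBS = {
    "The capital of France is": {"Paris": 0.92, "Lyon": 0.05, "Berlin": 0.03},
    "The capital of Wakanda is": {"Birnin": 0.40, "Zana": 0.35, "Paris": 0.25},
}

def sample_next_token(context: str, temperature: float = 1.0) -> str:
    """Sample a next token from the learned distribution.

    Note that sampling happens whether or not a true answer exists:
    for the fictional country the model still emits *something* fluent.
    """
    probs = NEXT_TOKEN_PROBS[context]
    if temperature == 0:
        # Greedy decoding: always pick the most likely token.
        return max(probs, key=probs.get)
    tokens = list(probs)
    weights = [p ** (1 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

# Both contexts yield a confident-looking answer; nothing in the
# mechanism flags that the second question has no factual answer.
print(sample_next_token("The capital of France is", temperature=0))   # prints "Paris"
print(sample_next_token("The capital of Wakanda is", temperature=0))  # prints "Birnin"
```

The uniform mechanism is the point: "correct" and "hallucinated" outputs are produced by exactly the same sampling step.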
Critical point: The model does not know when it's hallucinating. It does not experience uncertainty the way humans do. A hallucinated answer looks identical to a correct one — same confidence, same fluency, same formatting.
Types of Hallucinations
Factual Hallucinations
The model states something that is simply false:
- Inventing historical events that never happened
- Stating wrong numbers, dates, or statistics
- Attributing quotes to the wrong people
- Describing features of products that don't exist
Citation Hallucinations
The model fabricates references:
- Inventing academic paper titles, authors, and journals
- Creating plausible-sounding URLs that lead nowhere
- Citing real authors with fake paper titles (or vice versa)
- Generating DOIs that don't exist
Example
"According to Smith et al. (2023) in their paper 'Attention Mechanisms in Transformer-based Language Models' published in Nature Machine Intelligence..."
This paper, these authors, and this specific publication may all be fabricated. The model constructed something that looks like a real citation.
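A fabricated citation can only be caught by checking it against an external source. Here is a minimal sketch, assuming a trusted local database of known references (a hard-coded set below; a real system would query an index such as Crossref or a library catalogue):

```python
# Stand-in for a trusted reference index. In practice this would be
# an external lookup (e.g. Crossref), not a hard-coded set.
KNOWN_PAPERS = {
    ("vaswani et al.", "attention is all you need"),
}

def citation_exists(authors: str, title: str) -> bool:
    """Return True only if the (authors, title) pair is in the database."""
    return (authors.strip().lower(), title.strip().lower()) in KNOWN_PAPERS

# A real paper passes; the fabricated example above does not.
print(citation_exists("Vaswani et al.", "Attention Is All You Need"))  # True
print(citation_exists(
    "Smith et al.",
    "Attention Mechanisms in Transformer-based Language Models"))      # False
```

The lookup step is the whole defense: the citation's surface plausibility tells you nothing, so every reference must be resolved against a source the model cannot influence.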
Logical Hallucinations
The model's reasoning is flawed even when individual facts are correct:
- Drawing invalid conclusions from valid premises
- Making math errors while showing confident step-by-step work
- Contradicting itself within the same response
- Applying rules incorrectly (especially in legal, medical, or financial contexts)
Instruction Hallucinations
The model does something different from what you asked:
- Including items you explicitly told it to exclude
- Generating more or fewer items than requested
- Using a format you didn't ask for
- Answering a slightly different question than the one asked
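Instruction compliance is one of the easier failure modes to check mechanically, because you know what you asked for. A hedged sketch of a post-hoc validator (the function name and rules are illustrative):

```python
def check_instructions(items: list[str], expected_count: int,
                       excluded_terms: list[str]) -> list[str]:
    """Return a list of human-readable violations (empty list = compliant)."""
    violations = []
    if len(items) != expected_count:
        violations.append(f"asked for {expected_count} items, got {len(items)}")
    for term in excluded_terms:
        for item in items:
            if term.lower() in item.lower():
                violations.append(f"excluded term {term!r} appears in {item!r}")
    return violations

# The model was asked for 3 fruits excluding citrus; it returned 4 items
# including an orange, so both rules are violated.
output = ["apple", "banana", "orange", "grape"]
print(check_instructions(output, expected_count=3,
                         excluded_terms=["orange", "lemon"]))
```

Checks like these catch count and exclusion violations cheaply; detecting that the model answered a subtly different question still requires a human (or a second model) to read the output.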
Other Failure Modes (Beyond Hallucination)
Sycophancy
The model agrees with you even when you're wrong. If you say "2+2=5, right?" some models will agree, explain why you're correct, and elaborate on your incorrect premise. This happens because models are trained to be helpful and agreeable.
Over-refusal
The model refuses legitimate requests because it misidentifies them as harmful. Asking about chemistry for a school project might trigger safety filters. This is the opposite problem from hallucination — the model is being too cautious.
Knowledge Cutoff Blindness
Models have a training data cutoff date. They may confidently discuss events after their cutoff as if they know about them — blending real older information with fabricated recent details.
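One cheap guardrail is to flag questions that mention dates after the model's cutoff, so the query can be routed to a search tool or the answer marked unreliable. A heuristic sketch, with the cutoff year as an assumed configuration value:

```python
import re

CUTOFF_YEAR = 2023  # assumed training-data cutoff for the deployed model

def mentions_post_cutoff_year(question: str,
                              cutoff_year: int = CUTOFF_YEAR) -> bool:
    """Heuristic: does the question mention a four-digit year after the cutoff?"""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", question)]
    return any(y > cutoff_year for y in years)

print(mentions_post_cutoff_year("Who won the 2025 election?"))          # True
print(mentions_post_cutoff_year("Explain the 2008 financial crisis"))   # False
```

This only catches explicit year mentions; questions like "who is the current CEO" are post-cutoff-sensitive without containing a date, so the heuristic is a first filter, not a complete check.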
Anchoring & Priming
The model's output is heavily influenced by what's in the prompt. If your prompt contains incorrect information, the model tends to incorporate and reinforce those errors rather than correct them.
Repetition & Loops
Models sometimes get stuck repeating phrases, lists, or patterns. This is more common at low temperatures, with long outputs, or when the model is uncertain.
Format Compliance Failures
When asked to output JSON, the model might produce almost-valid JSON with trailing commas or unescaped characters. Code generation may produce code that looks right but has subtle syntax errors.
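Format failures, at least, are mechanically detectable: parse the output before using it, and retry or repair on failure. A minimal sketch using Python's standard `json` module; the trailing-comma repair is a naive illustration, not a general-purpose fixer:

```python
import json
import re

def parse_model_json(text: str):
    """Try to parse model output as JSON; attempt one naive repair
    (stripping trailing commas) before giving up and returning None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Remove trailing commas before } or ] -- a common model mistake.
        repaired = re.sub(r",\s*([}\]])", r"\1", text)
        try:
            return json.loads(repaired)
        except json.JSONDecodeError:
            return None  # caller should re-prompt or fall back

print(parse_model_json('{"name": "Ada", "age": 36,}'))  # repaired, parses to a dict
print(parse_model_json("not json at all"))              # None
```

The design point is to treat the model as an unreliable serializer: always validate structured output at the boundary, and have an explicit fallback (re-prompt, repair, or reject) rather than passing unparsed text downstream.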
Why Do Models Hallucinate?
- Training objective — Models are trained to generate plausible text, not verified facts. Plausible and true are not the same thing.
- No retrieval mechanism — A bare model cannot look things up; it can only generate from learned patterns. If it wasn't trained on the answer, it guesses. (Tool-augmented systems bolt retrieval on top, which is what RAG addresses.)
- Compression of knowledge — Billions of facts are compressed into model weights. Some facts get "blurred" or mixed with similar facts during compression.
- No confidence calibration — The model has no reliable way to say "I'm not sure about this." It can be prompted to express uncertainty, but this is performed, not felt.
- Training on incorrect data — The internet contains errors. Models learn those errors along with correct information.
Detecting Hallucinations
- Verify specific claims — Check names, dates, numbers, URLs, and citations against authoritative sources. Never trust a model-generated citation without looking it up.
- Ask for sources — Then check if those sources actually exist and say what the model claims they say.
- Ask the same question differently — If the model gives different answers to the same question rephrased, at least one answer is likely wrong.
- Look for excessive confidence — If the model gives very precise numbers or very specific claims about obscure topics, be suspicious.
- Check for internal contradictions — Read the full response. Does it contradict itself between paragraphs?
- Test with known answers — Before trusting the model on questions you don't know, test it on questions you do know. This calibrates your trust.
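The "ask the same question differently" check can be automated as a crude self-consistency test: collect several answers to rephrasings of the question and flag the result when they disagree. A sketch that assumes the answers have already been gathered; normalization here is just lowercasing and stripping, and the 0.7 threshold is an arbitrary illustrative choice:

```python
from collections import Counter

def consistency_check(answers: list[str], min_agreement: float = 0.7):
    """Return (majority_answer, agreement_ratio, suspicious_flag)."""
    normalized = [a.strip().lower() for a in answers]
    majority, count = Counter(normalized).most_common(1)[0]
    ratio = count / len(normalized)
    return majority, ratio, ratio < min_agreement

# Three rephrasings of the same question produced two different answers:
# agreement is 2/3, below the 0.7 threshold, so the result is flagged.
majority, ratio, suspicious = consistency_check(["1889", "1889", "1901"])
print(majority, round(ratio, 2), suspicious)
```

Agreement is evidence, not proof: a model can be consistently wrong, so this check raises suspicion on disagreement but cannot confirm correctness on its own.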
Mitigating Hallucinations
Prompt Strategies
- "If you're not sure, say so" — Explicitly instruct the model to express uncertainty
- "Only use information from the provided context" — Constrain the model to your data (RAG pattern)
- "Cite specific sources for each claim" — Forces the model to ground claims (though citations may still be fabricated)
- "Think step by step" — Reduces logical errors by forcing explicit reasoning
- Provide reference material — Give the model the facts it needs rather than relying on its training
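Several of these prompt strategies can be combined into a single template. A sketch of a grounding-prompt builder; the exact wording is illustrative, not a proven formula:

```python
def build_grounded_prompt(question: str, context_docs: list[str]) -> str:
    """Compose a prompt that constrains the model to supplied context
    and explicitly invites an 'I don't know' answer."""
    context = "\n\n".join(
        f"[Doc {i + 1}]\n{doc}" for i, doc in enumerate(context_docs)
    )
    return (
        "Answer using ONLY the context below. "
        "Cite the doc number for each claim. "
        "If the context does not contain the answer, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "When was the warranty extended?",
    ["Policy update 2021: warranty extended to 3 years.",
     "Returns accepted within 30 days."],
)
print(prompt)
```

Note that the constraint is still only a request: the model can ignore it, so grounding prompts reduce hallucination risk but don't replace output verification.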
System Strategies
- RAG (Retrieval-Augmented Generation) — Ground the model's responses in retrieved documents, reducing reliance on parametric memory
- Multi-model verification — Ask multiple models and compare answers. Disagreements flag potential hallucinations.
- Human-in-the-loop — For high-stakes decisions, always have a human verify before acting
- Confidence scoring — Some systems can estimate how likely the model is to be correct (though this is imperfect)
- Low temperature — Reduces creative fabrication but doesn't eliminate factual errors
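The multi-model verification idea reduces to comparing normalized answers and routing disagreements to a human. A sketch assuming the per-model answers have already been collected (the model names are placeholders):

```python
def cross_model_check(answers: dict[str, str]) -> dict:
    """Compare answers from several models; flag disagreement for review."""
    normalized = {model: a.strip().lower() for model, a in answers.items()}
    distinct = set(normalized.values())
    return {
        "agreed": len(distinct) == 1,
        "distinct_answers": sorted(distinct),
        "needs_human_review": len(distinct) > 1,
    }

# Two of three hypothetical models agree; the disagreement routes the
# question to a human rather than trusting either answer.
result = cross_model_check(
    {"model_a": "Paris", "model_b": "Paris", "model_c": "Lyon"}
)
print(result)
```

As with self-consistency, agreement across models is only weak evidence of correctness (models share training data and biases), but disagreement is a reliable signal that at least one answer is wrong.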
When to Trust, When to Verify
| Task | Trust Level | Verification Needed |
|---|---|---|
| Brainstorming ideas | High | Minimal — ideas don't need to be "true" |
| Drafting text / rewriting | High | Light review for tone and accuracy |
| Explaining well-known concepts | Medium-High | Spot-check unfamiliar claims |
| Writing code | Medium | Always test the code; review logic |
| Specific facts, numbers, dates | Low | Always verify against primary sources |
| Legal/medical/financial advice | Very Low | Always consult a professional |
| Citations and references | Very Low | Always look up every citation |
Will Hallucinations Be "Fixed"?
Hallucination rates are decreasing with each model generation, but they will likely never reach zero. Here's why:
- The fundamental architecture (next-token prediction) doesn't distinguish fact from fiction
- Perfect factual accuracy would require perfect training data, which doesn't exist
- The boundary between "creative generation" and "hallucination" is context-dependent
The industry approach is not to eliminate hallucinations but to build systems around LLMs that detect, constrain, and mitigate them — RAG, guardrails, verification loops, and human oversight.
Checklist: Do You Understand This?
- Can you explain why LLMs hallucinate (in one sentence)?
- Can you name four types of hallucination?
- What is sycophancy and why does it happen?
- Can you list three strategies to reduce hallucinations?
- For which tasks should you always verify model output?