🧠 All Things AI
Beginner

Hallucinations & Failure Modes

LLMs will confidently tell you things that are completely wrong. They'll cite papers that don't exist, invent statistics, and fabricate plausible-sounding but false details. This is not a bug that will be fixed — it's a fundamental property of how these models work. Understanding failure modes is the most important skill for using AI safely.

What Is a Hallucination?

A hallucination is when an LLM generates output that is factually incorrect, fabricated, or nonsensical — but presents it with the same confidence as correct information.

The term is imperfect (the model isn't "seeing things"), but it has stuck as the standard industry term. What's actually happening:

  • The model predicts tokens based on statistical patterns, not factual lookup
  • It has no internal concept of "truth" — only "what text would typically follow this text?"
  • When the training data doesn't clearly determine the answer, the model fills in plausible-sounding text
  • It cannot distinguish between what it "knows" confidently and what it's guessing
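The points above can be sketched in a few lines. This toy example uses invented logits (raw scores) for a single next-token choice; no real model is involved. It shows the core mechanism: generation picks the statistically most likely continuation, with no truth check anywhere in the loop.

```python
import math

# Toy illustration with invented numbers: hypothetical logits a model
# might assign to possible next tokens for the prompt
# "The capital of Australia is". Not taken from any real model.
logits = {"Sydney": 3.1, "Canberra": 2.8, "Melbourne": 1.2}

# Softmax turns logits into a probability distribution over next tokens.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Generation selects the most probable token. Nothing in this step
# consults a fact database, so a statistically common association
# ("Sydney") can beat the correct answer ("Canberra").
prediction = max(probs, key=probs.get)
print(prediction)
```

If training text mentions Sydney alongside Australia more often than Canberra, the wrong answer simply has the higher probability, and the model emits it with full fluency.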

Critical point: The model does not know when it's hallucinating. It does not experience uncertainty the way humans do. A hallucinated answer looks identical to a correct one — same confidence, same fluency, same formatting.

Types of Hallucinations

Factual Hallucinations

The model states something that is simply false:

  • Inventing historical events that never happened
  • Stating wrong numbers, dates, or statistics
  • Attributing quotes to the wrong people
  • Describing features of products that don't exist

Citation Hallucinations

The model fabricates references:

  • Inventing academic paper titles, authors, and journals
  • Creating plausible-sounding URLs that lead nowhere
  • Citing real authors with fake paper titles (or vice versa)
  • Generating DOIs that don't exist

Example

"According to Smith et al. (2023) in their paper 'Attention Mechanisms in Transformer-based Language Models' published in Nature Machine Intelligence..."

This paper, these authors, and this specific publication may all be fabricated. The model constructed something that looks like a real citation.
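One practical response is to mechanically extract every DOI-like string from model output so each can be checked by hand (for example, by resolving it at doi.org). This is a rough sketch: the regex is a loose approximation of the common DOI shape, and a syntactically valid DOI can still be fabricated, so extraction is only step one.

```python
import re

def extract_dois(text: str) -> list[str]:
    """Pull DOI-like strings out of model output for manual checking.
    Loose pattern for the common '10.<registrant>/<suffix>' DOI shape;
    matching the pattern does NOT mean the DOI exists."""
    return re.findall(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+", text)

# Hypothetical model output containing a fabricated-looking DOI:
claim = "See doi:10.1234/fake.2023.001 for details."
print(extract_dois(claim))
```

The same idea extends to author names, journal titles, and URLs: pull them out into a checklist, then verify each one against the actual source before trusting the citation.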

Logical Hallucinations

The model's reasoning is flawed even when individual facts are correct:

  • Drawing invalid conclusions from valid premises
  • Making math errors while showing confident step-by-step work
  • Contradicting itself within the same response
  • Applying rules incorrectly (especially in legal, medical, or financial contexts)
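Math errors in "confident step-by-step work" are among the easiest failures to catch mechanically: recompute every arithmetic claim the model states. This sketch handles only simple integer `a op b = c` claims, but the principle (re-derive, don't trust) generalizes.

```python
import re

def check_arithmetic(text: str) -> list[tuple[str, bool]]:
    """Scan text for simple 'a op b = c' claims and recompute each one.
    Returns (claim, is_correct) pairs. Only handles +, -, * on integers;
    a real checker would cover more operators and number formats."""
    results = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", text):
        a, b, c = int(a), int(b), int(c)
        actual = {"+": a + b, "-": a - b, "*": a * b}[op]
        results.append((f"{a} {op} {b} = {c}", actual == c))
    return results

# Hypothetical model output: step 1 is wrong, step 2 is internally consistent.
output = "Step 1: 17 * 23 = 381. Step 2: 381 + 9 = 390."
checks = check_arithmetic(output)
print(checks)
```

Note the second claim checks out even though it builds on a wrong first step: local verification catches individual errors but not a flawed chain, which is why reviewing the overall logic still matters.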

Instruction Hallucinations

The model does something different from what you asked:

  • Including items you explicitly told it to exclude
  • Generating more or fewer items than requested
  • Using a format you didn't ask for
  • Answering a slightly different question than the one asked
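Two of these failures (wrong item count, included exclusions) are cheap to verify in code before a human ever reads the output. A minimal compliance check, using a hypothetical request for "3 fruits, no citrus":

```python
def check_compliance(items: list[str], expected_count: int,
                     banned_terms: list[str]) -> list[str]:
    """Mechanically verify two easy-to-check instructions: item count
    and excluded terms. Returns a list of problems (empty = compliant)."""
    problems = []
    if len(items) != expected_count:
        problems.append(f"asked for {expected_count} items, got {len(items)}")
    for term in banned_terms:
        hits = [i for i in items if term.lower() in i.lower()]
        if hits:
            problems.append(f"banned term '{term}' appears in {len(hits)} item(s)")
    return problems

# Hypothetical model output for "list exactly 3 fruits, no citrus":
model_items = ["apple", "orange", "banana", "grape"]
issues = check_compliance(model_items, expected_count=3,
                          banned_terms=["orange", "lemon"])
print(issues)
```

Format and phrasing drift are harder to check automatically, but count and exclusion checks like this catch a surprising share of instruction failures.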

Other Failure Modes (Beyond Hallucination)

Sycophancy

The model agrees with you even when you're wrong. If you say "2+2=5, right?" some models will agree, explain why you're correct, and elaborate on your incorrect premise. This happens because models are trained to be helpful and agreeable.

Over-refusal

The model refuses legitimate requests because it misidentifies them as harmful. Asking about chemistry for a school project might trigger safety filters. This is the opposite problem from hallucination — the model is being too cautious.

Knowledge Cutoff Blindness

Models have a training data cutoff date. They may confidently discuss events after their cutoff as if they know about them — blending real older information with fabricated recent details.

Anchoring & Priming

The model's output is heavily influenced by what's in the prompt. If your prompt contains incorrect information, the model tends to incorporate and reinforce those errors rather than correct them.

Repetition & Loops

Models sometimes get stuck repeating phrases, lists, or patterns. This is more common at low temperatures, with long outputs, or when the model is uncertain.

Format Compliance Failures

When asked to output JSON, the model might produce almost-valid JSON with trailing commas or unescaped characters. Code generation may produce code that looks right but has subtle syntax errors.
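A common defensive pattern is to parse strictly first, then attempt a narrow repair for the single most common model mistake (trailing commas) before giving up. This is a sketch of that pattern, not a general-purpose JSON fixer; truly malformed output will still raise an error, which is the correct behavior.

```python
import json
import re

def parse_lenient(raw: str):
    """Try strict JSON first; on failure, strip trailing commas before
    '}' or ']' and retry. A narrow band-aid for one common model error,
    not a guarantee: other malformations still raise JSONDecodeError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = re.sub(r",\s*([}\]])", r"\1", raw)
        return json.loads(repaired)

# Almost-valid JSON of the kind models often emit (trailing commas):
almost_json = '{"name": "widget", "tags": ["a", "b",],}'
parsed = parse_lenient(almost_json)
print(parsed)
```

In production, a failed parse should trigger a retry of the model call (often with the error message included in the prompt) rather than silently accepting broken output.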

Why Do Models Hallucinate?

  • Training objective — Models are trained to generate plausible text, not verified facts. Plausible and true are not the same thing.
  • No retrieval mechanism — The model cannot look things up. It can only generate from learned patterns. If it wasn't trained on the answer, it guesses.
  • Compression of knowledge — Billions of facts are compressed into model weights. Some facts get "blurred" or mixed with similar facts during compression.
  • No confidence calibration — The model has no reliable way to say "I'm not sure about this." It can be prompted to express uncertainty, but this is performed, not felt.
  • Training on incorrect data — The internet contains errors. Models learn those errors along with correct information.

Detecting Hallucinations

  • Verify specific claims — Check names, dates, numbers, URLs, and citations against authoritative sources. Never trust a model-generated citation without looking it up.
  • Ask for sources — Then check if those sources actually exist and say what the model claims they say.
  • Ask the same question differently — If the model gives different answers to the same question rephrased, at least one answer is likely wrong.
  • Look for excessive confidence — If the model gives very precise numbers or very specific claims about obscure topics, be suspicious.
  • Check for internal contradictions — Read the full response. Does it contradict itself between paragraphs?
  • Test with known answers — Before trusting the model on questions you don't know, test it on questions you do know. This calibrates your trust.
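The "ask the same question differently" check can be automated: collect the answers from several rephrasings, normalize them, and measure agreement. Low agreement is a strong hallucination signal; high agreement is reassuring but not proof, since a model can be consistently wrong. A minimal sketch (the answers here are hypothetical):

```python
from collections import Counter

def consistency_check(answers: list[str]) -> tuple[str, float]:
    """Given answers to the same question asked several ways, return
    the majority answer and the agreement rate (fraction of answers
    that match the majority). Normalization is deliberately simple."""
    normalized = [a.strip().lower() for a in answers]
    best, count = Counter(normalized).most_common(1)[0]
    return best, count / len(normalized)

# Hypothetical answers from three rephrasings of one factual question:
answers = ["1969", "1969", "1971"]
best, agreement = consistency_check(answers)
print(best, agreement)
```

An agreement rate below some threshold (say, unanimity for high-stakes facts) should route the question to manual verification rather than being answered automatically.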

Mitigating Hallucinations

Prompt Strategies

  • "If you're not sure, say so" — Explicitly instruct the model to express uncertainty
  • "Only use information from the provided context" — Constrain the model to your data (RAG pattern)
  • "Cite specific sources for each claim" — Forces the model to ground claims (though citations may still be fabricated)
  • "Think step by step" — Reduces logical errors by forcing explicit reasoning
  • Provide reference material — Give the model the facts it needs rather than relying on its training
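These strategies combine naturally into a single prompt template. The wording below is illustrative, not magic: it constrains the model to provided context, gives it explicit permission to say "I don't know", and asks for step-by-step reasoning.

```python
def grounded_prompt(question: str, context: str) -> str:
    """Combine the anti-hallucination prompt strategies above into one
    template. Illustrative wording; tune it for your own model and task."""
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say \"I don't know\".\n"
        "Think step by step before giving your final answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = grounded_prompt("What year was the product launched?",
                         "The product launched in 2019 in Berlin.")
print(prompt)
```

Building prompts programmatically like this (rather than hand-writing them each time) also makes the instructions consistent and testable across an application.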

System Strategies

  • RAG (Retrieval-Augmented Generation) — Ground the model's responses in retrieved documents, reducing reliance on parametric memory
  • Multi-model verification — Ask multiple models and compare answers. Disagreements flag potential hallucinations.
  • Human-in-the-loop — For high-stakes decisions, always have a human verify before acting
  • Confidence scoring — Some systems can estimate how likely the model is to be correct (though this is imperfect)
  • Low temperature — Reduces creative fabrication but doesn't eliminate factual errors
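The multi-model verification strategy reduces to a simple comparison once the answers are in hand. This sketch assumes you have already collected answers from several models (the model names are placeholders, not real API calls) and just flags disagreement; agreement does not prove correctness, but disagreement demands human verification.

```python
def cross_model_check(model_answers: dict[str, str]) -> bool:
    """Compare answers from several models after simple normalization.
    Returns True when the models disagree (i.e. at least one is wrong,
    so the claim needs manual verification). Model names are placeholders."""
    normalized = {a.strip().lower() for a in model_answers.values()}
    return len(normalized) > 1

# Hypothetical answers to "What is the capital of Australia?":
answers = {"model_a": "Canberra", "model_b": "canberra", "model_c": "Sydney"}
print(cross_model_check(answers))
```

Real pipelines need smarter answer normalization (e.g. semantic rather than string comparison for free-form text), but the routing logic is the same: agreement passes through, disagreement escalates to a human.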

When to Trust, When to Verify

Task | Trust Level | Verification Needed
Brainstorming ideas | High | Minimal — ideas don't need to be "true"
Drafting text / rewriting | High | Light review for tone and accuracy
Explaining well-known concepts | Medium-High | Spot-check unfamiliar claims
Writing code | Medium | Always test the code; review logic
Specific facts, numbers, dates | Low | Always verify against primary sources
Legal/medical/financial advice | Very Low | Always consult a professional
Citations and references | Very Low | Always look up every citation

Will Hallucinations Be "Fixed"?

Hallucination rates are decreasing with each model generation, but they will likely never reach zero. Here's why:

  • The fundamental architecture (next-token prediction) doesn't distinguish fact from fiction
  • Perfect factual accuracy would require perfect training data, which doesn't exist
  • The boundary between "creative generation" and "hallucination" is context-dependent

The industry approach is not to eliminate hallucinations but to build systems around LLMs that detect, constrain, and mitigate them — RAG, guardrails, verification loops, and human oversight.

Checklist: Do You Understand This?

  • Can you explain why LLMs hallucinate (in one sentence)?
  • Can you name four types of hallucination?
  • What is sycophancy and why does it happen?
  • Can you list three strategies to reduce hallucinations?
  • For which tasks should you always verify model output?