
What is RAG?

RAG — Retrieval-Augmented Generation — is a technique that gives Claude access to your documents at query time by searching them and including the relevant content in the context. It bridges the gap between Claude's static training knowledge and the live, specific information your use case requires.

The Knowledge Cutoff Problem

Claude's knowledge is frozen at its training cutoff. It knows nothing about:

  • Your internal documents, policies, and procedures
  • Your product documentation, support tickets, and knowledge base
  • Events and publications after its training cutoff date
  • Your proprietary data: contracts, customer records, financial reports

You could paste documents into the conversation, but that only works while they all fit in the context window. For a knowledge base with hundreds of documents, you need a systematic way to find and include only the relevant content for each question.

How RAG Works

RAG has two phases — an offline ingestion phase and an online query phase:

Ingestion phase (offline, runs once): Documents (PDFs, web pages, DB) → Chunk (split into pieces) → Embed (convert to vectors) → Vector Store (index for search)

Query phase (online, runs per question): User Query → Embed Query (same embedding model) → Similarity Search (top-k chunks) → Augment + Generate (Claude answers with context)

Ingestion (offline; runs up front, then again whenever documents change):

  1. Load your documents (PDFs, web pages, database records)
  2. Split them into chunks (paragraphs, sections, fixed-size windows)
  3. Convert each chunk to an embedding vector using an embedding model
  4. Store the vectors and original text in a vector database
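The four ingestion steps can be sketched in Python. This is a minimal, dependency-free sketch under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into a fixed-size vector, so it captures word overlap, not meaning), `chunk_words` uses fixed-size word windows with overlap (one of the chunking options listed above), and a plain dictionary plays the role of the vector database. All names are illustrative.

```python
import hashlib
import math
import re

DIM = 256  # toy vector size (assumption; real embedding models fix their own dimension)

# Hypothetical in-memory "vector store": doc_id -> list of (chunk_text, vector).
store: dict[str, list[tuple[str, list[float]]]] = {}


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable slot per word across runs (Python's hash() is randomised per process).
        slot = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Step 2: fixed-size windows of `size` words; neighbouring windows share `overlap` words."""
    words = text.split()
    step = size - overlap  # must be positive, so overlap < size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def ingest(doc_id: str, text: str) -> None:
    """Steps 2-4 for one already-loaded document (step 1, loading, is left to the caller)."""
    store[doc_id] = [(chunk, toy_embed(chunk)) for chunk in chunk_words(text)]
```

Keying the store by document ID means that calling `ingest` again for a changed document replaces its old chunks, which is what lets a RAG system stay current without retraining. A production system would swap in a real embedding model and vector database; the overall shape of the pipeline would likely stay the same.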

Query (online, runs per question):

  1. Receive the user's question
  2. Embed the question using the same embedding model
  3. Search the vector database for chunks most similar to the question embedding
  4. Include the top-k retrieved chunks in Claude's context alongside the question
  5. Claude generates an answer grounded in the retrieved content
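A minimal sketch of query steps 2 to 4, again with a hypothetical bag-of-words `toy_embed` standing in for the embedding model and a list of `(chunk, vector)` pairs standing in for the vector database. The constraint from step 2 is visible in the code: the question and the chunks go through the same embedding function, otherwise their vectors would not be comparable. The prompt wording in `build_prompt` is an illustrative assumption.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Done once at ingestion time: pair each chunk with its vector."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Steps 2-3: embed the question with the SAME function, return the top-k chunks."""
    q_vec = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Step 4: place the retrieved chunks in the context alongside the question."""
    context = "\n\n".join(retrieved)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Step 5 would send the string from `build_prompt` to Claude (for example through the Anthropic Messages API) and return its answer; that network call is left out so the sketch runs offline. Real vector databases avoid scoring every chunk one by one, but the top-k idea is the same.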

When RAG Is the Right Choice

RAG is the right answer when:

  • Your knowledge base is too large to fit in a single context window
  • Your documents change frequently — a RAG system updates when you re-ingest, without retraining
  • You need source attribution — retrieved chunks show exactly where the answer came from
  • Your use case is information retrieval: Q&A over docs, search, customer support

RAG is not the right answer when:

  • You want to change Claude's writing style or output format — use fine-tuning or system prompts
  • Your entire knowledge base fits comfortably in the context window — just include it directly
  • You need Claude to learn new capabilities — RAG adds knowledge, not skills
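Whether a knowledge base "fits comfortably" can be checked roughly before building anything. The sketch below assumes about 4 characters per token for English text and a 200,000-token window; both numbers are assumptions, so check your model's actual limit and use a real token counter for anything close to the line. The hypothetical `headroom` parameter reserves part of the window for instructions, the question, and the answer.

```python
import math

CHARS_PER_TOKEN = 4       # rough heuristic for English text (assumption)
CONTEXT_WINDOW = 200_000  # assumed window size in tokens; check your model's limit


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(
    documents: list[str],
    context_window: int = CONTEXT_WINDOW,
    headroom: float = 0.5,
) -> bool:
    """True if all documents use at most `headroom` of the window, leaving room for the rest."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= context_window * headroom
```

If this returns True, including the documents directly is likely simpler than building a RAG pipeline; if it returns False, retrieval becomes necessary.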

What RAG Cannot Do

RAG improves the accuracy of answers over your documents, but it does not eliminate errors. RAG retrieves and Claude generates, so errors can enter at either step, or because the answer was never in the retrieved content:

  • Retrieval failures: The wrong chunks are returned (poor embedding quality, bad chunking, vague query). Claude answers based on the wrong context.
  • Generation errors: Claude misinterprets or misrepresents the retrieved content. This is hallucination over retrieved text; it is rarer than hallucination without context, but it still happens.
  • Coverage gaps: The answer is not in any retrieved chunk because the document was not ingested or the question does not match the available content.
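When a RAG answer is wrong, these failure sources can be told apart if you keep a small test set where you know which document should answer each question. The triage function below is a hypothetical sketch of that idea; `expected_source`, `ingested_sources`, `retrieved_sources` and `answer_correct` are illustrative inputs you would have to record yourself. It only detects the "document was not ingested" form of a coverage gap.

```python
from collections import Counter


def triage(
    expected_source: str,
    ingested_sources: set[str],
    retrieved_sources: list[str],
    answer_correct: bool,
) -> str:
    """Classify one test question by where the pipeline went wrong, if anywhere."""
    if expected_source not in ingested_sources:
        return "coverage gap"        # the answering document was never ingested
    if expected_source not in retrieved_sources:
        return "retrieval failure"   # ingested, but not among the top-k results
    if not answer_correct:
        return "generation error"    # right content retrieved, answer still wrong
    return "ok"


def summarize(cases: list[dict]) -> Counter:
    """Count each outcome over a labelled test set (one dict of triage arguments per case)."""
    return Counter(triage(**case) for case in cases)
```

Counting the outcomes over the test set suggests where to look first: frequent retrieval failures point toward chunking, embeddings, or query wording, while frequent generation errors point toward how the context is presented to Claude.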

Always include source attribution in RAG outputs so users can verify the answer against the original document.
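One way to make attribution possible is to label every retrieved chunk with its source before it goes into the context, ask Claude to cite those labels, and show the same list to the user. The `Chunk` type, the numbering scheme, and the function names below are illustrative assumptions, not a required format.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # where the text came from, e.g. a file name or URL
    text: str


def format_context(chunks: list[Chunk]) -> str:
    """Number each chunk and keep its source so the answer can cite [1], [2], ..."""
    return "\n\n".join(
        f"[{i}] (source: {c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )


def sources_footer(chunks: list[Chunk]) -> str:
    """List shown to the user under the answer, using the same numbers."""
    return "Sources:\n" + "\n".join(
        f"[{i}] {c.source}" for i, c in enumerate(chunks, start=1)
    )
```

Because the context and the footer share the same numbering, a citation such as [2] in Claude's answer can be traced back to the original document.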

Checklist: Do You Understand This?

  • RAG = retrieve relevant documents, include in context, generate answer — solves the static knowledge cutoff problem
  • Two phases: offline ingestion (load → chunk → embed → store) and online query (embed → search → augment → generate)
  • Use RAG for large, frequently updated, or proprietary knowledge bases that require source attribution
  • Don't use RAG to change Claude's behaviour/style, to teach it new skills, or for small knowledge bases that fit in context
  • RAG retrieves, Claude generates: errors can come from wrong retrieval, misinterpretation of retrieved content, or content that was never ingested

Page built: 01 Jun 2026