Beginner

What is RAG?

RAG — Retrieval-Augmented Generation — is a technique that gives Claude access to your documents at query time by searching them and including the relevant content in the context. It bridges the gap between Claude's static training knowledge and the live, specific information your use case requires.

The Knowledge Cutoff Problem

Claude's knowledge is frozen at its training cutoff. On its own, it has no access to:

Your internal documents, policies, and procedures
Your product documentation, support tickets, and knowledge base
Events and publications after its training cutoff date
Your proprietary data: contracts, customer records, financial reports

You could paste documents into the conversation, but that only works while everything fits in the context window. For a knowledge base with hundreds of documents, you need a systematic way to find and include only the relevant content for each question.

How RAG Works

RAG has two phases — an offline ingestion phase and an online query phase:

Ingestion phase (offline, runs once): Documents (PDFs, web pages, DB) → Chunk (split into pieces) → Embed (convert to vectors) → Vector Store (index for search)

Query phase (online, runs per question): User Query (question or prompt) → Embed Query (same embed model) → Similarity Search (top-k chunks) → Augment + Generate (Claude answers with context)

Ingestion (offline, runs once, then again whenever documents change):

Load your documents (PDFs, web pages, database records)
Split them into chunks (paragraphs, sections, fixed-size windows)
Convert each chunk to an embedding vector using an embedding model
Store the vectors and original text in a vector database
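
The four ingestion steps can be sketched in plain Python. This is a minimal illustration, not a production pipeline. The `toy_embed` function is a hashed bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the chunk sizes and all function names are assumptions made for the example.

```python
import hashlib
import math

def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping fixed-size word windows."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def toy_embed(text, dims=64):
    """Stand-in for an embedding model (assumption): hashed word counts, L2-normalised."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(documents, embed=toy_embed):
    """Build an in-memory store: one record per chunk with its source, text, and vector."""
    store = []
    for source, text in documents.items():
        for i, chunk in enumerate(chunk_text(text)):
            store.append({
                "source": source,
                "chunk_id": i,
                "text": chunk,
                "vector": embed(chunk),
            })
    return store
```

Keeping the original text and the source name next to each vector matters later: the query phase returns text, not vectors, and the source name is what makes attribution possible.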

Query (online, runs per question):

Receive the user's question
Embed the question using the same embedding model
Search the vector database for chunks most similar to the question embedding
Include the top-k retrieved chunks in Claude's context alongside the question
Claude generates an answer grounded in the retrieved content
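
The search and augment steps can be sketched as follows. To stay self-contained, the example uses hand-written three-dimensional vectors in place of real embeddings, and it stops at building the prompt string rather than calling a model. The record layout, function names, and prompt wording are all assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, store, k=3):
    """Return the k (score, record) pairs most similar to the query, best first."""
    scored = [(cosine(query_vector, rec["vector"]), rec) for rec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(question, hits):
    """Place the retrieved chunk text ahead of the question."""
    context = "\n\n".join(rec["text"] for _, rec in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy store: in practice these vectors come from the ingestion phase.
store = [
    {"text": "Refunds are issued within 14 days.", "vector": [1.0, 0.0, 0.0]},
    {"text": "Shipping takes 3 to 5 business days.", "vector": [0.0, 1.0, 0.0]},
    {"text": "Refund requests need an order number.", "vector": [0.9, 0.1, 0.0]},
]
# Pretend this is the embedding of the question, from the same model as the store.
query_vector = [1.0, 0.1, 0.0]
hits = top_k(query_vector, store, k=2)
prompt = build_prompt("How do refunds work?", hits)
```

The resulting `prompt` is what would be sent to Claude. With these toy vectors, both refund chunks rank above the shipping chunk, so only refund content reaches the context.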

When RAG Is the Right Choice

RAG is the right answer when:

Your knowledge base is too large to fit in a single context window
Your documents change frequently — a RAG system updates when you re-ingest, without retraining
You need source attribution — retrieved chunks show exactly where the answer came from
Your use case is information retrieval: Q&A over docs, search, customer support

RAG is not the right answer when:

You want to change Claude's writing style or output format — use fine-tuning or system prompts
Your entire knowledge base fits comfortably in the context window — just include it directly
You need Claude to learn new capabilities — RAG adds knowledge, not skills
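
The "does it fit in the context window?" test can be roughed out before building anything. The sketch below assumes about four characters per token, which is only a crude heuristic for English text; a real tokenizer will give different counts. The context window size and the reserved budget are values you supply, not figures taken from any model's documentation.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return -(-len(text) // chars_per_token)  # ceiling division

def fits_in_context(documents, context_window_tokens, reserved_tokens=4000):
    """True if all documents fit, leaving room for the question and the answer."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= context_window_tokens - reserved_tokens
```

If `fits_in_context` returns True with a comfortable margin, including the documents directly is likely simpler than building a retrieval pipeline. If it returns False, or the documents change often, RAG is the more natural fit.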

What RAG Cannot Do

RAG improves the accuracy of answers over your documents, but it does not eliminate errors. RAG retrieves and Claude generates, and errors can enter at either step or from gaps in what was ingested:

Retrieval failures: The wrong chunks are returned (poor embedding quality, bad chunking, vague query). Claude answers based on the wrong context.
Generation errors: Claude misinterprets or misrepresents the retrieved content. This is hallucination over retrieved text, and it is typically rarer than hallucination without context.
Coverage gaps: The answer is not in any retrieved chunk because the document was not ingested or the question does not match the available content.
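
One common guard against retrieval failures and coverage gaps is to decline to answer when no retrieved chunk is similar enough to the question. The sketch below assumes hits arrive as `(score, record)` pairs with a similarity score between 0 and 1. The threshold of 0.3 is a placeholder that would need tuning against your own data, and the function names are hypothetical.

```python
NO_ANSWER_MESSAGE = "I could not find this in the available documents."

def select_context(scored_hits, min_score=0.3):
    """Keep only hits whose similarity clears the threshold."""
    return [(score, rec) for score, rec in scored_hits if score >= min_score]

def prepare_answer_input(scored_hits, min_score=0.3):
    """Decide whether there is enough relevant context to attempt an answer."""
    kept = select_context(scored_hits, min_score)
    if not kept:
        # Nothing relevant was retrieved: return a fallback instead of guessing.
        return {"answerable": False, "message": NO_ANSWER_MESSAGE, "context": []}
    return {
        "answerable": True,
        "message": None,
        "context": [rec["text"] for _, rec in kept],
    }
```

A threshold cannot catch every failure. A chunk can score well and still be the wrong one, so this complements source attribution rather than replacing it.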

Always include source attribution in RAG outputs so users can verify the answer against the original document.
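
Attribution is easiest when every chunk carries its source into the prompt. The sketch below assumes each retrieved record is a dict with `source` and `text` keys. The numbering scheme, prompt wording, and file names are illustrative choices, not a required format.

```python
def format_context_with_sources(hits):
    """Number each chunk and label it with the document it came from."""
    blocks = []
    for n, rec in enumerate(hits, start=1):
        blocks.append(f"[{n}] (source: {rec['source']})\n{rec['text']}")
    return "\n\n".join(blocks)

def build_cited_prompt(question, hits):
    """Ask for an answer that cites the numbered chunks it relies on."""
    return (
        "Answer using only the numbered context below. "
        "Cite the chunk numbers you used, like [1].\n\n"
        f"{format_context_with_sources(hits)}\n\n"
        f"Question: {question}"
    )

def unique_sources(hits):
    """List source documents in first-seen order, for display next to the answer."""
    seen = []
    for rec in hits:
        if rec["source"] not in seen:
            seen.append(rec["source"])
    return seen

# Hypothetical retrieved chunks.
hits = [
    {"source": "refund-policy.pdf", "text": "Refunds are issued within 14 days."},
    {"source": "faq.html", "text": "Refund requests need an order number."},
    {"source": "refund-policy.pdf", "text": "Gift cards are not refundable."},
]
prompt = build_cited_prompt("How do refunds work?", hits)
sources = unique_sources(hits)
```

Showing `sources` alongside the generated answer gives users a direct path back to the original documents.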

Checklist: Do You Understand This?

RAG = retrieve relevant documents, include them in context, generate an answer; it addresses the static knowledge cutoff problem
Two phases: offline ingestion (load → chunk → embed → store) and online query (embed → search → augment → generate)
Use RAG for large, frequently updated, or proprietary knowledge bases that require source attribution
Don't use RAG to change Claude's behaviour/style, or for tiny knowledge bases that fit in context
RAG retrieves, Claude generates: errors can come from wrong retrieval, misinterpretation of retrieved content, or content that was never ingested