RAG (Retrieval-Augmented Generation)
RAG is the pattern of grounding AI responses in documents you control — instead of relying on the model's training data alone. This section covers when RAG is the right choice, how to build a pipeline that retrieves reliably, common failure modes, and how to evaluate whether your RAG system is actually working.
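The retrieve-then-generate loop can be sketched in a few lines. This is a toy, self-contained illustration: the bag-of-words `embed()` stands in for a real embedding model, and all function names (`retrieve`, `build_prompt`) are illustrative, not from any particular library.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" so the sketch runs without an API call;
    # a real pipeline would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Ground the model in retrieved text instead of its training data alone.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "Invoices are due within 30 days of issue.",
    "The office closes at 6pm on Fridays.",
    "Refunds are processed within 5 business days.",
]
top = retrieve("when are invoices due?", docs)
print(build_prompt("when are invoices due?", top))
```

The numbered `[1]`, `[2]` source markers in the prompt are what later make citations possible: the model can be instructed to reference them in its answer.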
In This Section
When RAG is Needed
The conditions that make RAG the right choice — and the alternatives (fine-tuning, long-context models) that may be better fits.
Chunking & Embeddings
How to split documents into retrievable chunks and embed them for semantic search — strategies and tradeoffs.
Hybrid Search & Reranking
Combining vector search with keyword search, and using reranking to improve the quality of retrieved chunks.
Citations & Provenance
How to surface source attribution in RAG responses so users can verify where answers came from.
RAG Pitfalls
The most common ways RAG pipelines fail — retrieval failures, context stuffing, and hallucination despite retrieval.
RAG Evaluation
Metrics and test approaches for measuring retrieval quality, answer faithfulness, and end-to-end RAG pipeline performance.
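The simplest chunking strategy mentioned above — fixed-size windows with overlap — can be sketched as follows. The sizes are illustrative; real pipelines often chunk by tokens or by document structure (headings, paragraphs) instead of raw characters.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Fixed-size character windows with overlap, so a sentence split at a
    # chunk boundary still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "RAG grounds answers in retrieved text. Chunk size trades recall against noise."
pieces = chunk(doc)
print(len(pieces), pieces)
```

Overlap is the key tradeoff: too little and boundary sentences are lost to retrieval; too much and the index fills with near-duplicate chunks.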
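One common way to combine a vector ranking with a keyword ranking, as described under Hybrid Search & Reranking, is reciprocal rank fusion (RRF). The document IDs and rankings below are illustrative.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    # to a document's score; k = 60 is the constant from the original
    # RRF paper and damps the influence of top ranks.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_c", "doc_b"]    # semantic ranking (illustrative)
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # BM25-style ranking (illustrative)
fused = rrf([vector_hits, keyword_hits])
print(fused)  # doc_a and doc_b rank highest: they appear in both lists
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales.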
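A basic retrieval-quality metric from the RAG Evaluation topic is recall@k: given a test question with known-relevant chunks, what fraction of them appear in the top-k retrieved results? The chunk IDs below are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    # Fraction of the known-relevant chunks that appear in the top-k
    # results; averaged over a test set, this measures retrieval quality
    # independently of the generation step.
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Gold labels (illustrative) name which chunks answer the test question.
got = recall_at_k(["c3", "c7", "c1"], relevant={"c1", "c9"}, k=3)
print(got)  # 0.5 -- one of the two relevant chunks was retrieved
```

Measuring retrieval separately like this helps localize failures: a low recall@k means no amount of prompt engineering on the generation side will fix the answer.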