RAG (Retrieval-Augmented Generation)
RAG is the pattern of grounding AI responses in documents you control — instead of relying on the model's training data alone. This section covers when RAG is the right choice, how to build a pipeline that retrieves reliably, common failure modes, and how to evaluate whether your RAG system is actually working.
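The retrieve-then-generate loop can be sketched in a few lines. This is a toy, self-contained illustration: the bag-of-words `embed()` stands in for a real embedding model, and all function names (`retrieve`, `build_prompt`) are illustrative, not from any particular library.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" so the sketch runs without an API call;
    # a real pipeline would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Ground the model in retrieved text instead of its training data alone.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "Invoices are due within 30 days of issue.",
    "The office closes at 6pm on Fridays.",
    "Refunds are processed within 5 business days.",
]
top = retrieve("when are invoices due?", docs)
print(build_prompt("when are invoices due?", top))
```

The numbered `[1]`, `[2]` source markers in the prompt are what later make citations possible: the model can be instructed to reference them in its answer.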
In This Section
When RAG is Needed
The conditions that make RAG the right choice — and the alternatives (fine-tuning, long-context models) that may be better fits.
Chunking & Embeddings
How to split documents into retrievable chunks and embed them for semantic search — strategies and tradeoffs.
Hybrid Search & Reranking
Combining vector search with keyword search, and using reranking to improve the quality of retrieved chunks.
Citations & Provenance
How to surface source attribution in RAG responses so users can verify where answers came from.
RAG Pitfalls
The most common ways RAG pipelines fail — retrieval failures, context stuffing, and hallucination despite retrieval.
RAG Evaluation
Metrics and test approaches for measuring retrieval quality, answer faithfulness, and end-to-end RAG pipeline performance.
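The simplest chunking strategy mentioned above — fixed-size windows with overlap — can be sketched as follows. The sizes are illustrative; real pipelines often chunk by tokens or by document structure (headings, paragraphs) instead of raw characters.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Fixed-size character windows with overlap, so a sentence split at a
    # chunk boundary still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "RAG grounds answers in retrieved text. Chunk size trades recall against noise."
pieces = chunk(doc)
print(len(pieces), pieces)
```

Overlap is the key tradeoff: too little and boundary sentences are lost to retrieval; too much and the index fills with near-duplicate chunks.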
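One common way to combine a vector ranking with a keyword ranking, as described under Hybrid Search & Reranking, is reciprocal rank fusion (RRF). The document IDs and rankings below are illustrative.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    # to a document's score; k = 60 is the constant from the original
    # RRF paper and damps the influence of top ranks.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_c", "doc_b"]    # semantic ranking (illustrative)
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # BM25-style ranking (illustrative)
fused = rrf([vector_hits, keyword_hits])
print(fused)  # doc_a and doc_b rank highest: they appear in both lists
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales.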
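A basic retrieval-quality metric from the RAG Evaluation topic is recall@k: given a test question with known-relevant chunks, what fraction of them appear in the top-k retrieved results? The chunk IDs below are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    # Fraction of the known-relevant chunks that appear in the top-k
    # results; averaged over a test set, this measures retrieval quality
    # independently of the generation step.
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Gold labels (illustrative) name which chunks answer the test question.
got = recall_at_k(["c3", "c7", "c1"], relevant={"c1", "c9"}, k=3)
print(got)  # 0.5 -- one of the two relevant chunks was retrieved
```

Measuring retrieval separately like this helps localize failures: a low recall@k means no amount of prompt engineering on the generation side will fix the answer.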