RAG vs Fine-Tuning
RAG (retrieval-augmented generation) and fine-tuning are both ways to make Claude more useful for a specific domain, but they solve different problems. Knowing which to use, and when to combine them, helps you avoid expensive rework.
What Each Approach Actually Does
RAG
- Retrieves relevant documents at query time
- Inserts retrieved content into the prompt
- Claude's weights are unchanged — same base model
- Knowledge lives outside the model, in a vector store
- Knowledge can be updated without retraining
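The query-time flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the `DOCS` contents and the `tokens`, `score`, `retrieve`, and `build_prompt` helpers are all hypothetical, and plain word overlap stands in for the embedding similarity a real vector store would compute.

```python
from __future__ import annotations
import re

# Hypothetical knowledge base; in practice these chunks live in a vector store.
DOCS = {
    "refund-policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "warranty": "Hardware is covered by a 2 year warranty.",
}

def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, doc: str) -> float:
    """Jaccard overlap between the query and document word sets."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k highest-scoring (doc_id, text) pairs."""
    ranked = sorted(DOCS.items(), key=lambda item: score(query, item[1]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Insert retrieved content into the prompt; the model itself is untouched."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"

# Updating knowledge means editing DOCS, not retraining anything.
prompt = build_prompt("How long does shipping take?")
```

The resulting `prompt` string is what would be sent to the model. Swapping `score` for a real embedding comparison changes the ranking quality but not the overall shape of the flow.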
Fine-tuning
- Trains the model on examples to adjust its behaviour
- Knowledge and style are baked into model weights
- No retrieval step at inference time
- Updating requires re-running the training process
- Not available for Claude via the Anthropic API (as of 2025)
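"Training on examples" usually starts with a file of labelled input/output pairs. The sketch below shows one common serialization, JSON Lines (one JSON object per line). The field names `input` and `output` and the sample tickets are hypothetical; each training framework or managed service defines its own schema, so treat this as an illustration of the shape only.

```python
from __future__ import annotations
import json

# Hypothetical labelled pairs; real field names depend on the training service.
examples = [
    {"input": "Summarise ticket: printer offline",
     "output": '{"category": "hardware", "priority": "low"}'},
    {"input": "Summarise ticket: cannot log in",
     "output": '{"category": "access", "priority": "high"}'},
]

def to_jsonl(rows: list[dict]) -> str:
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)

def from_jsonl(text: str) -> list[dict]:
    """Parse the file back, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

training_file = to_jsonl(examples)
```

Changing what the model has learned means editing this file and running training again, which is why updates are slower than re-ingesting documents in RAG.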
RAG Is Best For
- Proprietary documents: Internal policies, support articles, contracts, manuals — content that did not exist during training and should not be sent to a training pipeline
- Frequently updated knowledge: Product documentation, news, pricing — re-ingest updated documents and the system reflects changes immediately
- Auditability: RAG returns source chunks alongside the answer, so users can verify which document the answer came from
- Large knowledge bases: Thousands of documents that together are too large to fit in any context window
- Multiple knowledge domains: Different retrieval indexes for different product lines, tenants, or contexts
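Two of the points above, auditability and separate indexes per domain, can be illustrated together. This is a sketch under assumptions: the `Chunk` type, the `INDEXES` contents, and the `search` and `answer_with_sources` helpers are hypothetical, and each in-memory list stands in for what would be a separate vector index.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str  # document the chunk came from, shown to the user for verification
    text: str

# Hypothetical per-tenant indexes; each would be a separate vector index in practice.
INDEXES: dict[str, list[Chunk]] = {
    "tenant-a": [
        Chunk("a/pricing.md", "The Pro plan costs 40 USD per month."),
        Chunk("a/sla.md", "Uptime is guaranteed at 99.9 percent."),
    ],
    "tenant-b": [
        Chunk("b/pricing.md", "The Team plan costs 25 USD per month."),
    ],
}

def search(tenant: str, query: str, k: int = 1) -> list[Chunk]:
    """Search only the caller's index, ranking chunks by words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        INDEXES[tenant],
        key=lambda c: len(words & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer_with_sources(tenant: str, query: str) -> dict:
    """Return the context for the model together with the sources it came from."""
    hits = search(tenant, query)
    return {"context": [c.text for c in hits], "sources": [c.source for c in hits]}

result = answer_with_sources("tenant-a", "What does the Pro plan cost per month")
```

Because every chunk carries its `source`, the application can display the originating document next to the answer, and a query for one tenant never touches another tenant's index.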
Fine-Tuning Is Best For
- Consistent output style/format: Teaching the model to always output JSON in a specific schema, or always write in a particular brand voice
- Domain vocabulary: Getting the model to correctly use and understand specialised terminology (medical, legal, proprietary product names)
- Behaviour change: Teaching the model to follow a specific reasoning process, refuse certain request types, or consistently apply rules that are hard to specify in a prompt
- Latency-critical applications: Fine-tuned models can remove the retrieval round-trip
Note: Anthropic does not currently offer fine-tuning for Claude models via the public API. Fine-tuning is available for open-weight models (Llama, Mistral) or via some cloud providers' managed services.
When to Combine Both
The two approaches are complementary, not mutually exclusive:
- Fine-tune for style, RAG for knowledge: Fine-tune a model to always output structured JSON in your format, then use RAG to supply the domain-specific facts it uses to populate that structure
- Domain adaptation + retrieval: Fine-tune the embedding model on domain vocabulary so that retrieval quality improves, then use RAG to supply the specific document content
Cost Comparison
- RAG ongoing cost: Embedding inference for new documents + vector database hosting + retrieval at query time (adds latency and tokens per query)
- Fine-tuning upfront cost: Training compute, which is substantial for large models and recurs with every update cycle
- Fine-tuning ongoing benefit: No retrieval cost per query; can use smaller base models for equivalent quality on narrow tasks
For most teams starting out, RAG has a much lower barrier: no training data curation, no training job, no model versioning overhead. Fine-tuning makes sense when you have training data, a stable target behaviour, and the resources to manage the training pipeline.
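The trade-off can be made concrete with some rough arithmetic. The sketch below is only an illustration: every figure in it is a hypothetical placeholder rather than a real price, the function names are invented for this example, and it ignores data curation effort and any difference in serving cost between a fine-tuned and a base model.

```python
def rag_monthly_cost(queries: int, extra_tokens_per_query: int,
                     price_per_million_tokens: float,
                     hosting_per_month: float) -> float:
    """Vector store hosting plus the cost of retrieved tokens added to each prompt."""
    token_cost = queries * extra_tokens_per_query * price_per_million_tokens / 1_000_000
    return hosting_per_month + token_cost

def fine_tune_monthly_cost(training_run_cost: float, retrains_per_month: float) -> float:
    """Training compute, repeated for every update cycle; no per-query retrieval cost."""
    return training_run_cost * retrains_per_month

def break_even_queries(extra_tokens_per_query: int, price_per_million_tokens: float,
                       hosting_per_month: float, training_run_cost: float,
                       retrains_per_month: float) -> float:
    """Monthly query volume above which RAG costs more than fine-tuning."""
    per_query = extra_tokens_per_query * price_per_million_tokens / 1_000_000
    fixed_gap = training_run_cost * retrains_per_month - hosting_per_month
    return max(fixed_gap, 0.0) / per_query

# Placeholder figures only: 2,000 retrieved tokens per query at 3.00 per million
# tokens, 100 per month for hosting, 2,000 per training run, one retrain per month.
rag = rag_monthly_cost(100_000, 2_000, 3.0, 100.0)
ft = fine_tune_monthly_cost(2_000.0, 1.0)
threshold = break_even_queries(2_000, 3.0, 100.0, 2_000.0, 1.0)
```

With these placeholder inputs RAG is cheaper at 100,000 queries per month, and the balance only tips at several hundred thousand queries. The useful part is the structure of the comparison (fixed cost versus per-query cost), not the specific numbers.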
5-Question Decision Framework
- Is the content proprietary or updated frequently? → RAG
- Do you need source attribution? → RAG
- Do you need a consistent output format or style? → Fine-tuning (or system prompt)
- Is the knowledge base larger than 200k tokens (roughly Claude's standard context window)? → RAG
- Do you have 1,000+ labelled input/output examples? → Fine-tuning might be worth it; otherwise start with RAG + prompting
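The five questions can be encoded as a small function, which is handy for making the reasoning explicit in a design review. This is a sketch, not a definitive rule engine: the `Project` fields and the `recommend` function are hypothetical names, and the thresholds are simply the ones stated in the questions above.

```python
from __future__ import annotations
from dataclasses import dataclass

CONTEXT_LIMIT_TOKENS = 200_000   # threshold from question 4
MIN_LABELLED_EXAMPLES = 1_000    # threshold from question 5

@dataclass
class Project:
    proprietary_or_frequently_updated: bool
    needs_source_attribution: bool
    needs_consistent_format_or_style: bool
    knowledge_base_tokens: int
    labelled_examples: int

def recommend(p: Project) -> list[str]:
    """Walk the five questions in order and collect one note per matching answer."""
    notes = []
    if p.proprietary_or_frequently_updated:
        notes.append("RAG: content is proprietary or updated frequently")
    if p.needs_source_attribution:
        notes.append("RAG: source attribution is required")
    if p.needs_consistent_format_or_style:
        notes.append("Fine-tuning (or system prompt): consistent format/style")
    if p.knowledge_base_tokens > CONTEXT_LIMIT_TOKENS:
        notes.append("RAG: knowledge base exceeds 200k tokens")
    if p.labelled_examples >= MIN_LABELLED_EXAMPLES:
        notes.append("Fine-tuning might be worth it: 1,000+ labelled examples")
    else:
        notes.append("Start with RAG + prompting")
    return notes

# Hypothetical project: internal support docs, few labelled examples.
support_bot = Project(True, True, False, 500_000, 50)
for note in recommend(support_bot):
    print(note)
```

The function returns every matching note rather than a single verdict, since a project can legitimately call for both approaches.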
Checklist: Do You Understand This?
- RAG: retrieval at query time — best for proprietary docs, frequently updated content, auditability, large knowledge bases
- Fine-tuning: changes model weights — best for consistent style, domain vocabulary, stable behaviour change
- Fine-tuning for Claude is not available via Anthropic's public API (as of 2025)
- Combine both: fine-tune for style/format, RAG for knowledge supply
- Default path: start with RAG + prompting; only consider fine-tuning when you have sufficient labelled data and clear behaviour targets