Intermediate

RAG vs Fine-Tuning

RAG (retrieval-augmented generation) and fine-tuning are both ways to make a model such as Claude more useful for a specific domain, but they solve different problems. Understanding which to use, and when to combine them, prevents costly mistakes.

What Each Approach Actually Does

RAG

Retrieves relevant documents at query time
Inserts retrieved content into the prompt
The model's weights are unchanged; Claude remains the same base model
Knowledge lives outside the model, typically in a vector store or other search index
Knowledge can be updated without retraining
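
The RAG steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `DOCUMENTS` dictionary, the bag-of-words `embed` function, and the prompt wording are all assumptions made for the example. A real system would use an embedding model and a vector store, and would send the resulting prompt to the model.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would use a vector store.
DOCUMENTS = {
    "refund-policy": "Refunds are issued within 14 days of purchase for unused licences.",
    "sso-setup": "Single sign-on is configured in the admin console under Security.",
    "pricing": "The Team plan costs 12 dollars per user per month, billed annually.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=1):
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, k=1):
    """Insert the retrieved text into the prompt; the model itself is untouched."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

Replacing the text stored under `DOCUMENTS["pricing"]` changes what the next call to `build_prompt` returns, which is the sense in which knowledge can be updated without retraining.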

Fine-tuning

Trains the model on examples to adjust its behaviour
Knowledge and style are baked into model weights
No retrieval step at inference time
Updating requires re-running the training process
Not available for Claude via the Anthropic API (as of 2025)
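
Fine-tuning starts from a file of example inputs and desired outputs. The sketch below shows one common shape for such data, one JSON object per line (JSONL). The field names `prompt` and `completion`, the sample examples, and the `to_jsonl` helper are assumptions for illustration; each training service defines its own required schema, so check the provider's documentation before preparing real data.

```python
import json

# Hypothetical labelled examples: raw input text and the structured output we want.
EXAMPLES = [
    {
        "input": "Order 1182 arrived damaged.",
        "output": {"intent": "complaint", "priority": "high"},
    },
    {
        "input": "How do I reset my password?",
        "output": {"intent": "how_to", "priority": "low"},
    },
]


def to_jsonl(examples):
    """Serialise examples as JSONL, one training record per line.

    The "prompt"/"completion" field names are an assumption, not a
    specific provider's schema.
    """
    lines = []
    for example in examples:
        record = {
            "prompt": example["input"],
            # The target output is stored as a JSON string so the model
            # learns to emit exactly this format.
            "completion": json.dumps(example["output"], sort_keys=True),
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```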

RAG Is Best For

Proprietary documents: Internal policies, support articles, contracts, manuals — content that did not exist during training and should not be sent to a training pipeline
Frequently updated knowledge: Product documentation, news, pricing — re-ingest updated documents and the system reflects changes immediately
Auditability: A RAG pipeline can return the retrieved source chunks alongside the answer, so users can verify which document the answer came from
Large knowledge bases: Collections of thousands of documents whose combined size is too large to fit in any context window
Multiple knowledge domains: Different retrieval indexes for different product lines, tenants, or contexts
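
Two of the points above, auditability and separate indexes per tenant, can be illustrated together. In this sketch the `INDEXES` structure, the tenant names, the document IDs, and the word-overlap scoring are all hypothetical; the point is only that each result carries its source identifier and that a query never searches another tenant's index.

```python
import re

# Hypothetical per-tenant indexes, each mapping a document ID to its text.
INDEXES = {
    "tenant-a": {
        "a/handbook.md": "Employees accrue 25 days of leave per year.",
        "a/expenses.md": "Expenses above 500 dollars need manager approval.",
    },
    "tenant-b": {
        "b/handbook.md": "Employees accrue 20 days of leave per year.",
    },
}


def words(text):
    """Set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_with_sources(tenant, query, k=1):
    """Search only the given tenant's index and keep the source ID with each chunk.

    Scoring is simple word overlap, a stand-in for real vector search.
    """
    index = INDEXES[tenant]
    query_words = words(query)
    ranked = sorted(
        index.items(),
        key=lambda item: len(query_words & words(item[1])),
        reverse=True,
    )
    return [{"source": doc_id, "text": text} for doc_id, text in ranked[:k]]
```

Returning the `source` field with every chunk lets the application show a citation next to the generated answer.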

Fine-Tuning Is Best For

Consistent output style/format: Teaching the model to always output JSON in a specific schema, or always write in a particular brand voice
Domain vocabulary: Getting the model to correctly use and understand specialised terminology (medical, legal, proprietary product names)
Behaviour change: Teaching the model to follow a specific reasoning process, refuse certain request types, or consistently apply rules that are hard to specify in a prompt
Latency-critical applications: When the required knowledge or behaviour is already in the weights, a fine-tuned model avoids the retrieval round-trip

Note: Anthropic does not currently offer fine-tuning for Claude models via the public API. Fine-tuning is available for open-weight models (Llama, Mistral) or via some cloud providers' managed services.

When to Combine Both

The two approaches are complementary, not mutually exclusive:

Fine-tune for style, RAG for knowledge: Fine-tune a model to always output structured JSON in your format, then use RAG to supply the domain-specific facts it uses to populate that structure
Domain adaptation + retrieval: Fine-tune on domain vocabulary (for better retrieval quality, this means fine-tuning the embedding model), then use RAG to supply the specific document content

Cost Comparison

RAG ongoing cost: Embedding inference for new documents + vector database hosting + retrieval at query time (adds latency and tokens per query)
Fine-tuning upfront cost: Training compute, which is substantial, especially for large models; the training run must be repeated for each update cycle
Fine-tuning ongoing benefit: No retrieval cost per query; can use smaller base models for equivalent quality on narrow tasks

For most teams starting out, RAG has a much lower barrier: no training data curation, no training job, no model versioning overhead. Fine-tuning makes sense when you have training data, a stable target behaviour, and the resources to manage the training pipeline.
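
The cost items above can be turned into a rough comparison. Every number in this sketch is hypothetical and the model is deliberately simplified: it ignores embedding cost for new documents, data curation effort, and engineering time. It only shows how query volume shifts the balance between a recurring retrieval cost and an upfront training cost.

```python
def rag_total_cost(queries, months, hosting_per_month, cost_per_1k_queries):
    """Vector store hosting plus per-query cost (retrieval and extra prompt tokens)."""
    return months * hosting_per_month + queries / 1000 * cost_per_1k_queries


def fine_tuning_total_cost(queries, training_runs, cost_per_run, cost_per_1k_queries):
    """Training runs (one per update cycle) plus per-query inference cost."""
    return training_runs * cost_per_run + queries / 1000 * cost_per_1k_queries


def cheaper_option(queries):
    """Compare both approaches over one year using hypothetical prices."""
    rag = rag_total_cost(
        queries, months=12, hosting_per_month=100, cost_per_1k_queries=4
    )
    fine_tuned = fine_tuning_total_cost(
        queries, training_runs=2, cost_per_run=3000, cost_per_1k_queries=3
    )
    return "rag" if rag <= fine_tuned else "fine-tuning"
```

With these made-up prices, RAG is cheaper at 2 million queries per year (9,200 versus 12,000) and fine-tuning is cheaper at 10 million (41,200 versus 36,000). The crossover point depends entirely on the real prices plugged in.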

5-Question Decision Framework

Is the content proprietary or updated frequently? → RAG
Do you need source attribution? → RAG
Do you want to change output format/style consistently? → Fine-tuning (or system prompt)
Is the knowledge base larger than the model's context window (about 200k tokens for Claude)? → RAG
Do you have 1,000+ labelled input/output examples? → Fine-tuning might be worth it; otherwise start with RAG + prompting
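
The five questions can also be written as a small function. This is one possible reading of the framework, not an official rule set: the `Project` fields, the `recommend` name, and the way the answers are combined are assumptions. The 200k-token and 1,000-example thresholds come from the questions above.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Answers to the five questions (hypothetical field names)."""
    proprietary_or_fast_changing: bool
    needs_source_attribution: bool
    needs_consistent_format: bool
    knowledge_base_tokens: int
    labelled_examples: int


def recommend(project, context_limit=200_000, min_examples=1_000):
    """Return a list of suggested approaches for the project."""
    approaches = []

    # Questions 1, 2 and 4 each point to RAG.
    if (
        project.proprietary_or_fast_changing
        or project.needs_source_attribution
        or project.knowledge_base_tokens > context_limit
    ):
        approaches.append("rag")

    # Question 3 points to fine-tuning, but question 5 gates it on having
    # enough labelled data; otherwise a system prompt is the fallback.
    if project.needs_consistent_format:
        if project.labelled_examples >= min_examples:
            approaches.append("fine-tuning")
        else:
            approaches.append("system prompt")

    # If nothing matched, fall back to the default starting point.
    return approaches or ["rag + prompting"]
```

For example, a project with frequently changing proprietary content, a need for a fixed output format, and only 50 labelled examples yields `["rag", "system prompt"]` under this reading.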

Checklist: Do You Understand This?

RAG: retrieval at query time — best for proprietary docs, frequently updated content, auditability, large knowledge bases
Fine-tuning: changes model weights — best for consistent style, domain vocabulary, stable behaviour change
Fine-tuning for Claude is not available via Anthropic's public API (as of 2025)
Combine both: fine-tune for style/format, RAG for knowledge supply
Default path: start with RAG + prompting; only consider fine-tuning when you have sufficient labelled data and clear behaviour targets