Knowledge Base Concepts
A knowledge base is the external information store that Claude draws on to answer questions beyond its training data. Knowledge comes in three fundamentally different forms (documents, structured data, and live data), each requiring a different integration approach.
Three Types of Knowledge
Document Knowledge
Unstructured or semi-structured text: PDFs, markdown files, Word documents, web pages, support articles, email threads, and meeting notes. Documents are the most common knowledge base content.
- Accessed via RAG: documents are chunked, embedded, stored in a vector database, and retrieved by semantic similarity at query time
- Best for: policies, procedures, documentation, FAQs, long-form knowledge
- Limitations: retrieval depends on semantic similarity — specific facts buried in a document may not surface if the query phrasing doesn't match well
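The chunk, embed, store, and retrieve steps can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the `embed` function is a toy word-count vector standing in for a real embedding model, and `VectorStore` is a hypothetical in-memory class rather than any particular vector database. Because the toy embedding only matches exact words, it exaggerates the phrasing limitation noted above.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store holding (vector, chunk text) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add_document(self, text: str, max_words: int = 40) -> None:
        for piece in chunk(text, max_words):
            self.items.append((embed(piece), piece))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = VectorStore()
store.add_document(
    "Refunds are issued within 14 days of purchase. Contact support to start a refund.",
    max_words=8,
)
store.add_document(
    "Passwords must be rotated every 90 days. Use the account page to reset a password.",
    max_words=8,
)
top = store.search("how do I get a refund", k=1)
print(top)
```

Note that the chunk containing "Refunds are issued within 14 days" is not returned here, because the toy embedding treats "refunds" and "refund" as different words. A real embedding model narrows this gap but does not remove it.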
Structured Knowledge
Data in tabular form: databases, spreadsheets, CSV files, JSON records, or any information with consistent fields and values. Structured data has schemas — rows, columns, types, relationships.
- Accessed via a SQL tool or an MCP database server: Claude generates the query and the tool executes it
- Best for: customer records, product catalogs, financial data, inventory, analytics — any data where filtering, aggregation, and exact lookup matter
- Do not embed structured data into a vector store for row-level lookup — use a SQL tool instead. Vector embedding is appropriate only for text descriptions associated with structured records (e.g., product descriptions).
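A SQL tool can be as small as a function that runs a model-generated query and returns rows. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the `orders` table, its columns, and the `run_sql_tool` name are illustrative assumptions. The SELECT-only check is a deliberately naive guard, and a read-only database user is likely a sturdier control in practice.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
    )
    conn.commit()
    return conn


def run_sql_tool(conn: sqlite3.Connection, query: str, max_rows: int = 50) -> list[dict]:
    """Execute a model-generated query, allowing only a single SELECT statement."""
    stripped = query.strip().rstrip(";")
    # Naive guard (assumption): reject anything that is not one SELECT statement.
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    cursor = conn.execute(stripped)
    columns = [col[0] for col in cursor.description]
    # Cap the result size so a broad query cannot flood the context window.
    return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]


conn = make_demo_db()
rows = run_sql_tool(
    conn,
    "SELECT customer, SUM(total) AS spent FROM orders GROUP BY customer ORDER BY customer",
)
print(rows)
```

The aggregation in this query is exactly the kind of operation that similarity search over embedded rows cannot perform.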
Live Knowledge
Real-time data from external services: current stock prices, weather, news feeds, system status, inventory levels, or any state that changes continuously.
- Accessed via API tools — Claude calls a tool that makes a live API request and returns current data
- Best for: anything time-sensitive where the answer changes minute-to-minute or day-to-day
- Cannot be pre-indexed or embedded — data must be fetched fresh at query time
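The fetch-at-query-time behaviour can be illustrated with a tool handler that calls through to its data source on every invocation and never caches. In this sketch `fake_fetch` is a stand-in for a real HTTP request, and the `make_status_tool` and `get_system_status` names and the response fields are assumptions made for the example.

```python
import itertools
from datetime import datetime, timezone
from typing import Callable


def make_status_tool(fetch: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a fetch function as a tool handler that always calls through (no cache)."""

    def get_system_status(service: str) -> dict:
        result = fetch(service)
        # Stamp the result so the model can tell how recent the data is.
        result["fetched_at"] = datetime.now(timezone.utc).isoformat()
        return result

    return get_system_status


# Stand-in for a live API: each call returns a new reading (hypothetical data).
_readings = itertools.count(1)


def fake_fetch(service: str) -> dict:
    return {"service": service, "status": "ok", "reading": next(_readings)}


tool = make_status_tool(fake_fetch)
first = tool("checkout")
second = tool("checkout")
print(first["reading"], second["reading"])
```

Passing the fetch function in as a parameter keeps the handler testable without network access; in a real deployment it would be replaced by a function that performs the actual request.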
Choosing the Right Type
Your knowledge is...
- Long-form text or docs → RAG with vector store
- Tabular / database records → SQL tool or MCP database server
- Real-time / current state → Live API tool
Combining Knowledge Types
Most real knowledge bases combine multiple types. A customer support system might have:
- Product documentation (documents → RAG)
- Customer order history (structured → SQL tool)
- Current system status (live → API tool)
Claude can use all three in a single conversation. The user asks a question, Claude retrieves relevant docs via RAG, queries the customer record via SQL, and checks live system status via an API tool — then synthesises a complete answer. The key is defining each source as a separate tool with a clear description so Claude knows which to use for which type of question.
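One way to express "a separate tool per source, each with a clear description" is a list of tool definitions plus a dispatcher. The sketch below uses the `name` / `description` / `input_schema` shape of tool definitions; the tool names, descriptions, and stub handlers returning canned data are all hypothetical, and the call to the model itself is omitted.

```python
# Each description states what the source contains and when to use it.
TOOLS = [
    {
        "name": "search_docs",
        "description": "Search product documentation. Use for how-to, policy, and troubleshooting questions.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "query_orders",
        "description": "Look up a customer's order history by customer ID. Use for questions about specific orders.",
        "input_schema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "get_system_status",
        "description": "Fetch the current status of a service. Use for questions about outages or live state.",
        "input_schema": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
    },
]

# Stub handlers (assumption): real ones would call RAG, SQL, and a live API.
HANDLERS = {
    "search_docs": lambda query: {"chunks": [f"(doc chunk matching '{query}')"]},
    "query_orders": lambda customer_id: {"customer_id": customer_id, "orders": []},
    "get_system_status": lambda service: {"service": service, "status": "ok"},
}


def dispatch(tool_name: str, tool_input: dict) -> dict:
    """Route a tool call requested by the model to the matching handler."""
    handler = HANDLERS.get(tool_name)
    if handler is None:
        raise KeyError(f"unknown tool: {tool_name}")
    return handler(**tool_input)


print(dispatch("get_system_status", {"service": "checkout"}))
```

Keeping the definitions and handlers keyed by the same names makes it easy to check that every tool the model can request actually has an implementation.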
Knowledge in Context vs External Knowledge
Not all knowledge needs to be in an external knowledge base. Where it belongs depends on its size, how often it changes, and how sensitive it is:
- Small enough to fit comfortably in the prompt (as a rough guide, under ~10,000 tokens): paste it directly into the system prompt, which is simpler and more reliable than RAG
- Rarely changing and short: include in the system prompt as static context
- Large, frequently updated, or confidential: build an external knowledge base
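These rules of thumb can be written as a small decision helper. This is only a sketch: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the function names are hypothetical, and the 10,000-token limit is simply the approximate figure given above.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def choose_placement(
    text: str,
    changes_often: bool,
    confidential: bool,
    inline_limit: int = 10_000,
) -> str:
    """Return where the knowledge should live, following the rules of thumb above."""
    if changes_often or confidential or estimate_tokens(text) > inline_limit:
        return "external_knowledge_base"
    return "system_prompt"


print(choose_placement("Office hours are 9 to 5, Monday to Friday.", False, False))
print(choose_placement("x" * 50_000, False, False))
```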
Checklist: Do You Understand This?
- Three knowledge types: documents (RAG), structured data (SQL tool), live data (API tool)
- Do not embed structured rows into a vector store — use SQL tools for tabular data lookups
- Live knowledge must be fetched fresh at query time — it cannot be pre-indexed
- Most production knowledge bases combine multiple types; define each source as a separate tool with a clear description
- Small, stable knowledge: paste directly into the system prompt — simpler than RAG