Beginner

Knowledge Base Concepts

A knowledge base is the external information store that Claude draws on to answer questions beyond its training data. Knowledge comes in three fundamentally different forms — documents, structured data, and live APIs — each requiring a different integration approach.

Three Types of Knowledge

Document Knowledge

Unstructured or semi-structured text: PDFs, markdown files, Word documents, web pages, support articles, email threads, and meeting notes. Documents are the most common knowledge base content.

Accessed via retrieval-augmented generation (RAG): documents are chunked, embedded, stored in a vector database, and retrieved by semantic similarity at query time
Best for: policies, procedures, documentation, FAQs, long-form knowledge
Limitations: retrieval depends on semantic similarity — specific facts buried in a document may not surface if the query phrasing doesn't match well
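The chunk, embed, store, retrieve pipeline described above can be sketched in a few lines. This is a toy illustration, not a production design: the "embedding" here is just a word-count vector, whereas a real system would call an embedding model, and `ToyVectorStore`, `chunk_text`, and `embed` are hypothetical names invented for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add_document(self, text: str, max_words: int = 40) -> None:
        for chunk in chunk_text(text, max_words):
            self.items.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


store = ToyVectorStore()
store.add_document("Refunds are available within 30 days of purchase with a receipt.")
store.add_document("Standard shipping takes five business days within the country.")
top = store.search("How many days do I have to request a refund?", top_k=1)
print(top)
```

Note that the query word "refund" does not match "refunds" in this toy vectorizer; the right chunk wins only because of other shared words. That is a small-scale version of the limitation noted above: retrieval quality depends on how well the query lines up with the stored text.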

Structured Knowledge

Data in tabular form: databases, spreadsheets, CSV files, JSON records, or any information with consistent fields and values. Structured data has schemas — rows, columns, types, relationships.

Accessed via a SQL tool or an MCP database server: Claude generates the query and the tool executes it against the database
Best for: customer records, product catalogs, financial data, inventory, analytics — any data where filtering, aggregation, and exact lookup matter
Do not embed structured data into a vector store for row-level lookup — use a SQL tool instead. Vector embedding is appropriate only for text descriptions associated with structured records (e.g., product descriptions).
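A minimal sketch of what a SQL tool might look like on the application side, using Python's built-in `sqlite3` module and an in-memory table. The `orders` schema, the `run_sql_tool` name, and the SELECT-only guard are assumptions made for illustration; the guard is deliberately crude and a real deployment would more likely rely on a read-only database role.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("alice", 40.0), ("bob", 15.5), ("alice", 9.5)],
    )
    conn.commit()
    return conn


def run_sql_tool(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Execute a model-generated query, allowing only a single SELECT statement."""
    stripped = query.strip().rstrip(";")
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()


conn = make_demo_db()
# The query string stands in for what the model would generate.
rows = run_sql_tool(
    conn,
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer",
)
print(rows)  # [('alice', 49.5), ('bob', 15.5)]
```

The aggregation is exact and cheap here, which is the kind of filtering and summing that similarity search over embedded rows cannot do reliably.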

Live Knowledge

Real-time data from external services: current stock prices, weather, news feeds, system status, inventory levels, or any state that changes continuously.

Accessed via API tools — Claude calls a tool that makes a live API request and returns current data
Best for: anything time-sensitive where the answer changes minute-to-minute or day-to-day
Cannot be pre-indexed or embedded — data must be fetched fresh at query time
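As a rough sketch, a live-data tool handler simply performs the request each time it is called and returns the fresh result. To keep the example runnable offline, the HTTP call is replaced by a stub; `fake_status_api`, `get_system_status`, and the response fields are hypothetical and do not correspond to any real service.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def fake_status_api(service: str) -> str:
    """Stand-in for a real HTTP request; returns a JSON string as an API would."""
    return json.dumps({"service": service, "status": "operational"})


def get_system_status(service: str, fetch: Callable[[str], str] = fake_status_api) -> dict:
    """Tool handler: fetches at call time rather than reading a cache or index."""
    payload = json.loads(fetch(service))
    # Record when the data was fetched so the answer can state how current it is.
    payload["fetched_at"] = datetime.now(timezone.utc).isoformat()
    return payload


result = get_system_status("checkout")
print(result["service"], result["status"])
```

Passing the fetch function in as a parameter is a design choice for this sketch: it lets the stub be swapped for a real HTTP client without changing the handler.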

Choosing the Right Type

Your knowledge is...

Long-form text or docs → RAG with vector store
Tabular / database records → SQL tool or MCP database server
Real-time / current state → Live API tool

Combining Knowledge Types

Most real knowledge bases combine multiple types. A customer support system might have:

Product documentation (documents → RAG)
Customer order history (structured → SQL tool)
Current system status (live → API tool)

Claude can use all three in a single conversation. The user asks a question, Claude retrieves relevant docs via RAG, queries the customer record via SQL, and checks live system status via an API tool — then synthesises a complete answer. The key is defining each source as a separate tool with a clear description so Claude knows which to use for which type of question.
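One way to picture "a separate tool with a clear description" is sketched below. The dictionaries use the `name` / `description` / `input_schema` shape of tool definitions in the Anthropic Messages API; the tool names, descriptions, and placeholder handlers are hypothetical, and no API call is made here.

```python
TOOLS = [
    {
        "name": "search_docs",
        "description": "Search product documentation. Use for how-to, policy, and troubleshooting questions.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "query_orders",
        "description": "Run a read-only SQL query on customer order history. Use for exact lookups and totals.",
        "input_schema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
    {
        "name": "check_system_status",
        "description": "Fetch the current status of a service. Use for questions about outages happening now.",
        "input_schema": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
    },
]


# Placeholder handlers; real ones would call the RAG store, database, and API.
def search_docs(query: str) -> str:
    return f"(doc excerpts for: {query})"


def query_orders(sql: str) -> str:
    return f"(rows for: {sql})"


def check_system_status(service: str) -> str:
    return f"(live status for: {service})"


HANDLERS = {
    "search_docs": search_docs,
    "query_orders": query_orders,
    "check_system_status": check_system_status,
}


def dispatch(tool_name: str, tool_input: dict) -> str:
    """Route a tool call requested by the model to the matching handler."""
    handler = HANDLERS.get(tool_name)
    if handler is None:
        raise KeyError(f"unknown tool: {tool_name}")
    return handler(**tool_input)


print(dispatch("check_system_status", {"service": "checkout"}))
```

Each description says both what the tool does and when to use it, since the description is what the model has to go on when choosing between sources.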

Knowledge in Context vs External Knowledge

Not all knowledge needs to be in an external knowledge base. Where it belongs depends on its size and how often it changes. For information that is:

Small enough to fit in the context window (under ~10,000 tokens): paste it directly into the system prompt — simpler and more reliable than RAG
Rarely changing and short: include in the system prompt as static context
Large, frequently updated, or confidential: build an external knowledge base
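The rule of thumb above can be written as a small decision helper. This is only a sketch: the four-characters-per-token estimate is a rough heuristic for English text rather than a real tokenizer, the 10,000-token default mirrors the figure quoted above, and both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def choose_placement(
    text: str,
    changes_often: bool,
    confidential: bool,
    budget_tokens: int = 10_000,
) -> str:
    """Return where the knowledge should live, following the three rules above."""
    if changes_often or confidential or estimate_tokens(text) > budget_tokens:
        return "external knowledge base"
    return "system prompt"


print(choose_placement("Our office is open 9 to 5 on weekdays.", False, False))
print(choose_placement("x" * 100_000, False, False))
```

For a real decision, count tokens with the tokenizer or token-counting endpoint of the model in use rather than a character-based estimate.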

Checklist: Do You Understand This?

Three knowledge types: documents (RAG), structured data (SQL tool), live data (API tool)
Do not embed structured rows into a vector store — use SQL tools for tabular data lookups
Live knowledge must be fetched fresh at query time — it cannot be pre-indexed
Most production knowledge bases combine multiple types: define each source as a separate tool with a clear description
Small, stable knowledge: paste directly into the system prompt — simpler than RAG