Knowledge Base Use Cases
The same core RAG architecture — ingest, embed, retrieve, augment — applies to many use cases. Each use case has different data characteristics, query patterns, and quality requirements that should shape your chunking, metadata, and retrieval design.
Customer Support Bot
Knowledge base content: Product documentation, FAQs, troubleshooting guides, release notes, support article history
Query patterns: Specific feature questions ("how do I export to CSV?"), error messages ("what does error 404 mean in context X?"), comparison questions ("what's the difference between plan A and plan B?")
Key design decisions:
- Chunk size: 256–512 tokens — support questions are specific; smaller chunks improve precision
- Metadata: Product area, document type (FAQ, guide, release note), version/date
- Filtering: Filter by product area based on user context; exclude outdated documentation using the version/date metadata
- Prompt: "Answer based only on the provided documentation. If the answer is not there, say so and suggest contacting support."
- Citation: Include source document in the answer — helps users find the original documentation and builds trust
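The filtering, prompt, and citation decisions above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `Chunk` fields, the `filter_chunks` and `build_prompt` helpers, and the numbered-source format are all assumptions made for the example. In a real system the filter would usually be pushed down to the vector store's metadata query rather than applied in Python.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    # Hypothetical chunk record; field names are illustrative only.
    text: str
    source: str
    product_area: str
    doc_type: str
    updated: date


SYSTEM_PROMPT = (
    "Answer based only on the provided documentation. "
    "If the answer is not there, say so and suggest contacting support."
)


def filter_chunks(chunks, product_area, not_before):
    """Keep chunks for the user's product area that are not older than the cutoff."""
    return [
        c for c in chunks
        if c.product_area == product_area and c.updated >= not_before
    ]


def build_prompt(question, chunks):
    """Assemble a prompt whose context blocks carry their source for citation."""
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\n"
        "Cite the sources you used by their [number]."
    )
```

Numbering each context block and printing its source next to it gives the model something concrete to cite, and lets the application map a cited number back to a document link.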
Internal Wiki / Company Knowledge
Knowledge base content: HR policies, company procedures, IT runbooks, org charts, onboarding docs, project wikis
Query patterns: Policy lookup ("how many days of annual leave do I get?"), procedure lookup ("how do I submit an expense?"), finding the right person ("who owns X?")
Key design decisions:
- Chunk size: 512–1024 tokens — policy questions often need surrounding context to be correctly interpreted
- Metadata: Department, document type (policy, runbook, guide), effective date, owner
- Filtering: Filter by department for department-specific policies; filter by effective date to surface current policies only
- Freshness: Re-ingest documents whenever they are updated, because a stale policy is worse than no answer; include the effective date in the answer
- Access control: For sensitive documents, tag each chunk with the roles or departments allowed to see it and filter on that metadata at retrieval time, before any text reaches the model
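As a rough sketch of the effective-date and access-control filtering described above, the function below keeps only the version of each policy currently in force and drops chunks the user's roles do not permit. The `PolicyChunk` fields, the `policy_id` grouping key, and the convention that an empty `allowed_roles` means "visible to everyone" are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PolicyChunk:
    # Hypothetical record; field names are illustrative only.
    text: str
    policy_id: str
    department: str
    effective_date: date
    allowed_roles: frozenset = frozenset()  # assumption: empty means unrestricted


def current_visible_chunks(chunks, user_roles, today):
    """Return chunks from the in-force version of each policy that the user may see."""
    # Pass 1: find the latest effective date per policy that is not in the future.
    latest = {}
    for c in chunks:
        if c.effective_date <= today:
            previous = latest.get(c.policy_id, c.effective_date)
            latest[c.policy_id] = max(previous, c.effective_date)
    # Pass 2: keep chunks from that version only, then apply the role check.
    return [
        c for c in chunks
        if latest.get(c.policy_id) == c.effective_date
        and (not c.allowed_roles or c.allowed_roles & user_roles)
    ]
```

Applying the role check before the chunks are placed in the prompt matters: once restricted text is in the context window, the model cannot be relied on to withhold it.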
Codebase Q&A
Knowledge base content: Source code files, inline comments, README files, API documentation, architecture docs, commit messages
Query patterns: Function lookup ("how does the payment processing function work?"), API questions ("what parameters does this endpoint accept?"), architecture questions ("how is authentication handled?")
Key design decisions:
- Chunking: Function-level for code files — one chunk per function/class is usually better than fixed-size for code. Split at function boundaries using AST parsing if possible.
- Metadata: File path, programming language, module name, function name
- Embedding model: General text embeddings work, but a code-specific embedding model tends to improve retrieval when queries and documents are mostly code
- Index both code and docs: Index source files and documentation together so queries can match either
- Re-index on changes: Set up a CI/CD hook to re-ingest changed files after each commit
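For Python source files, function-level chunking can be sketched with the standard library's `ast` module. The example below emits one chunk per top-level function or class and attaches the metadata listed above. The `chunk_python_source` name and the dictionary layout are assumptions for this sketch; other languages would need their own parser.

```python
import ast


def chunk_python_source(source, file_path):
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        # Only top-level definitions become chunks; imports and loose
        # statements are skipped in this simplified sketch.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "text": ast.get_source_segment(source, node),
                "file_path": file_path,
                "language": "python",
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
            })
    return chunks
```

A class is kept as a single chunk here. For large classes it may be worth emitting one chunk per method instead, with the class name carried in the metadata.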
Research Assistant
Knowledge base content: Academic papers, industry reports, market research, competitive intelligence, news articles, book chapters
Query patterns: Conceptual questions ("what are the main approaches to X?"), evidence lookup ("what does the research say about Y?"), synthesis ("compare findings from different sources")
Key design decisions:
- Chunk size: 512–1024 tokens — research questions often need more context than support questions; a finding without its methodology may be misleading
- Metadata: Authors, publication date, source type (paper, report, news), domain/topic tags, publication venue
- Date filtering: Research questions often specify recency; a phrase like "recent studies" can be mapped to a publication-date filter, for example the last 1–3 years depending on how quickly the field moves
- Multi-hop: Complex research questions often require retrieving from multiple documents. Consider query decomposition: break the question into sub-questions and retrieve separately.
- Citation: Always include source and date in retrieval results — research quality is inseparable from source credibility
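The multi-hop and date-filtering ideas above might look like the following. This is a sketch under stated assumptions: `decompose` and `retrieve` are caller-supplied functions (in practice, decomposition would typically be a model call and retrieval a vector search), chunks are plain dictionaries with hypothetical `id` and `published` keys, and the recency cutoff approximates a year as 365 days.

```python
from datetime import date, timedelta


def retrieve_multi_hop(question, decompose, retrieve, top_k=3):
    """Retrieve separately for each sub-question and merge the results.

    decompose(question) -> list of sub-question strings (assumed interface)
    retrieve(sub_question, top_k) -> list of chunk dicts (assumed interface)
    """
    merged = {}
    for sub_question in decompose(question):
        for chunk in retrieve(sub_question, top_k):
            # De-duplicate by chunk id, keeping first-seen order.
            merged.setdefault(chunk["id"], chunk)
    return list(merged.values())


def recent_only(chunks, today, max_age_years=3):
    """Keep chunks published within roughly the last max_age_years years."""
    cutoff = today - timedelta(days=365 * max_age_years)
    return [c for c in chunks if c["published"] >= cutoff]
```

Merging by chunk id keeps a document that answers several sub-questions from appearing in the context more than once.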
Common Design Principles Across Use Cases
- Source attribution in every answer: Regardless of use case, always tell the user where the answer came from — builds trust and enables verification
- Explicit unknowns: Instruct Claude to say "I don't have information on that" rather than guessing — a correct "don't know" is better than a confident hallucination
- Freshness policy: Define a policy for how often documents are re-ingested and how outdated documents are surfaced or excluded
- Evaluate before launch: Build an evaluation dataset of 50+ representative questions and measure retrieval quality (for example, how often a relevant chunk appears in the top-k results) before going live
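One simple way to measure retrieval quality on such an evaluation dataset is the hit rate at k: the fraction of questions for which at least one known-relevant chunk appears in the top k results. The sketch below assumes each evaluation example is a dictionary with hypothetical `question` and `relevant_ids` keys, and that `retrieve(question, k)` returns a list of chunk ids; it is one possible metric, not the only reasonable one.

```python
def hit_rate_at_k(eval_set, retrieve, k=5):
    """Fraction of questions with at least one relevant chunk in the top k.

    eval_set: list of {"question": str, "relevant_ids": list of chunk ids}
    retrieve(question, k) -> list of chunk ids (assumed interface)
    """
    if not eval_set:
        raise ValueError("eval_set must contain at least one question")
    hits = 0
    for example in eval_set:
        retrieved = set(retrieve(example["question"], k))
        if retrieved & set(example["relevant_ids"]):
            hits += 1
    return hits / len(eval_set)
```

Running this for a few values of k, and again after any change to chunk size or metadata filters, gives a quick check on whether the change helped or hurt retrieval.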
Checklist: Do You Understand This?
- Customer support: 256–512 token chunks, product area + date metadata, citation in answers
- Internal wiki: 512–1024 token chunks, department + effective date filtering, freshness critical
- Codebase Q&A: function-level chunks, file path + module metadata, code-specific embedding model
- Research: 512–1024 token chunks, date + source type metadata, multi-hop for complex queries
- Universal: source attribution in every answer, explicit unknowns, freshness policy, evaluation dataset