🧠 All Things AI
Intermediate

Document Ingestion

Loaders, parsers, and format handling: turning PDF, HTML, DOCX, and CSV files into indexable content.

What You Will Learn

  • Document loaders: LangChain, LlamaIndex, custom loaders
  • PDF extraction: text layers, OCR fallback, table extraction
  • HTML cleaning: removing nav, ads, scripts before indexing
  • DOCX and Office format handling
  • CSV and structured data: when to embed rows vs. when to query them with a SQL tool
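
As a taste of the HTML cleaning topic listed above, here is a minimal sketch that uses only Python's standard-library `html.parser`. The names `SKIP_TAGS`, `TextExtractor`, and `clean_html` are illustrative assumptions, not part of LangChain or LlamaIndex, and the set of dropped tags is a guess at what counts as page chrome. Ads usually need site-specific rules (class names, ids), which this sketch does not attempt.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is dropped before indexing.
# This set is an assumption for illustration; tune it per site.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # how many skipped elements we are currently inside
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def clean_html(html: str) -> str:
    """Return the text content of `html` with navigation and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<h1>Ingestion</h1><p>Parsers turn files into text.</p>"
        "<script>track();</script></body></html>"
    )
    print(clean_html(page))
```

The depth counter is a deliberate simplification: it assumes skipped elements are properly closed, so a page with an unclosed `<nav>` would lose everything after it. A production pipeline would more likely rely on a tolerant HTML parser or a dedicated content-extraction library.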

This page is under development. Content is being added progressively. Check back soon for the full article.