Intermediate

Multimodal Input with Claude

Claude can process images, PDFs, and other documents alongside text. This unlocks a wide range of tasks that were previously impractical — reading screenshots, extracting data from PDFs, analysing charts, and reasoning about visual content.

Image Understanding

Claude can analyse, describe, and reason about images. Supported capabilities:

  • Scene description: Describe what is in an image — objects, people, actions, setting. Useful for accessibility, content moderation, and cataloguing.
  • Text in images (OCR): Claude reads text visible in photos — handwriting (reasonably well), typed text in screenshots, signs, and printed documents.
  • Chart and graph interpretation: Claude can read bar charts, line graphs, pie charts, and scatter plots — describing trends, reading values, and drawing conclusions. It works well for clearly labelled charts; complex or unlabelled charts produce less reliable output.
  • Diagram analysis: Architecture diagrams, flowcharts, and technical schematics — Claude can describe what the diagram shows and reason about relationships.
  • UI and screenshot analysis: Describe what a screenshot shows, identify UI elements, explain layouts. Useful for bug reports, documentation, and design review.

Claude is not a pixel-level image analysis tool — it perceives visual patterns and text, not individual pixel values. For precise colour analysis, object detection at scale, or measurement extraction, use dedicated computer vision tools.

PDF and Document Analysis

Uploading a PDF directly is the most reliable way to work with document content:

  • Claude reads the document text and can answer questions about it, summarise sections, and extract specific information
  • Tables in PDFs are generally extracted correctly — complex multi-column layouts sometimes lose formatting
  • For research papers: "What is the main claim? What data supports it? What are the limitations?"
  • For contracts: "Extract all payment terms, deadlines, and termination clauses"
  • For reports: "Summarise the executive summary, then give me the key metrics from the data section"

PDFs with primarily image-based content (scanned documents, image-heavy slides) may not extract text reliably — Claude will work with what it can see visually.

OCR and Text Extraction from Images

For extracting text from images:

  • Screenshots: Claude reliably reads text in interface screenshots. Useful for extracting log output, error messages, or code from a screen capture.
  • Handwriting: Claude can read reasonably clear handwriting but struggles with highly cursive or messy handwriting. Always verify extracted handwritten text.
  • Printed text in photos: Claude reads printed text in photographs reliably if the image is clear, well-lit, and the text is not too small.
  • Mixed image/text: For documents that mix images with text (marketing materials, slides), Claude reads both but processes them differently.

Charts and Diagrams

Prompting strategies for charts and diagrams:

  • "Describe this chart — what is it showing and what is the main trend?"
  • "What is the approximate value for [category] in this bar chart?"
  • "Does this data show a statistically significant trend, based on what you can see?"
  • "Explain this architecture diagram — what are the components and how do they connect?"

Be aware that Claude estimates values visually — it cannot read exact pixel values. For precise data extraction from charts, convert to CSV/tabular format using a tool like Tabula (for PDFs) before asking Claude to analyse it.

File Size Limits and Supported Formats

TypeSupported FormatsNotes
ImagesJPEG, PNG, GIF, WebPMax ~5MB per image; multiple images per message supported
DocumentsPDF, DOCX, TXT, MDPDF text extraction; DOCX and text files read directly
DataCSV, JSONClaude can analyse and summarise structured data files
CodeMost language file typesTreated as plain text — Claude reads and reasons about code

Limits vary by plan. Enterprise and API usage may have different limits. Check Anthropic documentation for current specifications.

Checklist: Do You Understand This?

  • Claude reads images: scene description, text (OCR), charts, diagrams, and screenshots
  • Upload PDFs directly for document Q&A, summarisation, and extraction — more reliable than pasting text
  • Chart reading estimates visually — for precise values, convert data to tabular format first
  • Handwriting recognition works for clear writing; verify any extracted handwritten text
  • Claude is not a pixel-level image analysis tool — use dedicated computer vision for measurements and precise detection

Page built: 01 Jun 2026