Multimodal Input with Claude
Claude can process images, PDFs, and other documents alongside text. This unlocks a wide range of tasks that were previously impractical — reading screenshots, extracting data from PDFs, analysing charts, and reasoning about visual content.
Image Understanding
Claude can analyse, describe, and reason about images. Supported capabilities:
- Scene description: Describe what is in an image — objects, people, actions, setting. Useful for accessibility, content moderation, and cataloguing.
- Text in images (OCR): Claude reads text visible in photos — handwriting (reasonably well), typed text in screenshots, signs, and printed documents.
- Chart and graph interpretation: Claude can read bar charts, line graphs, pie charts, and scatter plots — describing trends, reading values, and drawing conclusions. It works well for clearly labelled charts; complex or unlabelled charts produce less reliable output.
- Diagram analysis: Architecture diagrams, flowcharts, and technical schematics — Claude can describe what the diagram shows and reason about relationships.
- UI and screenshot analysis: Describe what a screenshot shows, identify UI elements, explain layouts. Useful for bug reports, documentation, and design review.
Claude is not a pixel-level image analysis tool — it perceives visual patterns and text, not individual pixel values. For precise colour analysis, object detection at scale, or measurement extraction, use dedicated computer vision tools.
PDF and Document Analysis
Uploading a PDF directly is the most reliable way to work with document content:
- Claude reads the document text and can answer questions about it, summarise sections, and extract specific information
- Tables in PDFs are generally extracted correctly — complex multi-column layouts sometimes lose formatting
- For research papers: "What is the main claim? What data supports it? What are the limitations?"
- For contracts: "Extract all payment terms, deadlines, and termination clauses"
- For reports: "Summarise the executive summary, then give me the key metrics from the data section"
PDFs with primarily image-based content (scanned documents, image-heavy slides) may not extract text reliably — Claude will work with what it can see visually.
OCR and Text Extraction from Images
For extracting text from images:
- Screenshots: Claude reliably reads text in interface screenshots. Useful for extracting log output, error messages, or code from a screen capture.
- Handwriting: Claude can read reasonably clear handwriting but struggles with highly cursive or messy handwriting. Always verify extracted handwritten text.
- Printed text in photos: Claude reads printed text in photographs reliably if the image is clear, well-lit, and the text is not too small.
- Mixed image/text: For documents that mix images with text (marketing materials, slides), Claude reads both but processes them differently.
Charts and Diagrams
Prompting strategies for charts and diagrams:
- "Describe this chart — what is it showing and what is the main trend?"
- "What is the approximate value for [category] in this bar chart?"
- "Does this data show a statistically significant trend, based on what you can see?"
- "Explain this architecture diagram — what are the components and how do they connect?"
Be aware that Claude estimates values visually — it cannot read exact pixel values. For precise data extraction from charts, convert to CSV/tabular format using a tool like Tabula (for PDFs) before asking Claude to analyse it.
File Size Limits and Supported Formats
| Type | Supported Formats | Notes |
|---|---|---|
| Images | JPEG, PNG, GIF, WebP | Max ~5MB per image; multiple images per message supported |
| Documents | PDF, DOCX, TXT, MD | PDF text extraction; DOCX and text files read directly |
| Data | CSV, JSON | Claude can analyse and summarise structured data files |
| Code | Most language file types | Treated as plain text — Claude reads and reasons about code |
Limits vary by plan. Enterprise and API usage may have different limits. Check Anthropic documentation for current specifications.
Checklist: Do You Understand This?
- Claude reads images: scene description, text (OCR), charts, diagrams, and screenshots
- Upload PDFs directly for document Q&A, summarisation, and extraction — more reliable than pasting text
- Chart reading estimates visually — for precise values, convert data to tabular format first
- Handwriting recognition works for clear writing; verify any extracted handwritten text
- Claude is not a pixel-level image analysis tool — use dedicated computer vision for measurements and precise detection