Intermediate

Multimodal Input with Claude

Claude can process images, PDFs, and other documents alongside text. This unlocks a wide range of tasks that were previously impractical — reading screenshots, extracting data from PDFs, analysing charts, and reasoning about visual content.

Image Understanding

Claude can analyse, describe, and reason about images. Supported capabilities:

Scene description: Describe what is in an image — objects, people, actions, setting. Useful for accessibility, content moderation, and cataloguing.
Text in images (OCR): Claude reads text visible in photos — handwriting (reasonably well), typed text in screenshots, signs, and printed documents.
Chart and graph interpretation: Claude can read bar charts, line graphs, pie charts, and scatter plots — describing trends, reading values, and drawing conclusions. It works well for clearly labelled charts; complex or unlabelled charts produce less reliable output.
Diagram analysis: Architecture diagrams, flowcharts, and technical schematics — Claude can describe what the diagram shows and reason about relationships.
UI and screenshot analysis: Describe what a screenshot shows, identify UI elements, explain layouts. Useful for bug reports, documentation, and design review.

Claude is not a pixel-level image analysis tool — it perceives visual patterns and text, not individual pixel values. For precise colour analysis, object detection at scale, or measurement extraction, use dedicated computer vision tools.

PDF and Document Analysis

Uploading a PDF directly is the most reliable way to work with document content:

Claude reads the document text and can answer questions about it, summarise sections, and extract specific information
Tables in PDFs are generally extracted correctly — complex multi-column layouts sometimes lose formatting
For research papers: "What is the main claim? What data supports it? What are the limitations?"
For contracts: "Extract all payment terms, deadlines, and termination clauses"
For reports: "Summarise the executive summary, then give me the key metrics from the data section"

PDFs with primarily image-based content (scanned documents, image-heavy slides) may not extract text reliably — Claude will work with what it can see visually.

OCR and Text Extraction from Images

For extracting text from images:

Screenshots: Claude reliably reads text in interface screenshots. Useful for extracting log output, error messages, or code from a screen capture.
Handwriting: Claude can read reasonably clear handwriting but struggles with highly cursive or messy handwriting. Always verify extracted handwritten text.
Printed text in photos: Claude reads printed text in photographs reliably if the image is clear, well-lit, and the text is not too small.
Mixed image/text: For documents that mix images with text (marketing materials, slides), Claude reads both but processes them differently.

Charts and Diagrams

Prompting strategies for charts and diagrams:

"Describe this chart — what is it showing and what is the main trend?"
"What is the approximate value for [category] in this bar chart?"
"Does this data show a statistically significant trend, based on what you can see?"
"Explain this architecture diagram — what are the components and how do they connect?"

Be aware that Claude estimates values visually — it cannot read exact pixel values. For precise data extraction from charts, convert to CSV/tabular format using a tool like Tabula (for PDFs) before asking Claude to analyse it.

File Size Limits and Supported Formats

Type	Supported Formats	Notes
Images	JPEG, PNG, GIF, WebP	Max ~5MB per image; multiple images per message supported
Documents	PDF, DOCX, TXT, MD	PDF text extraction; DOCX and text files read directly
Data	CSV, JSON	Claude can analyse and summarise structured data files
Code	Most language file types	Treated as plain text — Claude reads and reasons about code

Limits vary by plan. Enterprise and API usage may have different limits. Check Anthropic documentation for current specifications.

Checklist: Do You Understand This?

Claude reads images: scene description, text (OCR), charts, diagrams, and screenshots
Upload PDFs directly for document Q&A, summarisation, and extraction — more reliable than pasting text
Chart reading estimates visually — for precise values, convert data to tabular format first
Handwriting recognition works for clear writing; verify any extracted handwritten text
Claude is not a pixel-level image analysis tool — use dedicated computer vision for measurements and precise detection