Beginner

What is Claude Computer Use?

Computer use is Claude's ability to control a computer as a human would — seeing the screen via screenshots, and acting via mouse clicks, keyboard input, and scrolling. It enables Claude to automate tasks in any application with a graphical interface, not just APIs.

The Computer Use Model

Computer use works on a simple loop: Claude receives a screenshot of the current screen state, decides what action to take, and outputs a structured action command. Your code executes the action, takes a new screenshot, and sends it back to Claude. This repeats until the task is complete.

This is fundamentally different from traditional automation scripts. Scripts have pre-programmed logic for specific UI states. Claude reasons about what it sees — it can adapt when a menu looks different, a window is in the wrong position, or an unexpected dialog appears.

Supported Actions

Claude computer use supports the following action types:

Mouse click — left, right, or double-click at specific (x, y) screen coordinates
Mouse move — hover over a position without clicking
Keyboard type — type a string of text into the focused element
Key press — press specific keys or combinations (Enter, Tab, Ctrl+C, Escape, etc.)
Scroll — scroll up or down at a position
Screenshot — capture the current screen state

Combined with the bash tool (run shell commands) and text_editor tool (edit files directly), Claude can automate workflows that mix GUI interaction with command-line operations.

What Claude Can Navigate

Web browsers: Chromium-based browsers (Chrome, Edge), Firefox — navigating pages, clicking links, filling forms, extracting content
Desktop GUI applications: File managers, office applications, developer tools — any application visible on screen
Terminal applications: Command-line interfaces where typing commands and reading output is the interaction model
Custom internal tools: Legacy software without APIs that can only be operated through the GUI

Current Limitations

Computer use is in beta (as of 2025) and has real limitations that matter for production use:

Speed: Each action requires an API call with a screenshot — even simple tasks take seconds per action. A 10-step form might take 30–60 seconds.
Accuracy: Claude may misidentify UI elements, especially small targets, similar-looking buttons, or non-standard interfaces
Scrolling and dynamic content: Pages that load content on scroll, or UIs that animate, can confuse the action loop
CAPTCHAs: Claude will not attempt to solve CAPTCHAs — pause for human completion
Credential entry: Be explicit about what credentials Claude is allowed to use; it will not independently authenticate to services without instruction
Non-Latin text: OCR quality for non-Latin scripts in screenshots is lower

API Access

Computer use is available via the Anthropic API using the claude-sonnet-4-6 model with the computer use beta header. It is not available in the Claude.ai web interface or Claude Desktop App — it is an API-only feature designed for developers building automation systems.

To use computer use, you need to implement the action execution layer (a script that takes the action commands Claude outputs and runs them on a real or virtual desktop environment). Most implementations use Playwright, Selenium, or pyautogui for the execution layer.

Checklist: Do You Understand This?

Computer use = screenshot in + action out; Claude sees screen and controls mouse/keyboard
Actions: click, type, key press, scroll, screenshot — plus bash and text_editor tools
Claude reasons about what it sees — adapts to unexpected UI states, unlike fixed scripts
Limitations: slow (1 API call per action), accuracy issues on small targets, no CAPTCHA solving
API-only feature — requires implementing your own action execution layer (Playwright, pyautogui)