Beginner

What is Claude Computer Use?

Computer use is Claude's ability to control a computer as a human would: it sees the screen through screenshots and acts through mouse clicks, keyboard input, and scrolling. This lets Claude automate tasks in any application with a graphical interface, not only those that expose an API.

The Computer Use Model

Computer use works on a simple loop: Claude receives a screenshot of the current screen state, decides what action to take, and outputs a structured action command. Your code executes the action, takes a new screenshot, and sends it back to Claude. This repeats until the task is complete.
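
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the Action class, the "done" signal, and the take_screenshot, ask_model, and execute callables are all hypothetical names chosen for this example, and the model is replaced by a scripted stand-in so the code runs without an API call or a real desktop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """A structured action command (hypothetical shape, for illustration only)."""
    kind: str                                   # e.g. "click", "type", "done"
    params: dict = field(default_factory=dict)


def run_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Run the screenshot -> decide -> act loop; return how many actions were executed."""
    for step in range(max_steps):
        screenshot = take_screenshot()   # 1. capture the current screen state
        action = ask_model(screenshot)   # 2. the model decides the next action
        if action.kind == "done":        # the model signals that the task is complete
            return step
        execute(action)                  # 3. your code performs the action
    raise RuntimeError(f"task not finished after {max_steps} steps")


# Stand-ins so the sketch runs without a real desktop or API call.
scripted = iter([
    Action("click", {"x": 100, "y": 200}),
    Action("type", {"text": "hello"}),
    Action("done"),
])
executed = []

steps = run_loop(
    take_screenshot=lambda: b"fake-png-bytes",
    ask_model=lambda _screenshot: next(scripted),
    execute=executed.append,
)
print(steps, [a.kind for a in executed])  # 2 ['click', 'type']
```

The max_steps cap is a design choice worth keeping in a real loop: it stops a confused run from issuing actions indefinitely.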

This is fundamentally different from traditional automation scripts. Scripts have pre-programmed logic for specific UI states. Claude reasons about what it sees — it can adapt when a menu looks different, a window is in the wrong position, or an unexpected dialog appears.

Supported Actions

Claude computer use supports the following action types:

  • Mouse click — left, right, or double-click at specific (x, y) screen coordinates
  • Mouse move — hover over a position without clicking
  • Keyboard type — type a string of text into the focused element
  • Key press — press specific keys or combinations (Enter, Tab, Ctrl+C, Escape, etc.)
  • Scroll — scroll up or down at a position
  • Screenshot — capture the current screen state

Combined with the bash tool (run shell commands) and text_editor tool (edit files directly), Claude can automate workflows that mix GUI interaction with command-line operations.
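
One way to handle these action types in your own code is to parse each incoming command into a typed object before executing it. The sketch below assumes the command arrives as a dictionary with a "type" field; the field names and type strings are hypothetical and chosen to mirror the list above, not taken from the API's actual schema.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: str = "left"    # "left", "right", or "double"


@dataclass(frozen=True)
class MouseMove:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class KeyPress:
    keys: str               # e.g. "Enter" or "ctrl+c"


@dataclass(frozen=True)
class Scroll:
    x: int
    y: int
    direction: str          # "up" or "down"


@dataclass(frozen=True)
class Screenshot:
    pass


Action = Union[Click, MouseMove, TypeText, KeyPress, Scroll, Screenshot]

# Hypothetical type strings; adjust to whatever your integration receives.
_ACTION_TYPES = {
    "click": Click,
    "mouse_move": MouseMove,
    "type": TypeText,
    "key": KeyPress,
    "scroll": Scroll,
    "screenshot": Screenshot,
}


def parse_action(raw: dict) -> Action:
    """Turn a raw command dict into a typed action, rejecting unknown types."""
    fields = dict(raw)                  # copy so the caller's dict is not mutated
    kind = fields.pop("type", None)
    cls = _ACTION_TYPES.get(kind)
    if cls is None:
        raise ValueError(f"unsupported action type: {kind!r}")
    return cls(**fields)                # raises TypeError on missing or extra fields


print(parse_action({"type": "click", "x": 10, "y": 20}))
print(parse_action({"type": "key", "keys": "ctrl+c"}))
```

Rejecting unknown types and unexpected fields early keeps malformed commands from reaching the mouse and keyboard.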

What Claude Can Navigate

  • Web browsers: Chromium-based browsers (Chrome, Edge), Firefox — navigating pages, clicking links, filling forms, extracting content
  • Desktop GUI applications: File managers, office applications, developer tools — any application visible on screen
  • Terminal applications: Command-line interfaces where typing commands and reading output is the interaction model
  • Custom internal tools: Legacy software without APIs that can only be operated through the GUI

Current Limitations

Computer use is in beta (as of 2025) and has real limitations that matter for production use:

  • Speed: Each action requires an API round trip that includes a screenshot, so every step takes several seconds. A 10-step form might take 30–60 seconds.
  • Accuracy: Claude may misidentify UI elements, especially small targets, similar-looking buttons, or non-standard interfaces
  • Scrolling and dynamic content: Pages that load content on scroll, or UIs that animate, can confuse the action loop
  • CAPTCHAs: Claude will not attempt to solve CAPTCHAs, so design your loop to pause and hand off to a human when one appears
  • Credential entry: Be explicit about what credentials Claude is allowed to use; it will not independently authenticate to services without instruction
  • Non-Latin text: Claude reads non-Latin scripts in screenshots less reliably than Latin text
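
The speed estimate in the list above is simple multiplication: step count times per-action latency. The sketch below assumes 3 to 6 seconds per action, a range inferred from the 10-step example rather than a measured figure; estimate_duration is a hypothetical helper name.

```python
from typing import Tuple


def estimate_duration(
    steps: int,
    seconds_per_action: Tuple[float, float] = (3.0, 6.0),  # assumed range, not measured
) -> Tuple[float, float]:
    """Return a (low, high) estimate in seconds for a task with the given step count."""
    low, high = seconds_per_action
    return steps * low, steps * high


print(estimate_duration(10))  # (30.0, 60.0)
```

Measuring your own per-action latency and substituting it here gives a more useful budget for deciding whether a task suits computer use.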

API Access

Computer use is available via the Anthropic API using the claude-sonnet-4-6 model with the computer use beta header. It is not available in the Claude.ai web interface or the Claude Desktop app; it is an API-only feature designed for developers building automation systems.

To use computer use, you need to implement the action execution layer (a script that takes the action commands Claude outputs and runs them on a real or virtual desktop environment). Most implementations use Playwright, Selenium, or pyautogui for the execution layer.
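
A minimal sketch of such an execution layer follows. It routes action dictionaries to a backend object; the action field names and the RecordingBackend class are hypothetical and exist only so the example runs without a display. The backend method names (click, write, press, scroll) were chosen to match pyautogui's module-level functions, so passing the pyautogui module itself as the backend should work on a machine with a display, though that pairing is an assumption to verify in your environment.

```python
from typing import Protocol


class Backend(Protocol):
    """The input operations the execution layer needs from a desktop driver."""
    def click(self, x: int, y: int) -> None: ...
    def write(self, text: str) -> None: ...
    def press(self, key: str) -> None: ...
    def scroll(self, clicks: int) -> None: ...


class RecordingBackend:
    """Test double that records calls instead of touching a real desktop."""
    def __init__(self) -> None:
        self.calls = []

    def click(self, x, y):
        self.calls.append(("click", x, y))

    def write(self, text):
        self.calls.append(("write", text))

    def press(self, key):
        self.calls.append(("press", key))

    def scroll(self, clicks):
        self.calls.append(("scroll", clicks))


def execute(action: dict, backend: Backend) -> None:
    """Dispatch one action dict (hypothetical field names) to the backend."""
    kind = action["type"]
    if kind == "click":
        backend.click(action["x"], action["y"])
    elif kind == "type":
        backend.write(action["text"])
    elif kind == "key":
        backend.press(action["key"])
    elif kind == "scroll":
        # Positive scrolls up, negative scrolls down (the convention pyautogui uses).
        amount = action["amount"]
        backend.scroll(amount if action["direction"] == "up" else -amount)
    else:
        raise ValueError(f"unsupported action type: {kind!r}")


backend = RecordingBackend()
for action in [
    {"type": "click", "x": 40, "y": 80},
    {"type": "type", "text": "hello"},
    {"type": "key", "key": "enter"},
    {"type": "scroll", "direction": "down", "amount": 3},
]:
    execute(action, backend)
print(backend.calls)
```

Keeping the backend behind a small interface lets you test the dispatch logic without a desktop and swap drivers later without changing the loop.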

Checklist: Do You Understand This?

  • Computer use = screenshot in + action out; Claude sees screen and controls mouse/keyboard
  • Actions: click, mouse move, type, key press, scroll, screenshot, plus the bash and text_editor tools
  • Claude reasons about what it sees — adapts to unexpected UI states, unlike fixed scripts
  • Limitations: slow (1 API call per action), accuracy issues on small targets, no CAPTCHA solving
  • API-only feature — requires implementing your own action execution layer (Playwright, pyautogui)

Page built: 01 Jun 2026