Intermediate

Desktop & GUI Automation

Desktop automation with computer use lets Claude operate native applications, Electron apps, and any software with a GUI — not just web browsers. This unlocks automation for legacy tools, proprietary software, and workflows that span multiple applications.

Types of Desktop Applications

Native GUI apps

  • Microsoft Office, Adobe apps
  • Accounting and ERP software
  • Legacy internal tools
  • Database admin tools (pgAdmin, SSMS)

Electron apps

  • VS Code, Slack, Discord
  • GitHub Desktop, Figma
  • Render their UI with Chromium, so browser automation techniques often carry over

Command-line tools

  • Terminal interactions
  • Better handled via bash tool directly
  • GUI terminal useful for interactive TUI apps

Menu Navigation

Claude can navigate application menu bars by clicking through the menu hierarchy:

  • Click the top-level menu name (File, Edit, View) to open the dropdown
  • Screenshot to see the menu items
  • Click the target menu item
  • If the item has a submenu (indicated by a right arrow), hover to expand and screenshot again
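The menu walk above can be sketched as a small planning function. This is an illustrative sketch only: the step names ("click", "hover", "screenshot") and the function menu_navigation_plan are hypothetical and are not the computer use tool's action schema. In practice Claude resolves each label to screen coordinates from the screenshot.

```python
from typing import Optional

Step = tuple[str, Optional[str]]


def menu_navigation_plan(path: list[str]) -> list[Step]:
    """Return ordered steps for a menu path such as ["File", "Export", "PDF"].

    Hypothetical step vocabulary: ("click", label), ("hover", label),
    ("screenshot", None).
    """
    if not path:
        raise ValueError("menu path must contain at least one item")

    steps: list[Step] = []
    for depth, label in enumerate(path):
        is_last = depth == len(path) - 1
        # The top-level menu and the final item are clicked;
        # intermediate submenu entries are hovered to expand them.
        verb = "click" if depth == 0 or is_last else "hover"
        steps.append((verb, label))
        if not is_last:
            # Look at the newly opened menu before choosing the next item.
            steps.append(("screenshot", None))
    return steps
```

For a two-level path such as File → Save, the plan is click, screenshot, click. Each extra submenu level adds one hover and one screenshot.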

Keyboard shortcuts are faster and more reliable than menu navigation because they do not depend on finding and clicking menu items on screen. If you know the shortcut, tell Claude to use it: "Press Ctrl+S to save" works better than "navigate File → Save".
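If your harness sends key presses through xdotool, a shortcut written for humans ("Ctrl+S") needs converting to xdotool-style key names ("ctrl+s"). The helper below is a minimal sketch of that conversion. The alias table and the function name to_xdotool_keys are illustrative, and whether your environment's key action accepts xdotool-style strings is an assumption to verify.

```python
# Illustrative aliases from human-readable names to xdotool-style key names.
KEY_ALIASES = {
    "ctrl": "ctrl",
    "control": "ctrl",
    "shift": "shift",
    "alt": "alt",
    "win": "super",
    "super": "super",
    "enter": "Return",
    "return": "Return",
    "esc": "Escape",
    "escape": "Escape",
    "tab": "Tab",
    "del": "Delete",
    "delete": "Delete",
}


def to_xdotool_keys(shortcut: str) -> str:
    """Convert a shortcut like "Ctrl+Shift+S" to "ctrl+shift+s".

    Limitation of this sketch: a literal "+" key cannot be expressed.
    """
    parts = [part.strip() for part in shortcut.split("+") if part.strip()]
    if not parts:
        raise ValueError("shortcut must name at least one key")

    converted = []
    for part in parts:
        lower = part.lower()
        if lower in KEY_ALIASES:
            converted.append(KEY_ALIASES[lower])
        elif len(part) == 1:
            # Single characters are sent in lower case; Shift is a separate modifier.
            converted.append(lower)
        else:
            # Names such as F5 or Home are passed through unchanged.
            converted.append(part)
    return "+".join(converted)
```

The returned string could then be passed to a command such as xdotool key, or used as the text of a key action if your setup follows the same naming.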

File Dialog Interactions

File open/save dialogs are common in desktop automation. Claude's approach:

  1. Open the dialog via the application menu or keyboard shortcut
  2. Screenshot to see the dialog's current state and available controls
  3. Type the file path directly in the filename/path field — faster than navigating the directory tree
  4. Press Enter or click Open/Save

Give Claude the exact file path in the task prompt: "Save the file to /home/user/documents/report.xlsx using Ctrl+Shift+S."
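The four dialog steps can be written down as an ordered action list. The sketch below assumes an action shape of {"action": ..., "text": ...}, loosely modelled on computer use tool inputs. Treat the field names, the key strings, and the function save_as_actions as assumptions and check them against the tool version you are using.

```python
def save_as_actions(path: str, shortcut: str = "ctrl+shift+s") -> list[dict[str, str]]:
    """Return the action sequence for saving to `path` through a file dialog.

    The dict shape is an assumed, simplified stand-in for real tool inputs.
    """
    if not path:
        raise ValueError("path must not be empty")
    return [
        # 1. Open the dialog with the keyboard shortcut.
        {"action": "key", "text": shortcut},
        # 2. Check the dialog's state and which field has focus.
        {"action": "screenshot"},
        # 3. Type the full path instead of browsing the directory tree.
        {"action": "type", "text": path},
        # 4. Confirm the dialog.
        {"action": "key", "text": "Return"},
    ]
```

Calling save_as_actions("/home/user/documents/report.xlsx") yields the same sequence the prompt above asks Claude to perform. Applications that bind Save As to a different shortcut can pass it as the second argument.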

Copy, Paste, and Clipboard Operations

Clipboard operations are useful for transferring data between applications:

  • Select all text in a field: Ctrl+A
  • Copy selection: Ctrl+C
  • Paste: Ctrl+V
  • For large text transfers, use the bash tool to write the content to a temp file, load that file into the clipboard (for example with xclip), and paste with Ctrl+V, rather than typing character by character

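The type action in computer use types text directly into the focused field; it does not use the clipboard. To paste pre-prepared content, use bash with xclip to load the clipboard, then press Ctrl+V in the target field. xdotool type is an alternative that simulates keystrokes from the shell, but it also bypasses the clipboard.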
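As a rough sketch of the temp-file approach on Linux, the helpers below write content to a temporary file and build the xclip command that loads it into the clipboard. The function names are hypothetical, and the sketch assumes xclip is installed and an X display is available. Pasting still happens through a separate Ctrl+V key action.

```python
import subprocess
import tempfile


def xclip_copy_command(file_path: str) -> list[str]:
    """Build the xclip command that reads a file into the clipboard selection."""
    return ["xclip", "-selection", "clipboard", "-in", file_path]


def stage_for_paste(content: str) -> list[str]:
    """Write `content` to a temp file and return the xclip command for it."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(content)
    return xclip_copy_command(handle.name)


def load_clipboard(content: str) -> None:
    """Load `content` into the clipboard (requires xclip and an X session)."""
    subprocess.run(stage_for_paste(content), check=True)
```

Keeping the command builder separate from the subprocess call makes it easy to log or inspect the command before running it.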

When Desktop Automation Is Better Than APIs

Desktop automation with Claude is the right choice when:

  • The software has no API and cannot be scripted via command line
  • The automation target is a legacy or proprietary tool your organisation is not willing to replace
  • The workflow spans multiple GUI applications in a sequence that would be complex to replicate programmatically
  • The task is one-off or low-frequency and doesn't justify building a proper integration
  • You are testing the GUI behaviour of your own application (QA automation)

Desktop automation is not the right choice when a stable API exists — APIs are faster, cheaper, more reliable, and not affected by UI changes. Use computer use as a last resort for no-API situations, not as a replacement for proper integrations.

Window Management

Desktop automation often requires managing multiple windows:

  • Use bash with wmctrl (Linux) or keyboard shortcuts to bring the right window to the foreground before interacting
  • Maximise target windows at the start of a task so UI positions are predictable
  • If working with multiple apps, take a screenshot after switching windows to confirm the right app is in focus
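On Linux, the focus and maximise steps can be scripted with wmctrl. The sketch below builds the commands as argument lists. The function names are hypothetical, the window title is whatever substring identifies your target window, and the sketch assumes wmctrl is installed and the window manager supports it.

```python
import subprocess


def focus_command(title: str) -> list[str]:
    """wmctrl command that raises and focuses the window matching `title`."""
    return ["wmctrl", "-a", title]


def maximise_command(title: str) -> list[str]:
    """wmctrl command that maximises the matching window in both directions."""
    return ["wmctrl", "-r", title, "-b", "add,maximized_vert,maximized_horz"]


def prepare_window_commands(title: str) -> list[list[str]]:
    """Commands to run, in order, before interacting with a window."""
    if not title:
        raise ValueError("title must not be empty")
    return [focus_command(title), maximise_command(title)]


def prepare_window(title: str) -> None:
    """Run the commands (requires wmctrl and a running X session)."""
    for command in prepare_window_commands(title):
        subprocess.run(command, check=True)
```

After running these commands, a screenshot is still the way to confirm that the intended application actually ended up in the foreground.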

Checklist: Do You Understand This?

  • Claude can control native apps, Electron apps, and CLI tools via the GUI using the same screenshot + action loop
  • Keyboard shortcuts are more reliable than menu navigation — instruct Claude to use them when known
  • File dialogs: type the full path directly rather than navigating the directory tree
  • Clipboard: Ctrl+A/C/V keyboard actions; for large content use bash + xclip
  • Prefer a stable API when one exists; desktop automation is slower and less reliable than programmatic integration

Page built: 01 Jun 2026