Desktop & GUI Automation
Desktop automation with computer use lets Claude operate native applications, Electron apps, and any software with a GUI — not just web browsers. This unlocks automation for legacy tools, proprietary software, and workflows that span multiple applications.
Types of Desktop Applications
Native GUI apps
- Microsoft Office, Adobe apps
- Accounting and ERP software
- Legacy internal tools
- Database admin tools (pgAdmin, SSMS)
Electron apps
- VS Code, Slack, Discord
- GitHub Desktop, Figma
- Rendered by an embedded browser engine (Chromium), so browser automation techniques often apply
Command-line tools
- Terminal interactions
- Better handled via bash tool directly
- GUI terminal useful for interactive TUI apps
Navigating Application Menus
Claude can navigate application menu bars by clicking through the menu hierarchy:
- Click the top-level menu name (File, Edit, View) to open the dropdown
- Screenshot to see the menu items
- Click the target menu item
- If the item has a submenu (indicated by a right arrow), hover to expand and screenshot again
Keyboard shortcuts are faster and more reliable than menu navigation. If you know the shortcut, instruct Claude to use it: "Press Ctrl+S to save" is more reliable than navigating File → Save.
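The menu walk and the shortcut preference described above can be sketched as a small planner. This is a minimal illustration, not Claude's actual implementation: the helper `plan_menu_navigation`, the step names (`click`, `hover`, `screenshot`, `key`), and the tuple shape are all hypothetical, and would need to be mapped onto whatever action format your harness uses.

```python
from typing import List, Optional, Tuple

# A step is (action, target). The action names and tuple shape are
# hypothetical placeholders, not a real API.
Step = Tuple[str, Optional[str]]


def plan_menu_navigation(path: List[str], shortcut: Optional[str] = None) -> List[Step]:
    """Return the steps to reach a menu item, preferring a known shortcut."""
    if not path:
        raise ValueError("path must contain at least one menu label")
    if shortcut:
        # One key press replaces the whole click-through.
        return [("key", shortcut)]

    # Open the top-level menu, then look at what it contains.
    steps: List[Step] = [("click", path[0]), ("screenshot", None)]
    for label in path[1:-1]:
        # Intermediate items open a submenu: hover, then screenshot again.
        steps += [("hover", label), ("screenshot", None)]
    if len(path) > 1:
        # The last label is the target item itself.
        steps.append(("click", path[-1]))
    return steps
```

For example, `plan_menu_navigation(["File", "Export", "PDF"])` yields five steps, while passing `shortcut="ctrl+s"` collapses the plan to a single key press, which is why the shortcut route tends to be more reliable.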
File Dialog Interactions
File open/save dialogs are common in desktop automation. Claude's approach:
- Open the dialog via the application menu or keyboard shortcut
- Screenshot to see the dialog's current state and available controls
- Type the file path directly in the filename/path field — faster than navigating the directory tree
- Press Enter or click Open/Save
Include the file path in the task prompt: "Save the file to /home/user/documents/report.xlsx using Ctrl+Shift+S."
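The dialog steps above might translate into an action sequence like the following. This is a sketch under assumptions: the dictionaries are modelled loosely on the computer use action names (`key`, `type`, `screenshot`), but the exact field names, the `save_as_actions` helper, and the final verification screenshot are illustrative additions rather than a documented format. It also assumes the filename field has focus when the dialog opens.

```python
from typing import Dict, List


def save_as_actions(path: str, shortcut: str = "ctrl+shift+s") -> List[Dict[str, str]]:
    """Build the action sequence for saving to an explicit path via a file dialog."""
    if not path:
        raise ValueError("path must not be empty")
    return [
        {"action": "key", "text": shortcut},  # open the save dialog
        {"action": "screenshot"},             # inspect the dialog's controls
        {"action": "type", "text": path},     # type the full path, skipping the tree
        {"action": "key", "text": "Return"},  # confirm (clicking Save also works)
        {"action": "screenshot"},             # assumed extra step: check for prompts or errors
    ]
```

Calling `save_as_actions("/home/user/documents/report.xlsx")` returns five actions, with the path typed in a single step rather than navigated folder by folder.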
Copy, Paste, and Clipboard Operations
Clipboard operations are useful for transferring data between applications:
- Select all text in a field: Ctrl+A
- Copy selection: Ctrl+C
- Paste: Ctrl+V
- For large text transfers, use the bash tool to write content to a temp file and paste from there, rather than typing character-by-character
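The type action in computer use types text directly into the focused field; it does not use the clipboard. To insert pre-prepared content with more control, use bash with xdotool type (which simulates keystrokes) or xclip (which loads the clipboard so the content can then be pasted with Ctrl+V).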
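One way to stage a large block of text for pasting is sketched below. It assumes a Linux X11 session with xclip installed; the helper name `stage_for_clipboard` is hypothetical. The function only writes the temp file and builds the command, so running the command is left to the caller.

```python
import tempfile
from typing import List


def stage_for_clipboard(content: str) -> List[str]:
    """Write content to a temp file and return an xclip command that loads it."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(content)
    # xclip -i reads the named file into the chosen selection;
    # "clipboard" is the selection that Ctrl+V pastes from.
    return ["xclip", "-selection", "clipboard", "-i", tmp.name]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` or joined into a shell command for the bash tool. A Ctrl+V key press in the focused field then pastes the content in one step.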
When Desktop Automation Is Better Than APIs
Desktop automation with Claude is the right choice when:
- The software has no API and cannot be scripted via command line
- The automation target is a legacy or proprietary tool your organisation is not willing to replace
- The workflow spans multiple GUI applications in a sequence that would be complex to replicate programmatically
- The task is one-off or low-frequency and does not justify building a proper integration
- Testing the GUI behaviour of your own application (QA automation)
Desktop automation is not the right choice when a stable API exists: APIs are faster, cheaper, more reliable, and unaffected by UI changes. Treat computer use as a last resort for no-API situations, not as a replacement for proper integrations.
Window Management
Desktop automation often requires managing multiple windows:
- Use bash with wmctrl (Linux) or keyboard shortcuts to bring the right window to the foreground before interacting
- Maximise target windows at the start of a task so UI positions are predictable
- If working with multiple apps, take a screenshot after switching windows to confirm the right app is in focus
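The focus-then-maximise routine above could be prepared as wmctrl commands along these lines. This is a sketch for Linux with an EWMH-compliant window manager; the helper names are hypothetical, and the window title used in the usage note is only an example. The functions build the commands without executing them, so they can be handed to the bash tool or to `subprocess.run`.

```python
from typing import List


def focus_window_cmd(title: str) -> List[str]:
    """wmctrl -a activates the first window whose title contains `title`."""
    return ["wmctrl", "-a", title]


def maximise_window_cmd(title: str) -> List[str]:
    """wmctrl -r selects the window; -b add,... sets both maximised states."""
    return ["wmctrl", "-r", title, "-b", "add,maximized_vert,maximized_horz"]


def prepare_window(title: str) -> List[List[str]]:
    """Commands to run, in order, before interacting with a window."""
    if not title:
        raise ValueError("title must not be empty")
    return [focus_window_cmd(title), maximise_window_cmd(title)]
```

For instance, `prepare_window("LibreOffice Calc")` returns two commands to run in order. Taking a screenshot afterwards confirms that the expected application is actually in focus.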
Checklist: Do You Understand This?
- Claude can control native apps, Electron apps, and CLI tools via the GUI using the same screenshot + action loop
- Keyboard shortcuts are more reliable than menu navigation — instruct Claude to use them when known
- File dialogs: type the full path directly rather than navigating the directory tree
- Clipboard: Ctrl+A/C/V keyboard actions; for large content use bash + xclip
- Use desktop automation only when no API exists: it is slower and less reliable than programmatic integration