Browser Automation with Claude
Browser automation is one of the most common computer use applications. Claude can navigate to URLs, click through multi-page flows, fill forms, and extract structured data from web pages, including pages that require JavaScript rendering and user interaction to display content.
Setting Up a Browser for Claude
Claude needs a real display to interact with a browser. The standard setup for computer use browser automation:
- Virtual display: Run Xvfb (Linux) or use a virtual display on macOS/Windows to create an offscreen display Claude can control
- Browser: Chromium or Firefox launched with the virtual display as its screen
- Action execution: Use pyautogui (Python) or xdotool (Linux) to send mouse/keyboard events to the display
- Screenshot: Capture the display with Pillow (ImageGrab) or the system screenshot tool
Alternatively, run inside a Docker container with Xvfb and a browser pre-installed — this is the most portable and sandboxed approach for production deployments.
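The setup above can be sketched as a small launcher, assuming a Linux host with Xvfb and Chromium installed. The display number `:99`, the 1280x800 resolution, the binary name `chromium` (some distributions ship it as `chromium-browser`), and all helper names are illustrative assumptions rather than required values.

```python
import os
import subprocess
import time


def xvfb_command(display=":99", width=1280, height=800, depth=24):
    """Build the argv that starts an offscreen X server on `display`."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def browser_command(url, width=1280, height=800, binary="chromium"):
    """Build the argv for a browser window sized to fill the virtual display."""
    return [
        binary,  # assumption: adjust to the browser binary on your system
        "--no-first-run",
        f"--window-size={width},{height}",
        "--window-position=0,0",
        url,
    ]


def display_env(display=":99"):
    """Copy the current environment and point X clients at the virtual display."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def start_session(url, display=":99"):
    """Start Xvfb, then the browser on it. Returns both process handles."""
    xvfb = subprocess.Popen(xvfb_command(display))
    time.sleep(1)  # crude wait for the X server to accept connections
    browser = subprocess.Popen(browser_command(url), env=display_env(display))
    return xvfb, browser
```

Whatever sends input events and captures screenshots needs the same `DISPLAY` value, otherwise it will act on a different screen than the one the browser is drawn on.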
Navigating to URLs and Clicking Links
For navigation tasks, give Claude the starting URL in the task prompt. Claude will typically:
- Take an initial screenshot to see the current browser state
- Click the address bar (or use a bash command to launch the browser with a URL)
- Type the URL and press Enter
- Screenshot again to verify the page loaded
- Identify and click the target elements
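The steps above reach your harness as a stream of actions that it must execute on Claude's behalf. The following is a minimal sketch of such a dispatcher. The action dictionary shape (`action`, `coordinate`, `text`) is an assumption modeled on computer-use style tool calls, so check it against the schema of the tool version you use. The `KEY_ALIASES` table and the function name are hypothetical. The `backend` argument is any object exposing `click`, `moveTo`, `write`, and `hotkey`, which the `pyautogui` module does.

```python
# Illustrative mapping from xdotool-style key names to pyautogui key names.
KEY_ALIASES = {
    "return": "enter",
    "escape": "esc",
    "page_down": "pagedown",
    "page_up": "pageup",
}


def execute_action(action, backend):
    """Apply one action dict to `backend` and return a short description."""
    kind = action["action"]
    if kind == "left_click":
        x, y = action["coordinate"]
        backend.click(x, y)
        return f"clicked ({x}, {y})"
    if kind == "mouse_move":
        x, y = action["coordinate"]
        backend.moveTo(x, y)
        return f"moved to ({x}, {y})"
    if kind == "type":
        backend.write(action["text"])
        return f"typed {len(action['text'])} characters"
    if kind == "key":
        # "ctrl+L" becomes ["ctrl", "l"]; a single key is a one-element combo.
        keys = [KEY_ALIASES.get(k, k) for k in action["text"].lower().split("+")]
        backend.hotkey(*keys)
        return "pressed " + "+".join(keys)
    raise ValueError(f"unsupported action: {kind}")
```

In a real session you would call `execute_action(action, pyautogui)` and handle screenshot requests separately. Taking the backend as a parameter keeps the dispatcher testable on a machine with no display.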
Tips for reliable navigation:
- Tell Claude the expected page title or landmark to confirm successful navigation
- Use the bash tool to open URLs directly: xdg-open https://example.com (faster than clicking the address bar)
- Maximise the browser window before starting so UI elements are in predictable positions
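If your harness opens the URL itself instead of asking Claude to do it, a small wrapper around xdg-open is enough. This is a sketch under the assumption that only http and https targets should be allowed. The function name is hypothetical.

```python
from urllib.parse import urlparse


def open_url_command(url):
    """Return the argv that opens `url` in the default browser via xdg-open."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"refusing to open non-web URL: {url!r}")
    # An argv list (no shell) means characters in the URL are never interpreted
    # by a shell.
    return ["xdg-open", url]
```

Run it with `subprocess.run(open_url_command(url), check=True)`, using an environment whose `DISPLAY` points at the virtual display.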
Filling Forms
Form filling is one of Claude's most reliable computer use capabilities. Claude can:
- Identify text input fields and click to focus them before typing
- Select options from dropdown menus by clicking the dropdown and then the option
- Check and uncheck checkboxes by clicking them
- Select radio buttons
- Upload files by clicking file input buttons and typing the file path
- Submit forms by clicking the submit button or pressing Enter
For forms with many fields, provide Claude with all the data upfront in the task prompt rather than answering field-by-field. Example: "Fill in the registration form with: Name: John Smith, Email: john@example.com, Phone: +1-555-1234..."
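When the field values already live in your own data structures, the upfront prompt can be generated rather than hand-written. A minimal sketch follows. The function name and the exact wording of the prompt are illustrative choices, not a required format.

```python
def build_form_prompt(form_name, fields, submit=True):
    """Render every field value into a single task prompt for Claude."""
    lines = [f"Fill in the {form_name} with the following values:"]
    for label, value in fields.items():
        lines.append(f"- {label}: {value}")
    if submit:
        lines.append("Submit the form when every field is filled.")
    else:
        lines.append("Do not submit the form; stop after the last field.")
    return "\n".join(lines)


prompt = build_form_prompt(
    "registration form",
    {"Name": "John Smith", "Email": "john@example.com", "Phone": "+1-555-1234"},
)
```

Listing one field per line keeps labels and values unambiguous, which matters when a value itself contains commas.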
Scraping: Extracting Structured Data
Claude can extract structured data from web pages by reading what it sees in screenshots. Ask Claude to output the data in a specific format:
# In your task prompt:
"Navigate to the product listing page and extract all product names,
prices, and ratings into a JSON array. Screenshot the page, read
the content, and output the JSON."
# Claude will:
# 1. Take screenshot
# 2. Read product names/prices/ratings from the screenshot
# 3. Output JSON in its final text response
For large pages with paginated content, Claude can scroll and paginate through multiple pages, extracting data from each. Give it an explicit instruction to scroll and continue extracting until it has seen all pages or reached a record limit.
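On the harness side, the final text response still has to be turned into records. The sketch below assumes the response contains one JSON array, possibly surrounded by prose, and that each record carries a `name` field usable for de-duplication across pages. Both function names and the `name` key are assumptions for illustration.

```python
import json


def parse_json_array(text):
    """Return the first JSON array found in `text`, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            return value
        start = text.find("[", start + 1)
    raise ValueError("no JSON array found in response")


def merge_pages(pages, key="name", limit=None):
    """Concatenate per-page records, dropping repeats and honoring a record limit."""
    seen, merged = set(), []
    for records in pages:
        for record in records:
            ident = record.get(key)
            if ident in seen:
                continue  # the same item was visible on two screenshots
            seen.add(ident)
            merged.append(record)
            if limit is not None and len(merged) >= limit:
                return merged
    return merged
```

De-duplication matters because consecutive screenshots taken while scrolling usually overlap, so the same row can be reported twice.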
Dynamic Pages and JavaScript Content
Unlike traditional scrapers, Claude works with the rendered page — including JavaScript-rendered content. This means:
- Single-page applications (React, Vue, Angular) work because Claude sees the page as it is finally rendered on screen, not the underlying HTML source
- Content behind scroll events, infinite scroll, or lazy loading is accessible by scrolling first
- Modals, tooltips, and expandable sections can be triggered by clicking
Wait for page load: after navigation actions, take a screenshot immediately and check if the page is still loading. If it is, use the bash tool to sleep briefly (sleep 2) before taking the next screenshot.
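A harness can also do part of this waiting before it hands a screenshot to Claude. The sketch below swaps in a different technique from the one described above: it polls until two consecutive captures are identical and gives up after a fixed number of checks. The function name, the 2-second interval, and the limit of 5 checks are assumptions.

```python
import time


def wait_until_stable(capture, interval=2.0, max_checks=5, sleep=time.sleep):
    """Poll `capture()` until two consecutive frames are equal.

    Returns (last_frame, settled). `settled` is False if the page was still
    changing after `max_checks` polls, for example because of an animation.
    """
    previous = capture()
    for _ in range(max_checks):
        sleep(interval)
        current = capture()
        if current == previous:
            return current, True
        previous = current
    return previous, False
```

`capture` can be any zero-argument callable that returns comparable data, for instance `lambda: ImageGrab.grab().tobytes()` with Pillow. Pages with spinners or video never settle, so treat `settled == False` as "send the screenshot anyway" rather than as an error.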
CAPTCHAs and Authentication
- CAPTCHAs: Claude will not solve CAPTCHAs. Design your workflow to pause at CAPTCHAs for human completion, or pre-authenticate and pass cookies/tokens directly so Claude starts in an authenticated session.
- Login flows: Pass credentials explicitly in the task prompt when authentication is required. Use dedicated test accounts — do not have Claude interact with production accounts containing sensitive data.
- Session cookies: Start the browser with pre-loaded cookies for authenticated sessions to skip login entirely.
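One way to start with pre-loaded cookies, assuming Chromium, is to reuse a profile directory in which a human has already logged in and completed any CAPTCHA. The sketch below only builds the launch command. The function name, the binary name `chromium`, and the profile location are assumptions.

```python
from pathlib import Path


def authenticated_browser_command(url, profile_dir, binary="chromium"):
    """Build the argv for launching the browser on an existing logged-in profile."""
    profile = Path(profile_dir).expanduser()
    if not profile.is_dir():
        # Fail early: a missing profile would silently start a logged-out session.
        raise FileNotFoundError(f"profile directory not found: {profile}")
    return [binary, f"--user-data-dir={profile}", "--no-first-run", url]
```

Keep such a profile tied to a dedicated test account, since anything driving the browser inherits that session.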
Checklist: Do You Understand This?
- Setup: virtual display (Xvfb) + browser + pyautogui/xdotool for actions + screenshot capture
- Navigation: use bash to open URLs directly; maximise browser window for predictable UI positions
- Forms: provide all field values upfront; Claude can fill text fields, dropdowns, checkboxes, and file uploads
- Scraping: Claude reads rendered page content from screenshots — works with JavaScript-rendered apps
- CAPTCHAs: pause for human completion, or pass pre-authenticated cookies to skip login