Intermediate

Browser Automation with Claude

Browser automation is one of the most common computer use applications. Claude can navigate to URLs, click through multi-page flows, fill forms, and extract structured data from web pages, including pages that need JavaScript rendering and user interaction before they display content.

Setting Up a Browser for Claude

Claude needs a real display to interact with a browser. The standard setup for computer use browser automation:

  • Virtual display: Run Xvfb (Linux) or use a virtual display on macOS/Windows to create an offscreen display Claude can control
  • Browser: Chromium or Firefox launched with the virtual display as its screen
  • Action execution: Use pyautogui (Python) or xdotool (Linux) to send mouse/keyboard events to the display
  • Screenshot: Capture the display with Pillow's ImageGrab module or the system screenshot tool

Alternatively, run inside a Docker container with Xvfb and a browser pre-installed — this is the most portable and sandboxed approach for production deployments.
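The setup above can be sketched in Python as two command builders plus a launcher. This is a minimal sketch under stated assumptions, not a reference implementation: the display number :99, the 1280x800 screen size, and the binary name chromium are illustrative choices (some distributions install the browser as chromium-browser), and the one-second pause is a crude stand-in for properly waiting until Xvfb is ready.

```python
import os
import subprocess
import time

DISPLAY = ":99"            # assumed free X display number
WIDTH, HEIGHT = 1280, 800  # assumed virtual screen size


def xvfb_command(display=DISPLAY, width=WIDTH, height=HEIGHT, depth=24):
    """Build the Xvfb command line for a single offscreen screen."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def browser_command(url="about:blank", width=WIDTH, height=HEIGHT,
                    binary="chromium"):
    """Build a Chromium command line sized to fill the virtual screen."""
    return [
        binary,
        "--no-first-run",
        "--window-position=0,0",
        f"--window-size={width},{height}",
        url,
    ]


def start_session(url="about:blank"):
    """Start Xvfb, then launch the browser with DISPLAY pointing at it."""
    xvfb = subprocess.Popen(xvfb_command())
    time.sleep(1)  # crude wait for the X server to come up
    env = dict(os.environ, DISPLAY=DISPLAY)
    browser = subprocess.Popen(browser_command(url), env=env)
    return xvfb, browser
```

Keeping the command lines in small pure functions makes them easy to inspect or reuse in a Dockerfile entrypoint; only start_session actually spawns processes.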

Navigating to URLs

For navigation tasks, give Claude the starting URL in the task prompt. Claude will typically:

  1. Take an initial screenshot to see the current browser state
  2. Click the address bar (or use a bash command to launch the browser with a URL)
  3. Type the URL and press Enter
  4. Screenshot again to verify the page loaded
  5. Identify and click the target elements
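Each step in the sequence above reaches your code as an action request that you must execute on the display. The sketch below translates three kinds of request into xdotool command lines. The dictionary shape (an "action" key plus "coordinate" or "text") is an assumption about how your agent loop represents actions, and only three action kinds are handled here.

```python
import os
import subprocess


def action_to_xdotool(action):
    """Translate one action dict into an xdotool argument list."""
    kind = action["action"]
    if kind == "left_click":
        x, y = action["coordinate"]
        return ["xdotool", "mousemove", str(x), str(y), "click", "1"]
    if kind == "type":
        # "--" ends option parsing so text starting with "-" is typed as-is
        return ["xdotool", "type", "--delay", "50", "--", action["text"]]
    if kind == "key":
        # xdotool key names, e.g. "Return" or "ctrl+l"
        return ["xdotool", "key", "--", action["text"]]
    raise ValueError(f"unsupported action: {kind!r}")


def run_action(action, display=":99"):
    """Execute one action against the given X display (assumed :99)."""
    env = dict(os.environ, DISPLAY=display)
    subprocess.run(action_to_xdotool(action), env=env, check=True)
```

For example, steps 2 and 3 could arrive as a left_click on the address bar, a type action carrying the URL, and a key action with "Return".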

Tips for reliable navigation:

  • Tell Claude the expected page title or landmark to confirm successful navigation
  • Use the bash tool to open URLs directly: xdg-open https://example.com — faster than clicking the address bar
  • Maximise the browser window before starting so UI elements are in predictable positions
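The first two tips can also be applied on the harness side. The following sketch builds the xdg-open command after a basic URL check and compares the active window's title against an expected landmark. The helper names and the :99 display are hypothetical, and a substring match on the window title is only a rough signal that the right page loaded.

```python
import os
import subprocess
from urllib.parse import urlparse


def open_url_command(url):
    """Return the xdg-open command for an http(s) URL, rejecting others."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    return ["xdg-open", url]


def title_matches(window_title, expected):
    """Case-insensitive check that the expected landmark is in the title."""
    return expected.casefold() in window_title.casefold()


def active_window_title(display=":99"):
    """Read the focused window's title with xdotool (display is assumed)."""
    env = dict(os.environ, DISPLAY=display)
    result = subprocess.run(
        ["xdotool", "getactivewindow", "getwindowname"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

A harness might call title_matches(active_window_title(), "Example Domain") after opening the URL and ask Claude to retry only when the check fails.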

Filling Forms

Form filling is one of Claude's most reliable computer use capabilities. Claude can:

  • Identify text input fields and click to focus them before typing
  • Select options from dropdown menus by clicking the dropdown and then the option
  • Check and uncheck checkboxes by clicking them
  • Select radio buttons
  • Upload files by clicking the file input button and typing the file path into the file dialog that opens
  • Submit forms by clicking the submit button or pressing Enter

For forms with many fields, provide Claude with all the data upfront in the task prompt rather than answering field-by-field. Example: "Fill in the registration form with: Name: John Smith, Email: john@example.com, Phone: +1-555-1234..."
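If the form data already lives in your program, the upfront prompt can be generated rather than typed by hand. A minimal sketch, assuming the fields are held in a dictionary keyed by the visible form labels (build_form_prompt is a hypothetical helper name):

```python
def build_form_prompt(form_name, fields):
    """Render every field value into one upfront instruction for Claude."""
    pairs = ", ".join(f"{label}: {value}" for label, value in fields.items())
    return f"Fill in the {form_name} with: {pairs}"


prompt = build_form_prompt(
    "registration form",
    {"Name": "John Smith", "Email": "john@example.com", "Phone": "+1-555-1234"},
)
```

Using the labels exactly as they appear on the page should make it easier for Claude to match each value to the right field.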

Scraping: Extracting Structured Data

Claude can extract structured data from web pages by reading what it sees in screenshots. Ask Claude to output the data in a specific format:

# In your task prompt:
"Navigate to the product listing page and extract all product names,
prices, and ratings into a JSON array. Screenshot the page, read
the content, and output the JSON."

# Claude will:
# 1. Take screenshot
# 2. Read product names/prices/ratings from the screenshot
# 3. Output JSON in its final text response
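
Because the JSON arrives inside Claude's final text response, your code usually has to pull it out of any surrounding prose. The sketch below scans for the first span that parses as a JSON array. It assumes the response contains exactly one array of interest, and extract_json_array is a hypothetical helper name.

```python
import json


def extract_json_array(text):
    """Return the first JSON array embedded in free-form response text."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None  # this "[" did not begin valid JSON; keep scanning
        if isinstance(value, list):
            return value
        start = text.find("[", start + 1)
    raise ValueError("no JSON array found in response text")
```

Raising when nothing parses lets the caller re-prompt Claude instead of silently continuing with empty data.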

For large pages with paginated content, Claude can scroll and paginate through multiple pages, extracting data from each. Give it an explicit instruction to scroll and continue extracting until it has seen all pages or reached a record limit.
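
When Claude extracts records page by page or scroll by scroll, consecutive screenshots can overlap, so the same item may be reported twice. One way to combine the per-page results on your side is sketched below. It assumes each record is a dictionary with a "name" field that is unique enough to deduplicate on, and that you enforce the record limit yourself.

```python
def merge_pages(pages, limit, key="name"):
    """Merge per-page record lists, skipping repeats, up to a record limit."""
    merged = []
    seen = set()
    for page in pages:
        for record in page:
            ident = record.get(key)
            if ident in seen:
                continue  # already captured from an overlapping screenshot
            seen.add(ident)
            merged.append(record)
            if len(merged) >= limit:
                return merged
    return merged
```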

Dynamic Pages and JavaScript Content

Unlike scrapers that parse raw HTML, Claude works with the rendered page, including JavaScript-rendered content. This means:

  • Single-page applications (React, Vue, Angular) work, because Claude sees the final rendered page in the screenshot rather than the HTML source
  • Content behind scroll events, infinite scroll, or lazy loading is accessible by scrolling first
  • Modals, tooltips, and expandable sections can be triggered by clicking

Wait for page load: after navigation actions, take a screenshot immediately and check if the page is still loading. If it is, use the bash tool to sleep briefly (sleep 2) before taking the next screenshot.
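
The same wait can be done in the harness before a screenshot is ever sent to Claude. The sketch below uses a different heuristic from the visual check described above: it treats the page as loaded once two consecutive screenshots are identical. The capture argument is a hypothetical callable that returns the screenshot as bytes (for example, a wrapper around your screenshot tool), and pages with animations or video may never look stable under this test.

```python
import time


def wait_until_stable(capture, interval=2.0, timeout=30.0, sleep=time.sleep):
    """Return the first screenshot that is identical to the one before it."""
    previous = capture()
    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        current = capture()
        if current == previous:
            return current  # nothing changed between two captures
        previous = current
    raise TimeoutError(f"page still changing after {timeout} seconds")
```

The sleep parameter exists so that tests can substitute a no-op; in normal use the default time.sleep applies.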

CAPTCHAs and Authentication

  • CAPTCHAs: Claude will not solve CAPTCHAs. Design your workflow to pause at CAPTCHAs for human completion, or pre-authenticate and pass cookies/tokens directly so Claude starts in an authenticated session.
  • Login flows: Pass credentials explicitly in the task prompt when authentication is required. Use dedicated test accounts — do not have Claude interact with production accounts containing sensitive data.
  • Session cookies: Start the browser with pre-loaded cookies for authenticated sessions to skip login entirely.
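
One way to start with pre-loaded cookies, assuming Chromium, is to log in once by hand using a dedicated profile directory and then launch every automated session against that same directory with the --user-data-dir flag. This is a sketch of that approach rather than the only option; the binary name and the example profile path are assumptions, and a profile copied to a different machine may not carry its cookies over.

```python
from pathlib import Path


def authenticated_browser_command(profile_dir, url, binary="chromium"):
    """Build a Chromium command that reuses an already logged-in profile."""
    if not Path(profile_dir).is_dir():
        raise FileNotFoundError(f"profile directory not found: {profile_dir}")
    return [binary, f"--user-data-dir={profile_dir}", "--no-first-run", url]
```

For example, authenticated_browser_command("/srv/profiles/test-account", "https://example.com/dashboard") would open the dashboard without going through the login form, provided the saved session is still valid.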

Checklist: Do You Understand This?

  • Setup: virtual display (Xvfb) + browser + pyautogui/xdotool for actions + screenshot capture
  • Navigation: use bash to open URLs directly; maximise browser window for predictable UI positions
  • Forms: provide all field values upfront; Claude can fill text fields, dropdowns, checkboxes, and file uploads
  • Scraping: Claude reads rendered page content from screenshots — works with JavaScript-rendered apps
  • CAPTCHAs: pause for human completion, or pass pre-authenticated cookies to skip login

Page built: 01 Jun 2026