
Browser automation

Set up headless browser automation with agent-browser and GUI debugging via VNC.

Revka can drive a real web browser so your agent can open pages, read rendered content, click elements, fill forms, and take screenshots. This page covers enabling the browser tool, choosing a backend, locking navigation to an allowlist, debugging with a visible browser over VNC or Chrome Remote Desktop, and delegating tougher web apps to an external CLI with browser_delegate.

Use browser automation when a page renders content with JavaScript, requires login, or needs interaction. For simple static pages, the lighter web tools (web_fetch, text_browser) are faster and cheaper.

Revka exposes several web-related tools. The two that matter for automation are:

  • browser_open — opens an approved HTTPS URL in the system browser. No scraping, no interaction.
  • browser — full automation with pluggable backends. Supports DOM actions (open, snapshot, click, fill, type, get_text, get_title, get_url, screenshot, wait, press, hover, scroll, is_visible, close, find) plus optional OS-level actions (mouse_move, mouse_click, mouse_drag, key_type, key_press, screen_capture) when the computer_use backend is active.

Both are gated by the [browser] config section and share the allowed_domains allowlist.

Configure browser access in ~/.revka/config.toml:

[browser]
enabled = true
allowed_domains = ["*"]
backend = "agent_browser"
native_headless = true

| Key | Type | Default | Meaning |
|---|---|---|---|
| enabled | bool | false | Master switch for browser and browser_open |
| allowed_domains | array | [] | Navigation allowlist; "*" allows all public domains |
| backend | string | "agent_browser" | agent_browser, rust_native, computer_use, or auto |
| session_name | string | unset | Named browser session for agent-browser (persists auth state) |
| native_headless | bool | true | Headless mode for the rust_native backend |
| native_webdriver_url | string | http://127.0.0.1:9515 | WebDriver endpoint for the rust_native backend |
| native_chrome_path | string | unset | Optional Chrome/Chromium executable path for the rust_native backend |

allowed_domains is the security boundary for all browser navigation. It is deny-by-default:

  • An empty list ([]) blocks every URL.
  • "*" allows all public domains, but local and private hosts are still blocked.
  • Specific entries use exact or subdomain matching.
[browser]
enabled = true
# Lock the agent to two domains only:
allowed_domains = ["example.com", "docs.example.com"]
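
The rules above can be sketched in shell to make the matching behavior concrete. This is an illustration, not Revka's implementation. domain_allowed and is_private_host are hypothetical names. The private-address check is a rough approximation: it covers literal names and IPv4 ranges only and does no DNS resolution. The sketch also assumes that an entry covers its own subdomains and that private hosts stay blocked even when listed explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of allowed_domains matching; not Revka's actual code.

# is_private_host HOST: succeed for loopback, link-local, and RFC 1918 hosts.
# Rough approximation: matches literal names/addresses, no DNS lookup.
is_private_host() {
  case "$1" in
    localhost | *.localhost | ::1 | '[::1]') return 0 ;;
    127.* | 10.* | 192.168.* | 169.254.* | 0.0.0.0) return 0 ;;
    172.1[6-9].* | 172.2[0-9].* | 172.3[01].*) return 0 ;;
  esac
  return 1
}

# domain_allowed HOST [ENTRY...]: succeed if HOST may be opened.
domain_allowed() {
  local host entry
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  shift
  # Assumption: local/private hosts are blocked no matter what the list says.
  if is_private_host "$host"; then
    return 1
  fi
  for entry in "$@"; do
    entry=$(printf '%s' "$entry" | tr '[:upper:]' '[:lower:]')
    # "*" admits every public host.
    if [ "$entry" = "*" ]; then
      return 0
    fi
    # Exact match, or HOST is a subdomain of ENTRY (note the dot boundary).
    case "$host" in
      "$entry" | *."$entry") return 0 ;;
    esac
  done
  # Empty list or no matching entry: deny by default.
  return 1
}
```

For example, domain_allowed docs.example.com example.com succeeds. domain_allowed badexample.com example.com fails, because a subdomain match requires a dot boundary.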

To disable browser access entirely:

[browser]
enabled = false

Set backend to match your environment.

The default backend, agent_browser, drives Chrome for Testing through the agent-browser CLI. It runs headless with sandboxing and is the easiest option on a server or in a container.

Install it once:

# Install the CLI
npm install -g agent-browser
# Download Chrome for Testing
agent-browser install --with-deps # Linux (includes system deps)
agent-browser install # macOS / Windows
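
If you provision machines with a script, you can wrap the two install steps above in one function. This is a sketch. setup_agent_browser is a hypothetical name. The AGENT_BROWSER override variable is also our own addition, not an agent-browser setting; it exists only so the function can be dry-run. The function assumes npm is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the agent-browser install steps.
# AGENT_BROWSER is our own override for the CLI name (useful for dry runs);
# it is not an agent-browser setting.
setup_agent_browser() {
  local os="${1:-$(uname -s)}"
  local ab="${AGENT_BROWSER:-agent-browser}"

  # Install the CLI only if it is not already on PATH.
  if ! command -v "$ab" >/dev/null 2>&1; then
    npm install -g agent-browser || return 1
  fi

  # Download Chrome for Testing; on Linux, also pull in system dependencies.
  case "$os" in
    Linux) "$ab" install --with-deps ;;
    *) "$ab" install ;;  # macOS / Windows
  esac
}
```

Called with no argument, setup_agent_browser detects the OS with uname -s.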

Verify your config has backend = "agent_browser" and enabled = true, then test through the agent:

echo "Open https://example.com and tell me what it says" | revka agent

You can exercise the CLI directly to confirm the install:

agent-browser open https://example.com
agent-browser get title
agent-browser snapshot -i
agent-browser screenshot /tmp/test.png
agent-browser close

The rust_native backend is a WebDriver-based backend built into Revka. Use it when you want to talk to an existing WebDriver/ChromeDriver instance instead of the agent-browser CLI.

[browser]
enabled = true
backend = "rust_native"
native_headless = true
native_webdriver_url = "http://127.0.0.1:9515"
native_chrome_path = "/usr/bin/chromium" # optional
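
The rust_native backend expects a WebDriver server to already be listening at native_webdriver_url. The sketch below polls the standard W3C WebDriver GET /status endpoint, whose response carries a value.ready flag, until the server reports ready. The function names are hypothetical, and the grep-based JSON check is deliberately crude (jq is more robust if you have it).

```shell
#!/usr/bin/env bash
# Sketch: wait until a WebDriver server reports ready via GET /status.
# Function names are hypothetical; the JSON check is a simple text match.

# webdriver_ready [URL]: succeed if URL/status reports "ready": true.
webdriver_ready() {
  local url="${1:-http://127.0.0.1:9515}"
  curl -fsS --max-time 2 "$url/status" 2>/dev/null |
    grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# wait_for_webdriver [URL] [TRIES]: poll once per second, up to TRIES times.
wait_for_webdriver() {
  local url="${1:-http://127.0.0.1:9515}" tries="${2:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    if webdriver_ready "$url"; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "WebDriver at $url did not become ready" >&2
  return 1
}
```

For example, start ChromeDriver on its default port with chromedriver --port=9515 &, then run wait_for_webdriver before launching the agent.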

The computer_use backend delegates browser actions over HTTP to a separate computer-use sidecar, which performs OS-level mouse, keyboard, and screenshot actions. Choose it when you need true OS-level control rather than DOM scripting: it is the backend that enables the mouse_move, mouse_click, mouse_drag, key_type, key_press, and screen_capture actions.

[browser]
enabled = true
backend = "computer_use"
[browser.computer_use]
endpoint = "http://127.0.0.1:8787/v1/actions"
timeout_ms = 15000
allow_remote_endpoint = false
window_allowlist = []
# api_key = "..." # optional bearer token, stored encrypted
# max_coordinate_x = 1920 # optional coordinate boundary
# max_coordinate_y = 1080

| Key | Default | Meaning |
|---|---|---|
| endpoint | http://127.0.0.1:8787/v1/actions | Sidecar endpoint for OS-level actions |
| api_key | unset | Optional bearer token (stored encrypted) |
| timeout_ms | 15000 | Per-action request timeout, in milliseconds |
| allow_remote_endpoint | false | Allow a non-loopback endpoint |
| window_allowlist | [] | Window title/process allowlist forwarded to the sidecar's policy |
| max_coordinate_x / max_coordinate_y | unset | Optional coordinate boundaries |
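
With allow_remote_endpoint = false, the endpoint has to be a loopback address. The shell sketch below shows one way to express that check. It is an illustration, not Revka's validation logic. The helper names are hypothetical, and the URL parsing is deliberately simple: it handles no percent-encoding and no IPv6 zone IDs.

```shell
#!/usr/bin/env bash
# Illustrative loopback check for a computer_use endpoint; not Revka's code.

# endpoint_host URL: print the host part of URL (simplified parsing).
endpoint_host() {
  local rest="${1#*://}"   # drop "http://" or "https://"
  rest="${rest%%/*}"       # drop the path
  rest="${rest##*@}"       # drop any userinfo
  case "$rest" in
    \[*) rest="${rest%%]*}"; rest="${rest#\[}" ;;  # IPv6 literal, e.g. [::1]:8787
    *) rest="${rest%%:*}" ;;                       # drop the port
  esac
  printf '%s\n' "$rest"
}

# endpoint_permitted URL [ALLOW_REMOTE]: loopback hosts always pass;
# any other host passes only when ALLOW_REMOTE is "true".
endpoint_permitted() {
  local host
  host=$(endpoint_host "$1")
  case "$host" in
    localhost | 127.* | ::1) return 0 ;;
  esac
  [ "${2:-false}" = "true" ]
}
```

With the default endpoint, endpoint_permitted http://127.0.0.1:8787/v1/actions succeeds. A LAN address such as http://10.0.0.5:8787/v1/actions passes only when the second argument is true.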

backend = "auto" lets Revka select an available backend automatically.

After an open, call snapshot to map interactive elements to stable refs like @e1, @e2, @e3. Pass those refs to click, fill, hover, and similar actions. A typical agent flow:

{"action": "open", "url": "https://example.com"}
{"action": "snapshot"}
{"action": "click", "selector": "@e3"}

Selectors also accept CSS (button.submit) and text matchers (text=Accept). Re-run snapshot after the page changes. Refs are recomputed on every snapshot, so a ref from an earlier snapshot may no longer point at the same element.
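
You can replay the same open, snapshot, act cycle with the agent-browser CLI when you want to reproduce an agent's steps by hand. This is a sketch that assumes the CLI accepts a snapshot ref or CSS selector as the target of click, as the browser tool does. click_ref is a hypothetical helper name, and AGENT_BROWSER is our own dry-run override, not an agent-browser setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replay an open -> snapshot -> click cycle with agent-browser.
# AGENT_BROWSER is our own override (for dry runs), not an agent-browser setting.
click_ref() {
  local ab="${AGENT_BROWSER:-agent-browser}"
  local url="$1" target="$2"   # target: a ref such as @e3, or a CSS selector

  "$ab" open "$url" || return 1
  "$ab" snapshot -i || return 1   # lists interactive elements with their refs
  "$ab" click "$target" || return 1
  # The page may have changed, so take a fresh snapshot before reusing refs.
  "$ab" snapshot -i
}
```

Because refs are only known after a snapshot, this helper suits replaying a sequence you have already inspected, for example click_ref https://example.com @e3.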

For visual debugging, run a virtual display and view the browser through a VNC client or noVNC in your web browser.

Install the dependencies (Ubuntu/Debian):

apt-get install -y xvfb x11vnc fluxbox novnc websockify
# Optional desktop environment (needed for Chrome Remote Desktop below)
apt-get install -y xfce4 xfce4-goodies

Start the virtual display, window manager, VNC server, and noVNC bridge:

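#!/bin/bash
# Display number, ports, and screen geometry (width x height x depth)
DISPLAY_NUM=99
VNC_PORT=5900
NOVNC_PORT=6080
RESOLUTION=1920x1080x24
# Virtual X display; -ac disables X access control
Xvfb :$DISPLAY_NUM -screen 0 $RESOLUTION -ac &
sleep 1
# Lightweight window manager for the virtual display
fluxbox -display :$DISPLAY_NUM &
sleep 1
# VNC server: -forever keeps serving after a client disconnects, -shared allows
# several viewers, -bg backgrounds it. -nopw means no password, so consider
# adding -localhost to refuse connections from other machines.
x11vnc -display :$DISPLAY_NUM -rfbport $VNC_PORT -forever -shared -nopw -bg
sleep 1
# noVNC web client, bridged to the VNC server over WebSocket
websockify --web=/usr/share/novnc $NOVNC_PORT localhost:$VNC_PORT &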

Then connect:

  • VNC client: localhost:5900
  • Web browser: http://localhost:6080/vnc.html
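
The fixed sleep 1 pauses in the startup script can race on a slow host. One alternative is to poll until a TCP port accepts connections before relying on it. This is our own suggestion, not part of the documented setup. The sketch below relies on bash's built-in /dev/tcp redirection, which most bash builds include, and wait_for_port is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: poll until HOST:PORT accepts TCP connections (uses bash's /dev/tcp).

# wait_for_port HOST PORT [TRIES]: succeed once a connection is accepted.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-20}" i=0
  while [ "$i" -lt "$tries" ]; do
    # Redirecting to /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
    if (: >"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "nothing listening on $host:$port" >&2
  return 1
}
```

For example, wait_for_port localhost 5900 confirms that x11vnc is up, and wait_for_port localhost 6080 confirms the noVNC bridge, before you connect.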

Launch a browser on the virtual display to watch automation live:

DISPLAY=:99 google-chrome --no-sandbox https://example.com &

For a managed remote GUI through a Google account, install Chrome Remote Desktop on the server:

wget https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb
apt-get install -y ./chrome-remote-desktop_current_amd64.deb
# Configure the session
echo "xfce4-session" > ~/.chrome-remote-desktop-session
chmod +x ~/.chrome-remote-desktop-session

Then:

  1. Visit https://remotedesktop.google.com/headless.
  2. Copy the “Debian Linux” setup command and run it on your server.
  3. Start the service: systemctl --user start chrome-remote-desktop.
  4. Connect from any device at https://remotedesktop.google.com/access.

browser_delegate hands a natural-language browser task to an external browser-capable CLI subprocess (for example, Claude Code with the claude-in-chrome MCP). It is useful for corporate web apps such as Teams, Outlook, Jira, and Confluence, which have no direct API and are awkward to drive with the headless backends.

[browser_delegate]
enabled = true
cli_binary = "claude"
chrome_profile_dir = "~/.config/chrome-corp-profile"
allowed_domains = ["teams.microsoft.com", "outlook.office.com"]
blocked_domains = []
task_timeout_secs = 120

The tool takes a single task argument:

{"task": "Check my unread Teams messages"}

Common problems and fixes:

  • “Element not found”: the page may not be fully loaded. Add a wait before snapshotting: agent-browser wait --load networkidle, then agent-browser snapshot -i.
  • Cookie banners blocking content: snapshot, click the ref that the snapshot assigns to the consent button, then snapshot again to reach the real content.
  • web_fetch blocked inside a Docker sandbox: use the browser backend instead, for example agent-browser open <url> followed by agent-browser get text body.
  • Persisting login state: set [browser].session_name (or pass --session-name to the CLI) so auth cookies survive between runs.
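
The fixes above for slow pages and for a blocked web_fetch combine into a short sequence. The sketch below opens a page, waits for the network to go idle, prints the rendered body text, and closes the browser even if waiting or reading fails. read_rendered_text is a hypothetical name, and AGENT_BROWSER is our own dry-run override, not an agent-browser setting. The subcommands are the ones quoted in the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a page's rendered text via agent-browser.
# AGENT_BROWSER is our own override (for dry runs), not an agent-browser setting.
read_rendered_text() {
  local ab="${AGENT_BROWSER:-agent-browser}" url="$1" status=0

  # If open fails there is no page to clean up, so return immediately.
  "$ab" open "$url" || return 1
  # Wait for network activity to settle so JavaScript-rendered content is present.
  "$ab" wait --load networkidle || status=1
  if [ "$status" -eq 0 ]; then
    "$ab" get text body || status=1
  fi
  # Close the browser whether or not the earlier steps succeeded.
  "$ab" close
  return "$status"
}
```

Running read_rendered_text https://example.com inside the sandbox prints the page text that web_fetch could not retrieve.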