Browser automation

Set up headless browser automation with agent-browser and GUI debugging via VNC.

Revka can drive a real web browser so your agent can open pages, read rendered content, click elements, fill forms, and take screenshots. This page covers enabling the browser tool, choosing a backend, locking navigation to an allowlist, debugging with a visible browser over VNC or Chrome Remote Desktop, and delegating tougher web apps to an external CLI with browser_delegate.

Use browser automation when a page renders content with JavaScript, requires login, or needs interaction. For simple static pages, the lighter web tools (web_fetch, text_browser) are faster and cheaper.

How browser access fits together

Revka exposes several web-related tools. The two that matter for automation are:

browser_open — opens an approved HTTPS URL in the system browser. No scraping, no interaction.
browser — full automation with pluggable backends. Supports DOM actions (open, snapshot, click, fill, type, get_text, get_title, get_url, screenshot, wait, press, hover, scroll, is_visible, close, find) plus optional OS-level actions (mouse_move, mouse_click, mouse_drag, key_type, key_press, screen_capture) when the computer_use backend is active.

Both are gated by the [browser] config section and share the allowed_domains allowlist.

The `[browser]` config section

Configure browser access in ~/.revka/config.toml:

[browser]
enabled = true
allowed_domains = ["*"]
backend = "agent_browser"
native_headless = true

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| `enabled` | bool | `false` | Master switch for `browser` and `browser_open` |
| `allowed_domains` | array | `[]` | Navigation allowlist; `"*"` allows all public domains |
| `backend` | string | `"agent_browser"` | `agent_browser`, `rust_native`, `computer_use`, or `auto` |
| `session_name` | string | unset | Named browser session for agent-browser (persists auth state) |
| `native_headless` | bool | `true` | Headless mode for the rust-native backend |
| `native_webdriver_url` | string | `http://127.0.0.1:9515` | WebDriver endpoint for the rust-native backend |
| `native_chrome_path` | string | unset | Optional Chrome/Chromium executable path for the rust-native backend |

allowed_domains

allowed_domains is the security boundary for all browser navigation. It is deny-by-default:

An empty list ([]) blocks every URL.
"*" allows all public domains, but local and private hosts are still blocked.
Specific entries use exact or subdomain matching.

[browser]
enabled = true
# Lock the agent to two domains only:
allowed_domains = ["example.com", "docs.example.com"]
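The allowlist rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not Revka's implementation: the function names are hypothetical, and the sketch assumes that local and private hosts are rejected before the allowlist is consulted and that no DNS resolution takes place.

```python
import ipaddress
from urllib.parse import urlsplit


def is_private_host(host: str) -> bool:
    """Return True for localhost-style names and non-public IP literals."""
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # an ordinary DNS name; this sketch does not resolve it
    return not ip.is_global


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Deny by default; allow on "*", an exact host match, or a subdomain match."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host or is_private_host(host):
        return False
    for entry in allowed_domains:
        entry = entry.lower()
        if entry == "*":
            return True
        if host == entry or host.endswith("." + entry):
            return True
    return False


print(is_allowed("https://docs.example.com/guide", ["example.com"]))
print(is_allowed("http://127.0.0.1:8080/", ["*"]))
```

Matching on `"." + entry` rather than on a bare suffix keeps a host such as `evilexample.com` from passing as a subdomain of `example.com`.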

To disable browser access entirely:

[browser]
enabled = false

Choosing a backend

Set backend to match your environment.

agent_browser (default, recommended)

The default backend drives Chrome for Testing through the agent-browser CLI. It runs headless with sandboxing and is the easiest path on a server or container.

Install it once:

# Install the CLI
npm install -g agent-browser

# Download Chrome for Testing
agent-browser install --with-deps   # Linux (includes system deps)
agent-browser install               # macOS / Windows

Verify your config has backend = "agent_browser" and enabled = true, then test through the agent:

echo "Open https://example.com and tell me what it says" | revka agent

You can exercise the CLI directly to confirm the install:

agent-browser open https://example.com
agent-browser get title
agent-browser snapshot -i
agent-browser screenshot /tmp/test.png
agent-browser close
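If you want to repeat that check from a script, the same five commands can be wrapped in Python. This is a sketch under stated assumptions: the helper names are hypothetical, only the subcommands shown above are used, and it assumes the CLI exits with a non-zero status when a step fails.

```python
import shutil
import subprocess


def smoke_test_commands(url: str, screenshot_path: str) -> list[list[str]]:
    """Return the argv lists for the manual smoke test, in order."""
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "get", "title"],
        ["agent-browser", "snapshot", "-i"],
        ["agent-browser", "screenshot", screenshot_path],
        ["agent-browser", "close"],
    ]


def run_smoke_test(url: str, screenshot_path: str) -> bool:
    """Run each command and stop at the first non-zero exit status."""
    if shutil.which("agent-browser") is None:
        print("agent-browser is not on PATH; nothing was run")
        return False
    for argv in smoke_test_commands(url, screenshot_path):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(argv)}\n{result.stderr}")
            return False
        print(f"ok: {' '.join(argv)}")
    return True


if __name__ == "__main__":
    run_smoke_test("https://example.com", "/tmp/test.png")
```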

rust_native

A WebDriver-based backend built into Revka. Use it when you want to talk to an existing WebDriver/ChromeDriver instance instead of the agent-browser CLI.

[browser]
enabled = true
backend = "rust_native"
native_headless = true
native_webdriver_url = "http://127.0.0.1:9515"
native_chrome_path = "/usr/bin/chromium"   # optional
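Before pointing Revka at a WebDriver endpoint, it can help to confirm that something is listening there. The sketch below queries the `/status` endpoint defined by the W3C WebDriver specification and reads its `ready` flag. The function names are hypothetical, and the check is independent of Revka.

```python
import json
import urllib.request


def parse_status(body: str) -> bool:
    """Return the `ready` flag from a WebDriver /status response body."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return False
    value = payload.get("value")
    if not isinstance(value, dict):
        return False
    return bool(value.get("ready", False))


def webdriver_ready(base_url: str = "http://127.0.0.1:9515", timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers /status and reports ready."""
    url = f"{base_url.rstrip('/')}/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_status(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection failures (URLError is a subclass);
        # ValueError covers malformed JSON.
        return False


if __name__ == "__main__":
    print("WebDriver ready:", webdriver_ready())
```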

computer_use

Delegates browser actions to a separate computer-use sidecar over HTTP, which performs OS-level mouse, keyboard, and screenshot actions. Choose this when you need true OS-level control rather than DOM scripting — it is what unlocks the mouse_move, mouse_click, mouse_drag, key_type, key_press, and screen_capture actions.

[browser]
enabled = true
backend = "computer_use"

[browser.computer_use]
endpoint = "http://127.0.0.1:8787/v1/actions"
timeout_ms = 15000
allow_remote_endpoint = false
window_allowlist = []
# api_key = "..."          # optional bearer token, stored encrypted
# max_coordinate_x = 1920  # optional coordinate boundary
# max_coordinate_y = 1080

| Key | Default | Meaning |
| --- | --- | --- |
| `endpoint` | `http://127.0.0.1:8787/v1/actions` | Sidecar endpoint for OS-level actions |
| `api_key` | unset | Optional bearer token (stored encrypted) |
| `timeout_ms` | `15000` | Per-action request timeout |
| `allow_remote_endpoint` | `false` | Allow a non-loopback endpoint |
| `window_allowlist` | `[]` | Window title/process allowlist forwarded to sidecar policy |
| `max_coordinate_x` / `max_coordinate_y` | unset | Optional coordinate boundaries |
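Two of these keys act as safety checks, and a short Python sketch may make their intent clearer. The helpers below are hypothetical and reflect one plausible reading of the table: a non-loopback endpoint is rejected unless `allow_remote_endpoint` is true, and coordinates are treated as inclusive upper bounds when the limits are set. Revka's actual checks may differ.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def endpoint_is_permitted(endpoint: str, allow_remote_endpoint: bool = False) -> bool:
    """Accept loopback endpoints; accept others only when explicitly allowed."""
    if allow_remote_endpoint:
        return True
    host = urlsplit(endpoint).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-literal hostname is treated as remote here


def within_bounds(x: int, y: int,
                  max_x: Optional[int] = None,
                  max_y: Optional[int] = None) -> bool:
    """Reject negative coordinates and any that exceed a configured limit."""
    if x < 0 or y < 0:
        return False
    if max_x is not None and x > max_x:
        return False
    if max_y is not None and y > max_y:
        return False
    return True


print(endpoint_is_permitted("http://127.0.0.1:8787/v1/actions"))
print(within_bounds(2000, 100, max_x=1920, max_y=1080))
```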

auto

backend = "auto" lets Revka select an available backend automatically.

The `@ref` selector model

After an open, call snapshot to map each interactive element to a short ref such as @e1, @e2, or @e3. Pass those refs to click, fill, hover, and similar actions. A typical agent flow:

{"action": "open", "url": "https://example.com"}
{"action": "snapshot"}
{"action": "click", "selector": "@e3"}

Selectors also accept CSS (button.submit) and text matchers (text=Accept). Refs are recomputed on every snapshot, so re-run snapshot after the page changes and use the new refs.
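The three selector forms and the open, snapshot, click sequence can be expressed as a small Python sketch. The helper names are hypothetical, and the classification rule (a leading `@` means a ref, a leading `text=` means a text matcher, anything else is treated as CSS) is an assumption drawn from the examples on this page rather than a documented grammar.

```python
import json


def selector_kind(selector: str) -> str:
    """Classify a selector as 'ref', 'text', or 'css' (assumed rules)."""
    if selector.startswith("@"):
        return "ref"
    if selector.startswith("text="):
        return "text"
    return "css"


def click_flow(url: str, selector: str) -> list[dict]:
    """Build the open -> snapshot -> click action sequence as plain dicts."""
    return [
        {"action": "open", "url": url},
        {"action": "snapshot"},  # refs only exist after a snapshot
        {"action": "click", "selector": selector},
    ]


for step in click_flow("https://example.com", "@e3"):
    print(json.dumps(step))
```

Because a ref is only meaningful for the snapshot that produced it, a flow that clicks and then interacts again would add another snapshot step before the next ref-based action.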

GUI debugging with VNC

For visual debugging, run a virtual display and view the browser through a VNC client or noVNC in your web browser.

Install the dependencies (Ubuntu/Debian):

apt-get install -y xvfb x11vnc fluxbox novnc websockify

# Optional desktop environment (needed for Chrome Remote Desktop below)
apt-get install -y xfce4 xfce4-goodies

Start the virtual display, window manager, VNC server, and noVNC bridge:

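#!/bin/bash
# Display number, ports, and screen geometry (width x height x color depth)
DISPLAY_NUM=99
VNC_PORT=5900
NOVNC_PORT=6080
RESOLUTION=1920x1080x24

# Virtual X display; -ac disables X access control
Xvfb ":$DISPLAY_NUM" -screen 0 "$RESOLUTION" -ac &
sleep 1
# Lightweight window manager on the virtual display
fluxbox -display ":$DISPLAY_NUM" &
sleep 1
# VNC server for the display; -nopw runs without a password, -bg backgrounds it
x11vnc -display ":$DISPLAY_NUM" -rfbport "$VNC_PORT" -forever -shared -nopw -bg
sleep 1
# WebSocket bridge so noVNC can reach the VNC server from a web browser
websockify --web=/usr/share/novnc "$NOVNC_PORT" "localhost:$VNC_PORT" &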

Then connect:

VNC client: localhost:5900
Web browser: http://localhost:6080/vnc.html
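If a connection fails, a quick way to narrow things down is to check whether the two ports are accepting TCP connections. The Python sketch below does only that; the helper name `port_open` is hypothetical, and the ports are the ones used in the script above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in (("VNC", 5900), ("noVNC", 6080)):
        state = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A port that is not reachable usually means the corresponding process (x11vnc or websockify) did not start, so its terminal output is the next place to look.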

Launch a browser on the virtual display to watch automation live:

DISPLAY=:99 google-chrome --no-sandbox https://example.com &

Chrome Remote Desktop

For a managed remote GUI through a Google account, install Chrome Remote Desktop on the server:

wget https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb
apt-get install -y ./chrome-remote-desktop_current_amd64.deb

# Configure the session
echo "xfce4-session" > ~/.chrome-remote-desktop-session
chmod +x ~/.chrome-remote-desktop-session

Then:

Visit https://remotedesktop.google.com/headless.
Copy the “Debian Linux” setup command and run it on your server.
Start the service: systemctl --user start chrome-remote-desktop.
Connect from any device at https://remotedesktop.google.com/access.

browser_delegate

browser_delegate hands a natural-language browser task to an external browser-capable CLI subprocess (for example, Claude Code with the claude-in-chrome MCP). It is useful for corporate web apps — Teams, Outlook, Jira, Confluence — that have no direct API and are awkward for the headless backends.

[browser_delegate]
enabled = true
cli_binary = "claude"
chrome_profile_dir = "~/.config/chrome-corp-profile"
allowed_domains = ["teams.microsoft.com", "outlook.office.com"]
blocked_domains = []
task_timeout_secs = 120

The tool takes a single task argument:

{"task": "Check my unread Teams messages"}
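The role of `task_timeout_secs` can be illustrated with a generic subprocess wrapper. This is a sketch only: how Revka actually invokes `cli_binary` is not described here, so the example uses a Python one-liner as a stand-in for the external CLI, and the helper name and result shape are hypothetical.

```python
import subprocess
import sys


def run_delegated_task(argv: list[str], task_timeout_secs: float = 120) -> dict:
    """Run a subprocess and give up once task_timeout_secs has elapsed."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=task_timeout_secs
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"status": "timeout", "output": ""}
    status = "ok" if done.returncode == 0 else "error"
    return {"status": status, "output": done.stdout.strip()}


# Stand-in for the external CLI: a one-liner that prints a fake answer.
result = run_delegated_task(
    [sys.executable, "-c", "print('3 unread messages')"], task_timeout_secs=10
)
print(result)
```

A bounded timeout matters for delegated tasks because the external CLI drives a real browser session and may otherwise wait indefinitely on a login prompt or a slow page.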

Troubleshooting

“Element not found”: the page may not be fully loaded. Add a wait before snapshotting — agent-browser wait --load networkidle, then agent-browser snapshot -i.
Cookie banners blocking content: snapshot, click the ref that the snapshot assigns to the consent button (e.g. @accept_cookies), then snapshot again to capture the real content.
web_fetch blocked inside a Docker sandbox: use the browser backend instead — agent-browser open <url> then agent-browser get text body.
Persisting login state: set [browser].session_name (or pass --session-name to the CLI) so auth cookies survive between runs.

Related pages

Browser & web tools - the full tool reference, including web_fetch and text_browser
Config: channels, tools & integrations - complete [browser] and [browser_delegate] key reference
OTP gating & emergency stop - gating browser actions behind a one-time password
Cargo feature flags & ADRs - the browser-native feature for the rust-native backend
