Browser automation
Set up headless browser automation with agent-browser and GUI debugging via VNC.
Revka can drive a real web browser so your agent can open pages, read rendered content, click elements, fill forms, and take screenshots. This page covers enabling the browser tool, choosing a backend, locking navigation to an allowlist, debugging with a visible browser over VNC or Chrome Remote Desktop, and delegating tougher web apps to an external CLI with browser_delegate.
Use browser automation when a page renders content with JavaScript, requires login, or needs interaction. For simple static pages, the lighter web tools (web_fetch, text_browser) are faster and cheaper.
How browser access fits together
Revka exposes several web-related tools. The two that matter for automation are:
- browser_open: opens an approved HTTPS URL in the system browser. No scraping, no interaction.
- browser: full automation with pluggable backends. Supports DOM actions (open, snapshot, click, fill, type, get_text, get_title, get_url, screenshot, wait, press, hover, scroll, is_visible, close, find) plus optional OS-level actions (mouse_move, mouse_click, mouse_drag, key_type, key_press, screen_capture) when the computer_use backend is active.
Both are gated by the [browser] config section and share the allowed_domains allowlist.
The [browser] config section
Configure browser access in ~/.revka/config.toml:

    [browser]
    enabled = true
    allowed_domains = ["*"]
    backend = "agent_browser"
    native_headless = true

| Key | Type | Default | Meaning |
|---|---|---|---|
| enabled | bool | false | Master switch for browser and browser_open |
| allowed_domains | array | [] | Navigation allowlist; "*" allows all public domains |
| backend | string | "agent_browser" | agent_browser, rust_native, computer_use, or auto |
| session_name | string | unset | Named browser session for agent-browser (persists auth state) |
| native_headless | bool | true | Headless mode for the rust-native backend |
| native_webdriver_url | string | http://127.0.0.1:9515 | WebDriver endpoint for the rust-native backend |
| native_chrome_path | string | unset | Optional Chrome/Chromium executable path for the rust-native backend |
allowed_domains
allowed_domains is the security boundary for all browser navigation. It is deny-by-default:
- An empty list ([]) blocks every URL.
- "*" allows all public domains, but local and private hosts are still blocked.
- Specific entries use exact or subdomain matching.
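
The three rules above can be sketched in Python as follows. This illustrates the described behavior and is not Revka's implementation. The function names are hypothetical, and the definition of "local and private" (localhost names plus loopback, private, and link-local IP literals) is an assumption. The sketch also assumes such hosts are rejected even when an entry names them, which the rules above do not state.

```python
import ipaddress


def is_local_or_private(host: str) -> bool:
    """Assumed meaning of 'local and private hosts' (not confirmed by the docs)."""
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return True
    try:
        ip = ipaddress.ip_address(host)  # expects an unbracketed IP literal
    except ValueError:
        return False  # an ordinary hostname, not an IP literal
    return ip.is_loopback or ip.is_private or ip.is_link_local


def is_allowed(host: str, allowed_domains: list[str]) -> bool:
    """Deny by default; allow on "*", an exact match, or a subdomain match."""
    host = host.lower().rstrip(".")
    if is_local_or_private(host):
        return False  # assumption: blocked regardless of the allowlist
    for entry in allowed_domains:
        entry = entry.lower()
        if entry == "*":
            return True
        # The leading dot stops "badexample.com" from matching "example.com".
        if host == entry or host.endswith("." + entry):
            return True
    return False


print(is_allowed("docs.example.com", ["example.com"]))
print(is_allowed("127.0.0.1", ["*"]))
```

Under these assumptions the first call is allowed through subdomain matching and the second is rejected even with the wildcard.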

    [browser]
    enabled = true
    # Lock the agent to two domains only:
    allowed_domains = ["example.com", "docs.example.com"]

To disable browser access entirely:

    [browser]
    enabled = false

Choosing a backend
Set backend to match your environment.
agent_browser (default, recommended)
The default backend drives Chrome for Testing through the agent-browser CLI. It runs headless with sandboxing and is the easiest path on a server or container.
Install it once:

    # Install the CLI
    npm install -g agent-browser

    # Download Chrome for Testing
    agent-browser install --with-deps   # Linux (includes system deps)
    agent-browser install               # macOS / Windows

Verify your config has backend = "agent_browser" and enabled = true, then test through the agent:

    echo "Open https://example.com and tell me what it says" | revka agent

You can exercise the CLI directly to confirm the install:

    agent-browser open https://example.com
    agent-browser get title
    agent-browser snapshot -i
    agent-browser screenshot /tmp/test.png
    agent-browser close

rust_native
A WebDriver-based backend built into Revka. Use it when you want to talk to an existing WebDriver/ChromeDriver instance instead of the agent-browser CLI.

    [browser]
    enabled = true
    backend = "rust_native"
    native_headless = true
    native_webdriver_url = "http://127.0.0.1:9515"
    native_chrome_path = "/usr/bin/chromium"  # optional

computer_use
Delegates browser actions to a separate computer-use sidecar over HTTP, which performs OS-level mouse, keyboard, and screenshot actions. Choose this when you need true OS-level control rather than DOM scripting; it is what unlocks the mouse_move, mouse_click, mouse_drag, key_type, key_press, and screen_capture actions.

    [browser]
    enabled = true
    backend = "computer_use"

    [browser.computer_use]
    endpoint = "http://127.0.0.1:8787/v1/actions"
    timeout_ms = 15000
    allow_remote_endpoint = false
    window_allowlist = []
    # api_key = "..."            # optional bearer token, stored encrypted
    # max_coordinate_x = 1920    # optional coordinate boundary
    # max_coordinate_y = 1080

| Key | Default | Meaning |
|---|---|---|
| endpoint | http://127.0.0.1:8787/v1/actions | Sidecar endpoint for OS-level actions |
| api_key | unset | Optional bearer token (stored encrypted) |
| timeout_ms | 15000 | Per-action request timeout |
| allow_remote_endpoint | false | Allow a non-loopback endpoint |
| window_allowlist | [] | Window title/process allowlist forwarded to sidecar policy |
| max_coordinate_x / max_coordinate_y | unset | Optional coordinate boundaries |
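
The allow_remote_endpoint key implies a loopback check on endpoint. The Python sketch below shows one way such a check could work. It is an illustration, not Revka's validation code: both function names are hypothetical, and treating every hostname other than localhost as remote is an assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def endpoint_is_loopback(endpoint: str) -> bool:
    """True when the URL's host is localhost or a loopback IP literal."""
    host = urlsplit(endpoint).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # assumption: any other hostname counts as remote


def endpoint_permitted(endpoint: str, allow_remote_endpoint: bool = False) -> bool:
    """Loopback endpoints always pass; remote ones need the explicit opt-in."""
    return allow_remote_endpoint or endpoint_is_loopback(endpoint)


print(endpoint_permitted("http://127.0.0.1:8787/v1/actions"))
print(endpoint_permitted("http://10.0.0.5:8787/v1/actions"))
```

With the default allow_remote_endpoint = false, the second endpoint is rejected in this sketch because 10.0.0.5 is not a loopback address.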
backend = "auto" lets Revka select an available backend automatically.
The @ref selector model
After an open, call snapshot to map interactive elements to stable refs like @e1, @e2, @e3. Pass those refs to click, fill, hover, and similar actions. A typical agent flow:

    {"action": "open", "url": "https://example.com"}
    {"action": "snapshot"}
    {"action": "click", "selector": "@e3"}

Selectors also accept CSS (button.submit) and text matchers (text=Accept). Re-run snapshot after the page changes, because refs are recomputed each time.
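
The three selector forms can be told apart by their prefix. The Python sketch below illustrates that idea. It is not how Revka parses selectors: the function name selector_kind and the ref pattern (an @ followed by word characters) are assumptions based only on the examples on this page.

```python
import re

# Assumed ref shape: "@" followed by letters, digits, or underscores (e.g. @e3).
REF_PATTERN = re.compile(r"^@\w+$")


def selector_kind(selector: str) -> str:
    """Classify a selector as a snapshot ref, a text matcher, or CSS."""
    if REF_PATTERN.match(selector):
        return "ref"
    if selector.startswith("text="):
        return "text"
    return "css"  # anything else is treated as a CSS selector


for example in ("@e3", "text=Accept", "button.submit"):
    print(example, "->", selector_kind(example))
```

Of the three kinds, only refs depend on the latest snapshot, so they are the ones to refresh after the page changes.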
GUI debugging with VNC
For visual debugging, run a virtual display and view the browser through a VNC client or noVNC in your web browser.
Install the dependencies (Ubuntu/Debian):

    apt-get install -y xvfb x11vnc fluxbox novnc websockify

    # Optional desktop environment (needed for Chrome Remote Desktop below)
    apt-get install -y xfce4 xfce4-goodies

Start the virtual display, window manager, VNC server, and noVNC bridge:

    #!/bin/bash
    DISPLAY_NUM=99
    VNC_PORT=5900
    NOVNC_PORT=6080
    RESOLUTION=1920x1080x24

    # Virtual X display (-ac disables X access control)
    Xvfb :$DISPLAY_NUM -screen 0 $RESOLUTION -ac &
    sleep 1
    # Lightweight window manager
    fluxbox -display :$DISPLAY_NUM &
    sleep 1
    # VNC server on the virtual display (-nopw: no password, -bg: run in background)
    x11vnc -display :$DISPLAY_NUM -rfbport $VNC_PORT -forever -shared -nopw -bg
    sleep 1
    # WebSocket bridge so noVNC can reach the VNC server from a web browser
    websockify --web=/usr/share/novnc $NOVNC_PORT localhost:$VNC_PORT &

Then connect:
- VNC client: localhost:5900
- Web browser: http://localhost:6080/vnc.html
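
If a connection fails, it helps to check first that both ports are accepting connections. The Python sketch below does that with the standard library. The helper name port_is_open is hypothetical, and the port numbers are the ones from the startup script above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


if __name__ == "__main__":
    # Ports taken from the startup script: x11vnc on 5900, websockify on 6080.
    for name, port in (("x11vnc", 5900), ("websockify/noVNC", 6080)):
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful TCP connection only shows that something is listening on the port. It does not confirm that the listener is the VNC server or that the display is rendering.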
Launch a browser on the virtual display to watch automation live:

    DISPLAY=:99 google-chrome --no-sandbox https://example.com &

Chrome Remote Desktop
For a managed remote GUI through a Google account, install Chrome Remote Desktop on the server:

    wget https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb
    apt-get install -y ./chrome-remote-desktop_current_amd64.deb

    # Configure the session
    echo "xfce4-session" > ~/.chrome-remote-desktop-session
    chmod +x ~/.chrome-remote-desktop-session

Then:
- Visit https://remotedesktop.google.com/headless.
- Copy the “Debian Linux” setup command and run it on your server.
- Start the service: systemctl --user start chrome-remote-desktop.
- Connect from any device at https://remotedesktop.google.com/access.
browser_delegate
browser_delegate hands a natural-language browser task to an external browser-capable CLI subprocess (for example, Claude Code with the claude-in-chrome MCP). It is useful for corporate web apps such as Teams, Outlook, Jira, and Confluence, which have no direct API and are awkward for the headless backends.

    [browser_delegate]
    enabled = true
    cli_binary = "claude"
    chrome_profile_dir = "~/.config/chrome-corp-profile"
    allowed_domains = ["teams.microsoft.com", "outlook.office.com"]
    blocked_domains = []
    task_timeout_secs = 120

The tool takes a single task argument:

    {"task": "Check my unread Teams messages"}

Troubleshooting
- “Element not found”: the page may not be fully loaded. Add a wait before snapshotting: run agent-browser wait --load networkidle, then agent-browser snapshot -i.
- Cookie banners blocking content: snapshot, click the consent ref (e.g. @accept_cookies), then snapshot again for the real content.
- web_fetch blocked inside a Docker sandbox: use the browser backend instead, with agent-browser open <url> followed by agent-browser get text body.
- Persisting login state: set [browser].session_name (or pass --session-name to the CLI) so auth cookies survive between runs.
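
The wait-then-snapshot fix for “Element not found” can be scripted. The Python sketch below wraps the two CLI commands from the first item. The function name snapshot_when_loaded is hypothetical, and the sketch returns the snapshot output unparsed because the output format is not described here. The run parameter exists only so the command runner can be swapped out in tests.

```python
import subprocess


def snapshot_when_loaded(run=subprocess.run) -> str:
    """Wait for the network to go idle, then return the interactive snapshot text."""
    # Blocks until agent-browser reports the page's network activity has settled.
    run(["agent-browser", "wait", "--load", "networkidle"], check=True)
    # Capture the interactive-element snapshot as text.
    result = run(
        ["agent-browser", "snapshot", "-i"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With check=True, a non-zero exit from either command raises subprocess.CalledProcessError, so a failed wait is reported instead of producing a snapshot of a half-loaded page.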
Related pages
- Browser & web tools (the full tool reference, including web_fetch and text_browser)
- Config: channels, tools & integrations (complete [browser] and [browser_delegate] key reference)
- OTP gating & emergency stop (gating browser actions behind a one-time password)
- Cargo feature flags & ADRs (the browser-native feature for the rust-native backend)