Media & vision tools

Screenshot, image info, image generation, canvas rendering, calculator, and weather.

This page covers Revka’s media, vision, and utility tools: capturing the screen, inspecting image files, generating images, rendering live visual content to the dashboard, doing exact arithmetic, and fetching weather. Reach for these when your agent needs to see something, produce visual output, or compute a number it must not guess. For an overview of how tools are assembled and gated, see Tools overview. For how images attached to a message reach the model, see the [multimodal] section at the end of this page.

screenshot

Captures the current screen using a platform-native command and returns both the saved file path and inline base64 image data, so a multimodal model can view the result.

macOS: uses screencapture.
Linux: tries gnome-screenshot, then scrot, then ImageMagick’s import, and uses the first one that is installed. If none is found, the call fails with a hint to install one.
Other platforms: unsupported.

The screenshot is saved into the workspace directory. The tool requires the agent to be allowed to act (it is blocked under read-only autonomy — see Autonomy levels & approvals). Filenames are sanitized to the final path component, and shell-unsafe characters are rejected. Base64 is only inlined for images up to ~1.5 MB; larger captures return the path and size only.

Parameter	Type	Default	Meaning
`filename`	string	`screenshot_<timestamp>.png`	Output filename, saved in the workspace.
`region`	string	full screen	macOS only. `selection` for an interactive crop, `window` for the front window. Ignored on Linux.

{ "filename": "debug.png", "region": "window" }
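The filename handling and inline-size rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not Revka's implementation: the function names, the allowed-character set, and the exact byte threshold are assumptions (the source only says "shell-unsafe characters are rejected" and "~1.5 MB").

```python
import os
import re

# Assumption: this whitelist stands in for "shell-unsafe characters are rejected".
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

# Assumption: approximates the "~1.5 MB" inline limit.
MAX_INLINE_BYTES = 1_500_000


def sanitize_filename(requested: str) -> str:
    """Reduce a requested name to its final path component and validate it."""
    name = os.path.basename(requested.replace("\\", "/"))
    if name in ("", ".", "..") or not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe filename: {requested!r}")
    return name


def should_inline_base64(size_bytes: int) -> bool:
    """Small captures carry inline base64; larger ones return path and size only."""
    return size_bytes <= MAX_INLINE_BYTES


print(sanitize_filename("../../tmp/debug.png"))  # debug.png
```

A name such as `a;b.png` raises `ValueError` in this sketch, since `;` is outside the assumed whitelist.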

image_info

Reads image metadata — format, pixel dimensions, and byte size — directly from the file’s header bytes, with no external dependencies. Optionally returns the file as base64 for handing to a vision model.

Format is detected by magic bytes; supported formats are PNG, JPEG, GIF, WEBP, and BMP. Dimensions are parsed from the header for each of those formats. Reads are restricted to the workspace (paths outside it are rejected), and files larger than 5 MB are refused.

Parameter	Type	Default	Meaning
`path`	string	— (required)	Image file path, absolute or relative to the workspace.
`include_base64`	boolean	`false`	When `true`, append a `data:` URI with the base64-encoded image.

{ "path": "assets/logo.png", "include_base64": false }
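To illustrate what "detected by magic bytes" and "parsed from the header" mean, here is a minimal Python sketch that recognizes the five supported formats and reads dimensions for PNG and GIF only. The function names are hypothetical and the tool's actual parser may differ; the byte layouts come from the respective file format specifications.

```python
import struct
from typing import Optional, Tuple


def detect_format(header: bytes) -> Optional[str]:
    """Identify an image format from its leading bytes."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header.startswith(b"BM"):
        return "bmp"
    return None


def png_dimensions(header: bytes) -> Tuple[int, int]:
    """PNG: 8-byte signature, 4-byte length, b'IHDR', then width and height as big-endian u32."""
    if len(header) < 24 or detect_format(header) != "png" or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def gif_dimensions(header: bytes) -> Tuple[int, int]:
    """GIF: logical screen width and height as little-endian u16 at offset 6."""
    if len(header) < 10 or detect_format(header) != "gif":
        raise ValueError("not a GIF header")
    width, height = struct.unpack("<HH", header[6:10])
    return width, height


# Synthetic header for demonstration: signature + IHDR length + type + 640x480.
sample = b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(detect_format(sample), png_dimensions(sample))  # png (640, 480)
```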

image_gen

Generates an image from a text prompt via fal.ai’s synchronous API (Flux / Nano Banana models), downloads the result, and saves it as a PNG in the workspace images/ directory.

This tool is disabled by default and side-effecting (it makes an HTTP call and writes a file), so it requires both configuration and act-level autonomy. Enable it under [image_gen] and provide a fal.ai API key through the configured environment variable (default FAL_API_KEY).

Parameter	Type	Default	Meaning
`prompt`	string	— (required)	Text description of the image.
`filename`	string	`generated_image`	Output filename without extension; saved as `<name>.png` under `workspace/images/`.
`size`	string	`square_hd`	One of `square_hd`, `landscape_4_3`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`.
`model`	string	from config	fal.ai model path, e.g. `fal-ai/flux/schnell`. Must be a valid fal.ai model path.

{
  "prompt": "a sunset over snow-capped mountains, photorealistic",
  "size": "landscape_16_9",
  "model": "fal-ai/flux/schnell"
}

Configuration ([image_gen]):

[image_gen]
enabled = true                       # default: false
default_model = "fal-ai/flux/schnell"  # fal.ai model path
api_key_env = "FAL_API_KEY"          # env var holding the fal.ai key

Key	Type	Default	Meaning
`enabled`	bool	`false`	Register the `image_gen` tool.
`default_model`	string	`"fal-ai/flux/schnell"`	Model used when the call omits `model`.
`api_key_env`	string	`"FAL_API_KEY"`	Name of the environment variable holding the fal.ai API key.

export FAL_API_KEY="your-fal-ai-key"
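As a rough illustration of how the `api_key_env` indirection and the output naming fit together, the Python sketch below looks up the key through the configured variable name and builds the `<name>.png` path under `images/`. Both helper names are hypothetical, and the error wording is an assumption.

```python
import os
from pathlib import Path


def resolve_api_key(api_key_env: str = "FAL_API_KEY") -> str:
    """Read the fal.ai key from the environment variable named in [image_gen]."""
    key = os.environ.get(api_key_env, "").strip()
    if not key:
        raise RuntimeError(f"image_gen: environment variable {api_key_env} is not set")
    return key


def output_path(workspace: Path, filename: str = "generated_image") -> Path:
    """Build <workspace>/images/<name>.png from a filename given without extension."""
    return workspace / "images" / f"{Path(filename).name}.png"


print(output_path(Path("workspace"), "sunset").as_posix())  # workspace/images/sunset.png
```

Keeping only the variable name in configuration means the key itself never has to appear in the config file.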

canvas

Pushes rendered content to the Live Canvas — a real-time preview panel in the web dashboard. Frames are stored in a process-global store and broadcast to connected WebSocket viewers, so the agent can build a visualization that users watch update live. See Agents, teams & canvas and the realtime Live Canvas API.

Each canvas is addressed by a `canvas_id` string (default: `default`); content size is capped at 256 KB per frame.

Action	What it does
`render`	Push `content` of the given `content_type` to the canvas.
`snapshot`	Return the canvas’s current frame.
`clear`	Reset the canvas (clears current content and history).
`eval`	Send a JavaScript expression to be evaluated client-side in the canvas iframe; the result is visible to connected viewers.

Parameter	Type	Default	Meaning
`action`	string	— (required)	`render`, `snapshot`, `clear`, or `eval`.
`canvas_id`	string	`default`	Canvas identifier.
`content_type`	string	`html`	For `render`: `html`, `svg`, `markdown`, or `text`.
`content`	string	—	Content to render (required for `render`).
`expression`	string	—	JavaScript expression (required for `eval`).

{
  "action": "render",
  "canvas_id": "main",
  "content_type": "html",
  "content": "<h1>Build status: green</h1>"
}
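A caller can check a frame before sending it. The Python sketch below builds a `render` call and enforces the documented content types and size cap; the helper name is hypothetical, and it assumes the 256 KB cap means 262,144 bytes of UTF-8 content, which the source does not specify.

```python
MAX_FRAME_BYTES = 256 * 1024  # assumption: cap measured in UTF-8 bytes
CONTENT_TYPES = {"html", "svg", "markdown", "text"}


def build_render_call(content: str, content_type: str = "html", canvas_id: str = "default") -> dict:
    """Return arguments for a canvas `render` action, or raise if they would be rejected."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unsupported content_type: {content_type!r}")
    size = len(content.encode("utf-8"))
    if size > MAX_FRAME_BYTES:
        raise ValueError(f"frame is {size} bytes; the cap is {MAX_FRAME_BYTES}")
    return {
        "action": "render",
        "canvas_id": canvas_id,
        "content_type": content_type,
        "content": content,
    }


call = build_render_call("<h1>Build status: green</h1>", canvas_id="main")
print(call["action"], call["canvas_id"])  # render main
```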

calculator

Performs exact arithmetic and statistics. Use it instead of letting the model guess numeric results. It exposes 26 named functions across arithmetic, logarithms/exponentials, aggregation, statistics, and utilities.

Category	Functions
Arithmetic	`add`, `subtract`, `divide`, `multiply`, `pow`, `sqrt`, `abs`, `modulo`, `round`
Logarithmic / exponential	`log`, `ln`, `exp`, `factorial`
Aggregation	`sum`, `average`, `count`, `min`, `max`, `range`
Statistics	`median`, `mode`, `variance`, `stdev`, `percentile`
Utility	`percentage_change`, `clamp`

Inputs depend on the function:

Parameter	Type	Used by
`function`	string	— (required)
`values`	array of numbers	`add`, `subtract`, `divide`, `multiply`, `sum`, `average`, `count`, `min`, `max`, `range`, `median`, `mode`, `variance`, `stdev`, `percentile`
`a`, `b`	number	`pow`, `modulo`, `percentage_change`
`x`	number	`sqrt`, `abs`, `exp`, `ln`, `log`, `factorial`, `round`
`base`	number	`log` (default `10`)
`decimals`	integer	`round`
`p`	integer (0–100)	`percentile`
`min_val`, `max_val`	number	`clamp`

{ "function": "average", "values": [1, 2, 3, 4, 5] }

{ "function": "percentile", "values": [10, 20, 30, 40], "p": 90 }

Guardrails return clear errors rather than NaN: division and modulo by zero, square root of a negative, log/ln of a non-positive number, and factorial of a non-integer or of an input above 170 (which overflows f64) are all rejected.
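The 170 limit follows from the range of a 64-bit float: 170! is about 7.26e306, which fits, while 171! exceeds the maximum of roughly 1.80e308. The Python sketch below mirrors two of the guardrails; the function names and error messages are hypothetical, and rejecting negative factorial inputs is an assumption the source does not state explicitly.

```python
import math


def checked_factorial(x: float) -> float:
    """Factorial with explicit errors instead of inf or NaN."""
    if not float(x).is_integer():
        raise ValueError("factorial requires an integer input")
    n = int(x)
    if n < 0:
        # Assumption: negative inputs are rejected as well.
        raise ValueError("factorial requires a non-negative input")
    if n > 170:
        raise ValueError("factorial input above 170 overflows a 64-bit float")
    return float(math.factorial(n))


def checked_sqrt(x: float) -> float:
    """Square root that rejects negative inputs."""
    if x < 0:
        raise ValueError("square root of a negative number")
    return math.sqrt(x)


print(checked_factorial(5))  # 120.0
print(math.isfinite(checked_factorial(170)))  # True
```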

weather

Returns current conditions and an up-to-3-day forecast for any location worldwide via the free wttr.in service. No API key is required.

Locations are flexible: city names (in any language or script), IATA airport codes, GPS coordinates, postal/zip codes, and domain-based geolocation.

Parameter	Type	Default	Meaning
`location`	string	— (required)	City, IATA code, `lat,lon`, postal code, or domain.
`units`	string	`metric`	`metric` (°C, km/h, mm) or `imperial` (°F, mph, in).
`days`	integer	`1`	Forecast days, `0`–`3`. `0` returns current conditions only.

{ "location": "Seoul", "units": "metric", "days": 3 }

{ "location": "LAX" }

{ "location": "35.6762,139.6503", "units": "imperial", "days": 1 }

The output includes conditions, temperature and “feels like”, humidity, wind, precipitation, visibility, pressure, cloud cover, UV index, and — for forecast days — daily highs/lows, sun hours, astronomy (sunrise/sunset/moon phase), and hourly slots for short forecasts. Unknown locations return a helpful error suggesting alternative input formats. If you route outbound traffic through a proxy, weather requests use the tool.weather service key — see [proxy] configuration.
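For orientation, the Python sketch below validates the three parameters against the documented ranges and builds a wttr.in URL using that service's public `format=j1` JSON output. How the tool actually issues its request is not documented here, so the URL shape and the helper name are assumptions.

```python
from urllib.parse import quote

VALID_UNITS = {"metric", "imperial"}


def build_weather_request(location: str, units: str = "metric", days: int = 1) -> dict:
    """Validate weather arguments and return them with an illustrative wttr.in URL."""
    if not location or not location.strip():
        raise ValueError("location is required")
    if units not in VALID_UNITS:
        raise ValueError("units must be 'metric' or 'imperial'")
    if not 0 <= days <= 3:
        raise ValueError("days must be between 0 and 3")
    # Assumption: JSON output via format=j1; commas are kept for lat,lon input.
    url = f"https://wttr.in/{quote(location.strip(), safe=',')}?format=j1"
    return {"location": location.strip(), "units": units, "days": days, "url": url}


print(build_weather_request("New York")["url"])  # https://wttr.in/New%20York?format=j1
```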

`[multimodal]` — incoming image handling

The [multimodal] section controls how images attached to a message are passed to the model, distinct from image_gen (which produces images). Attach images inline with the marker syntax [IMAGE:/path/to/file.png] or [IMAGE:data:image/png;base64,...].

[multimodal]
max_images = 4              # per request; clamped to 1–16
max_image_size_mb = 5       # per image, before base64; clamped to 1–20
allow_remote_fetch = false  # allow [IMAGE:https://...] remote URLs
vision_provider = "ollama"  # optional: route images to a dedicated vision model
vision_model = "llava:7b"   # used only when vision_provider is set

Key	Type	Default	Meaning
`max_images`	integer	`4`	Maximum image attachments per request. Clamped to 1–16.
`max_image_size_mb`	integer	`5`	Maximum size per image before base64 encoding. Clamped to 1–20.
`allow_remote_fetch`	bool	`false`	When `true`, allow `[IMAGE:https://...]` remote URLs.
`vision_provider`	string	unset	Optional provider to route images to instead of the default text provider.
`vision_model`	string	unset	Model used with `vision_provider`; only applies when that field is set.
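The marker syntax and limits above can be illustrated with a short Python sketch that extracts `[IMAGE:...]` references from a message and applies `max_images` and `allow_remote_fetch`. The function names are hypothetical, and raising an error when the limit is exceeded is an assumption; the source does not say whether extra images are rejected or dropped.

```python
import re
from typing import List

IMAGE_MARKER = re.compile(r"\[IMAGE:([^\]]+)\]")


def clamp(value: int, low: int, high: int) -> int:
    """Clamp a configured value into its documented range."""
    return max(low, min(high, value))


def extract_images(message: str, max_images: int = 4, allow_remote_fetch: bool = False) -> List[str]:
    """Return image references (paths, data URIs, or URLs) found in a message."""
    limit = clamp(max_images, 1, 16)
    refs = IMAGE_MARKER.findall(message)
    for ref in refs:
        if ref.startswith(("http://", "https://")) and not allow_remote_fetch:
            raise ValueError("remote image URLs require allow_remote_fetch = true")
    if len(refs) > limit:
        # Assumption: over-limit requests are rejected rather than truncated.
        raise ValueError(f"{len(refs)} images attached; limit is {limit}")
    return refs


msg = "Compare [IMAGE:/tmp/a.png] with [IMAGE:data:image/png;base64,AAAA]"
print(extract_images(msg))  # ['/tmp/a.png', 'data:image/png;base64,AAAA']
```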

For automatic pre-processing of inbound media (transcribing audio, describing images, summarizing video before the agent sees them), see the [media_pipeline] configuration in Config: channels, tools & integrations.

Related

Tools overview
Filesystem & code tools — file_read extracts PDF text inline
Browser & web tools — browser screenshots and page automation
Run the dashboard — where the Live Canvas appears
Realtime: WebSocket, SSE & Live Canvas
