Media & vision tools
Screenshot, image info, image generation, canvas rendering, calculator, and weather.
This page covers Revka’s media, vision, and utility tools: capturing the screen, inspecting image files, generating images, rendering live visual content to the dashboard, doing exact arithmetic, and fetching weather. Reach for these when your agent needs to see something, produce visual output, or compute a number it must not guess. For an overview of how tools are assembled and gated, see Tools overview. For how images attached to a message reach the model, see the [multimodal] section at the end of this page.
screenshot
Captures the current screen using a platform-native command and returns both the saved file path and inline base64 image data, so a multimodal model can view the result.
- macOS: uses `screencapture`.
- Linux: tries `gnome-screenshot`, then `scrot`, then ImageMagick’s `import` (whichever is installed). If none is found, the call fails with a hint to install one.
- Other platforms: unsupported.
The screenshot is saved into the workspace directory. The tool requires the agent to be allowed to act (it is blocked under read-only autonomy — see Autonomy levels & approvals). Filenames are sanitized to the final path component, and shell-unsafe characters are rejected. Base64 is only inlined for images up to ~1.5 MB; larger captures return the path and size only.
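The command selection and filename rules above can be sketched in Python. This is an illustrative sketch, not Revka's implementation: the helper names (`pick_capture_command`, `sanitize_filename`, `should_inline`), the exact set of rejected characters, and the byte value used for the "~1.5 MB" cap are all assumptions.

```python
import os
import shutil
import sys

# Assumed byte value; the docs only say "~1.5 MB".
MAX_INLINE_BYTES = 1_500_000
# Assumed set of shell-unsafe characters; the real list may differ.
UNSAFE_CHARS = set(";|&$`<>\"'\\*?!(){}[]\n")
# Documented Linux fallback order.
LINUX_CANDIDATES = ["gnome-screenshot", "scrot", "import"]


def pick_capture_command(platform=sys.platform, which=shutil.which):
    """Return the capture program to use, following the documented order."""
    if platform == "darwin":
        return "screencapture"
    if platform.startswith("linux"):
        for name in LINUX_CANDIDATES:
            if which(name):
                return name
        raise RuntimeError(
            "no screenshot tool found; install gnome-screenshot, scrot, or ImageMagick"
        )
    raise RuntimeError(f"screenshots are not supported on {platform}")


def sanitize_filename(filename):
    """Keep only the final path component and reject shell-unsafe characters."""
    name = os.path.basename(filename)
    if not name or any(ch in UNSAFE_CHARS for ch in name):
        raise ValueError(f"unsafe filename: {filename!r}")
    return name


def should_inline(size_bytes):
    """Inline base64 only for captures at or under the assumed cap."""
    return size_bytes <= MAX_INLINE_BYTES
```

Passing `which` as a parameter keeps the fallback logic testable without any of the tools installed.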
| Parameter | Type | Default | Meaning |
|---|---|---|---|
| `filename` | string | `screenshot_<timestamp>.png` | Output filename, saved in the workspace. |
| `region` | string | full screen | macOS only. `selection` for an interactive crop, `window` for the front window. Ignored on Linux. |

    { "filename": "debug.png", "region": "window" }

image_info
Reads image metadata (format, pixel dimensions, and byte size) directly from the file’s header bytes, with no external dependencies. Optionally returns the file as base64 for handing to a vision model.
Format is detected by magic bytes; supported formats are PNG, JPEG, GIF, WEBP, and BMP. Dimensions are parsed from the header for each of those formats. Reads are restricted to the workspace (paths outside it are rejected), and files larger than 5 MB are refused.
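As a rough illustration of header-based detection, the sketch below checks the well-known magic bytes for the five formats and parses dimensions for PNG, GIF, and BMP. The function names are hypothetical, JPEG and WEBP dimension parsing is omitted for brevity, and the BMP branch assumes the common BITMAPINFOHEADER layout; Revka's own parser may differ.

```python
import struct


def detect_format(header: bytes):
    """Identify the image format from its leading magic bytes."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header.startswith(b"BM"):
        return "bmp"
    return None


def dimensions(header: bytes):
    """Return (width, height) for PNG, GIF, or BMP headers, else None."""
    fmt = detect_format(header)
    if fmt == "png" and len(header) >= 24:
        # IHDR width and height are big-endian uint32 at bytes 16..24.
        return struct.unpack(">II", header[16:24])
    if fmt == "gif" and len(header) >= 10:
        # Logical screen size is little-endian uint16 at bytes 6..10.
        return struct.unpack("<HH", header[6:10])
    if fmt == "bmp" and len(header) >= 26:
        # Signed little-endian int32 values; height is negative for top-down bitmaps.
        width, height = struct.unpack("<ii", header[18:26])
        return (width, abs(height))
    return None
```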
| Parameter | Type | Default | Meaning |
|---|---|---|---|
| `path` | string | required | Image file path, absolute or relative to the workspace. |
| `include_base64` | boolean | `false` | When true, append a `data:` URI with the base64-encoded image. |

    { "path": "assets/logo.png", "include_base64": false }

image_gen
Generates an image from a text prompt via fal.ai’s synchronous API (Flux / Nano Banana models), downloads the result, and saves it as a PNG in the workspace `images/` directory.
This tool is disabled by default and side-effecting (it makes an HTTP call and writes a file), so it requires both configuration and act-level autonomy. Enable it under [image_gen] and provide a fal.ai API key through the configured environment variable (default FAL_API_KEY).
| Parameter | Type | Default | Meaning |
|---|---|---|---|
| `prompt` | string | required | Text description of the image. |
| `filename` | string | `generated_image` | Output filename without extension; saved as `<name>.png` under `workspace/images/`. |
| `size` | string | `square_hd` | One of `square_hd`, `landscape_4_3`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`. |
| `model` | string | from config | fal.ai model path, e.g. `fal-ai/flux/schnell`. Must be a valid fal.ai model path. |

    { "prompt": "a sunset over snow-capped mountains, photorealistic", "size": "landscape_16_9", "model": "fal-ai/flux/schnell" }

Configuration (`[image_gen]`):

    [image_gen]
    enabled = true                         # default: false
    default_model = "fal-ai/flux/schnell"  # fal.ai model path
    api_key_env = "FAL_API_KEY"            # env var holding the fal.ai key

| Key | Type | Default | Meaning |
|---|---|---|---|
| `enabled` | bool | `false` | Register the image_gen tool. |
| `default_model` | string | `"fal-ai/flux/schnell"` | Model used when the call omits `model`. |
| `api_key_env` | string | `"FAL_API_KEY"` | Name of the environment variable holding the fal.ai API key. |
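How the configuration keys and call parameters might combine can be sketched as follows. The function `resolve_image_gen_call` and its return shape are hypothetical; only the key names, defaults, and allowed sizes come from the tables above, and the real tool may validate differently.

```python
import os

ALLOWED_SIZES = {
    "square_hd", "landscape_4_3", "portrait_4_3",
    "landscape_16_9", "portrait_16_9",
}


def resolve_image_gen_call(config, args, environ=os.environ):
    """Merge [image_gen] config with call arguments (illustrative only)."""
    if not config.get("enabled", False):
        raise RuntimeError("image_gen is disabled; set enabled = true under [image_gen]")

    # The config stores the *name* of the variable, not the key itself.
    key_var = config.get("api_key_env", "FAL_API_KEY")
    api_key = environ.get(key_var)
    if not api_key:
        raise RuntimeError(f"environment variable {key_var} is not set")

    size = args.get("size", "square_hd")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")

    return {
        "prompt": args["prompt"],
        # A per-call model overrides the configured default.
        "model": args.get("model") or config.get("default_model", "fal-ai/flux/schnell"),
        "size": size,
        "output": f"images/{args.get('filename', 'generated_image')}.png",
        "api_key": api_key,
    }
```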
    export FAL_API_KEY="your-fal-ai-key"

canvas
Pushes rendered content to the Live Canvas, a real-time preview panel in the web dashboard. Frames are stored in a process-global store and broadcast to connected WebSocket viewers, so the agent can build a visualization that users watch update live. See Agents, teams & canvas and the realtime Live Canvas API.
Each canvas is addressed by a canvas_id string (default default); content size is capped at 256 KB per frame.
| Action | What it does |
|---|---|
| `render` | Push content of the given `content_type` to the canvas. |
| `snapshot` | Return the canvas’s current frame. |
| `clear` | Reset the canvas (clears current content and history). |
| `eval` | Send a JavaScript expression to be evaluated client-side in the canvas iframe; the result is visible to connected viewers. |
| Parameter | Type | Default | Meaning |
|---|---|---|---|
| `action` | string | required | `render`, `snapshot`, `clear`, or `eval`. |
| `canvas_id` | string | `default` | Canvas identifier. |
| `content_type` | string | `html` | For `render`: `html`, `svg`, `markdown`, or `text`. |
| `content` | string | none | Content to render (required for `render`). |
| `expression` | string | none | JavaScript expression (required for `eval`). |
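The parameter rules above can be expressed as a small validation sketch. The function name `normalize_canvas_call` is hypothetical, and treating "256 KB" as 256 × 1024 bytes of UTF-8 is an assumption; the actual tool may measure size differently.

```python
MAX_FRAME_BYTES = 256 * 1024  # assumption: "256 KB" means 262,144 bytes
ACTIONS = {"render", "snapshot", "clear", "eval"}
CONTENT_TYPES = {"html", "svg", "markdown", "text"}


def normalize_canvas_call(args):
    """Validate a canvas call and fill in the documented defaults."""
    action = args.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    call = {"action": action, "canvas_id": args.get("canvas_id", "default")}

    if action == "render":
        content = args.get("content")
        if not isinstance(content, str):
            raise ValueError("content is required for render")
        content_type = args.get("content_type", "html")
        if content_type not in CONTENT_TYPES:
            raise ValueError(f"unsupported content_type: {content_type!r}")
        if len(content.encode("utf-8")) > MAX_FRAME_BYTES:
            raise ValueError("content exceeds the 256 KB per-frame cap")
        call["content_type"] = content_type
        call["content"] = content
    elif action == "eval":
        expression = args.get("expression")
        if not expression:
            raise ValueError("expression is required for eval")
        call["expression"] = expression

    return call
```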
    { "action": "render", "canvas_id": "main", "content_type": "html", "content": "<h1>Build status: green</h1>" }

calculator
Performs exact arithmetic and statistics. Use it instead of letting the model guess numeric results. It exposes the 26 named functions listed below, grouped into arithmetic, logarithms/exponentials, aggregation, statistics, and utilities.
| Category | Functions |
|---|---|
| Arithmetic | add, subtract, divide, multiply, pow, sqrt, abs, modulo, round |
| Logarithmic / exponential | log, ln, exp, factorial |
| Aggregation | sum, average, count, min, max, range |
| Statistics | median, mode, variance, stdev, percentile |
| Utility | percentage_change, clamp |
Inputs depend on the function:
| Parameter | Type | Used by |
|---|---|---|
| `function` | string | (required) |
| `values` | array of numbers | `add`, `subtract`, `divide`, `multiply`, `sum`, `average`, `count`, `min`, `max`, `range`, `median`, `mode`, `variance`, `stdev`, `percentile` |
| `a`, `b` | number | `pow`, `modulo`, `percentage_change` |
| `x` | number | `sqrt`, `abs`, `exp`, `ln`, `log`, `factorial`, `round` |
| `base` | number | `log` (default 10) |
| `decimals` | integer | `round` |
| `p` | integer (0–100) | `percentile` |
| `min_val`, `max_val` | number | `clamp` |

    { "function": "average", "values": [1, 2, 3, 4, 5] }
    { "function": "percentile", "values": [10, 20, 30, 40], "p": 90 }

Guardrails return clear errors rather than NaN: division and modulo by zero, square root of a negative, log/ln of a non-positive number, and factorial of a non-integer or of an input above 170 (which overflows f64) are all rejected.
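A minimal sketch of these guardrails, using Python floats (which are 64-bit like f64), is shown below. The function signatures and error messages are illustrative rather than the tool's own, and rejecting negative factorial inputs and degenerate log bases are extra assumptions not stated above.

```python
import math

MAX_FACTORIAL_INPUT = 170  # 171! exceeds the largest finite 64-bit float


def modulo(a, b):
    if b == 0:
        raise ValueError("modulo by zero")
    return math.fmod(a, b)


def sqrt(x):
    if x < 0:
        raise ValueError("square root of a negative number")
    return math.sqrt(x)


def log(x, base=10.0):
    if x <= 0:
        raise ValueError("log of a non-positive number")
    # Assumption: bases that are non-positive or equal to 1 are rejected too.
    if base <= 0 or base == 1:
        raise ValueError("invalid log base")
    return math.log(x, base)


def factorial(x):
    # Assumption: negative inputs are rejected alongside non-integers.
    if x != int(x) or x < 0:
        raise ValueError("factorial requires a non-negative integer")
    if x > MAX_FACTORIAL_INPUT:
        raise ValueError("factorial input above 170 overflows a 64-bit float")
    return float(math.factorial(int(x)))
```

Raising an error early, rather than returning NaN or infinity, gives the model a message it can act on instead of a silently poisoned result.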
weather
Returns current conditions and an up-to-3-day forecast for any location worldwide via the free wttr.in service. No API key is required.
Locations are flexible: city names (in any language or script), IATA airport codes, GPS coordinates, postal/zip codes, and domain-based geolocation.
| Parameter | Type | Default | Meaning |
|---|---|---|---|
| `location` | string | required | City, IATA code, `lat,lon`, postal code, or domain. |
| `units` | string | `metric` | `metric` (°C, km/h, mm) or `imperial` (°F, mph, in). |
| `days` | integer | 1 | Forecast days, 0–3. 0 returns current conditions only. |

    { "location": "Seoul", "units": "metric", "days": 3 }
    { "location": "LAX" }
    { "location": "35.6762,139.6503", "units": "imperial", "days": 1 }

The output includes conditions, temperature and “feels like”, humidity, wind, precipitation, visibility, pressure, cloud cover, and UV index. For forecast days it adds daily highs/lows, sun hours, astronomy (sunrise/sunset/moon phase), and hourly slots for short forecasts. Unknown locations return a helpful error suggesting alternative input formats. If you route outbound traffic through a proxy, weather requests use the tool.weather service key; see [proxy] configuration.
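For orientation, here is one way the parameters could be validated and turned into a request. This is a sketch under assumptions: the helper `build_weather_request` is hypothetical, it uses wttr.in's public JSON output (`?format=j1`), and whether Revka rejects or clamps out-of-range `days` values is not documented here, so the sketch simply raises.

```python
from urllib.parse import quote

UNITS = ("metric", "imperial")


def build_weather_request(location, units="metric", days=1):
    """Validate weather arguments and build a wttr.in JSON URL (illustrative)."""
    if not isinstance(location, str) or not location.strip():
        raise ValueError("location is required")
    if units not in UNITS:
        raise ValueError(f"units must be one of {UNITS}")
    if not 0 <= days <= 3:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError("days must be between 0 and 3")

    # Percent-encode the location but keep the comma used by "lat,lon" input.
    path = quote(location.strip(), safe=",")
    return {
        "url": f"https://wttr.in/{path}?format=j1",
        "units": units,  # applied when formatting the response
        "days": days,    # 0 means current conditions only
    }
```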
[multimodal] — incoming image handling
The [multimodal] section controls how images attached to a message are passed to the model, distinct from image_gen (which produces images). Attach images inline with the marker syntax [IMAGE:/path/to/file.png] or [IMAGE:data:image/png;base64,...].
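To make the marker syntax concrete, the sketch below extracts `[IMAGE:...]` references from a message and classifies each as a local path, a data URI, or a remote URL. The regular expression, the function name `parse_image_markers`, and the choice to raise on a disallowed remote URL are assumptions for illustration; Revka's parser may behave differently.

```python
import re

# Matches [IMAGE:<reference>] where the reference contains no closing bracket.
IMAGE_MARKER = re.compile(r"\[IMAGE:([^\]]+)\]")


def parse_image_markers(message, allow_remote_fetch=False):
    """Return a list of (kind, reference) pairs found in the message."""
    images = []
    for ref in IMAGE_MARKER.findall(message):
        if ref.startswith("data:"):
            kind = "data_uri"
        elif ref.startswith(("http://", "https://")):
            if not allow_remote_fetch:
                # Assumption: remote URLs are rejected unless explicitly enabled.
                raise ValueError("remote image URLs require allow_remote_fetch = true")
            kind = "remote"
        else:
            kind = "path"
        images.append((kind, ref))
    return images
```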

    [multimodal]
    max_images = 4               # per request; clamped to 1–16
    max_image_size_mb = 5        # per image, before base64; clamped to 1–20
    allow_remote_fetch = false   # allow [IMAGE:https://...] remote URLs
    vision_provider = "ollama"   # optional: route images to a dedicated vision model
    vision_model = "llava:7b"    # used only when vision_provider is set

| Key | Type | Default | Meaning |
|---|---|---|---|
| `max_images` | integer | 4 | Maximum image attachments per request. Clamped to 1–16. |
| `max_image_size_mb` | integer | 5 | Maximum size per image before base64 encoding. Clamped to 1–20. |
| `allow_remote_fetch` | bool | `false` | When true, allow [IMAGE:https://...] remote URLs. |
| `vision_provider` | string | unset | Optional provider to route images to instead of the default text provider. |
| `vision_model` | string | unset | Model used with `vision_provider`; only applies when that field is set. |
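The clamping behaviour in the table can be illustrated with a short sketch. The function `effective_limits` is hypothetical, and converting megabytes with 1024 × 1024 bytes is an assumption.

```python
def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def effective_limits(config):
    """Apply the documented defaults and clamps to [multimodal] settings."""
    max_images = clamp(int(config.get("max_images", 4)), 1, 16)
    max_size_mb = clamp(int(config.get("max_image_size_mb", 5)), 1, 20)
    return {
        "max_images": max_images,
        # Assumption: 1 MB is treated as 1024 * 1024 bytes.
        "max_image_bytes": max_size_mb * 1024 * 1024,
    }
```

With this sketch, a configured `max_images = 99` takes effect as 16, and `max_image_size_mb = 0` takes effect as 1.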
For automatic pre-processing of inbound media (transcribing audio, describing images, summarizing video before the agent sees them), see the [media_pipeline] configuration in Config: channels, tools & integrations.
Related
- Tools overview
- Filesystem & code tools: `file_read` extracts PDF text inline
- Browser & web tools: browser screenshots and page automation
- Run the dashboard: where the Live Canvas appears
- Realtime: WebSocket, SSE & Live Canvas