The agent loop

How Revka runs an agentic turn: tool iterations, parallel tools, context compression, and history management.

Every message Revka handles — from a channel, the dashboard chat, the /webhook endpoint, a cron job, or revka agent — runs through the same tool-using agent loop. The LLM reasons, issues tool calls, reads the results, and reasons again, repeating until the task is done or a limit is reached. This page explains how that loop is bounded and tuned: how many iterations it can take, when tools run in parallel, how Revka keeps the context window from overflowing, and how conversation history is pruned over time.

Read this when you need to tune agent behaviour — extending long-running tasks, speeding up multi-tool turns, cutting token cost on big tool outputs, or running on a small-context local model. Every knob below lives in the [agent] section of ~/.revka/config.toml. For the surrounding architecture, see How Revka works; for the policy checks applied to each tool call, see Autonomy levels & approvals.

How a turn runs

1. Ingress. The message enters the loop from a channel, WebSocket (/ws/chat), SSE, a webhook, or the CLI.
2. Reason and call tools. The LLM produces reasoning and zero or more tool calls. Each tool call is checked against the security policy before it executes.
3. Execute. Independent tool calls run, concurrently when parallel_tools = true and no call needs approval gating. Results are returned in stable order regardless of completion order.
4. Iterate. Tool results feed back into the prompt and the LLM reasons again. Each cycle counts as one tool iteration.
5. Stop. The loop ends when the LLM returns a final answer with no tool calls, or when max_tool_iterations is hit. The reply streams back to the originating surface.
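
The steps above can be sketched as a bounded loop. This is only an illustration, not Revka's implementation: `LLMReply`, `run_turn`, the `llm` callable, and the `tools` mapping are hypothetical names, and the policy check is reduced to a comment. Whether the real runtime allows one last LLM call after the final tool iteration is not stated here, so the sketch simply stops at the cap.

```python
from dataclasses import dataclass, field


@dataclass
class LLMReply:
    """Hypothetical reply shape: text plus zero or more (tool_name, args) calls."""
    text: str
    tool_calls: list = field(default_factory=list)


def run_turn(llm, tools, user_message, max_tool_iterations=60):
    """Reason, call tools, feed results back, and repeat until done or capped."""
    history = [("user", user_message)]
    for _ in range(max_tool_iterations):
        reply = llm(history)
        if not reply.tool_calls:
            return reply.text  # final answer with no tool calls: the loop ends
        history.append(("assistant", reply.text))
        for name, args in reply.tool_calls:
            # A real runtime would check the security policy before executing.
            result = tools[name](**args)
            history.append(("tool", f"{name}: {result}"))
    return f"Agent exceeded maximum tool iterations ({max_tool_iterations})"
```

Each pass through the `for` loop corresponds to one tool iteration in the sense used on this page.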

Between iterations, Revka transparently manages the context window: it compacts tool schemas, trims oversized tool results, and — when the running token total crosses a threshold — compresses older history. None of this requires action from the agent or the user; the defaults are tuned for a 1M-token context model.

Orchestration settings (`[agent]`)

The core loop knobs. All keys are optional and fall back to the defaults shown.

[agent]
max_tool_iterations = 60
parallel_tools = true
max_context_tokens = 1050000
max_history_messages = 1000
keep_tool_context_turns = 2
max_tool_result_chars = 50000
context_window_safety_ratio = 0.95

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| `max_tool_iterations` | int | `60` | Maximum tool-call loop turns per user message, across CLI, gateway, and channels. `0` falls back to `60`. |
| `parallel_tools` | bool | `true` | Execute independent tool calls within one iteration concurrently. |
| `max_context_tokens` | int | `1050000` | Token budget used by loop-level context trimming and compression. |
| `max_history_messages` | int | `1000` | Maximum conversation history messages retained per session. |
| `keep_tool_context_turns` | int | `2` | Recent turns whose full tool-call/result messages are preserved in channel history. |
| `max_tool_result_chars` | int | `50000` | Maximum characters retained for a single tool result before middle truncation. |
| `context_window_safety_ratio` | float | `0.95` | Fraction of the model context window allowed before Revka fails loud. Clamped to `1.0`; values `<= 0` fall back to `0.95`. |
| `tool_call_dedup_exempt` | `[string]` | `[]` | Exact tool names allowed to be called repeatedly with identical arguments in one turn, bypassing duplicate-call suppression. |
| `compact_context` | bool | `true` | Use a compact bootstrap prompt (smaller RAG and bootstrap budgets); intended for 13B or smaller models. |
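
To make the "middle truncation" behind max_tool_result_chars concrete, here is a minimal sketch. It assumes truncation keeps the head and tail of the result and replaces the middle with a marker; the function name `truncate_middle` and the marker text are hypothetical, and Revka's actual marker and split point may differ.

```python
def truncate_middle(text: str, max_chars: int = 50_000,
                    marker: str = "\n[... truncated ...]\n") -> str:
    """Keep the start and end of an oversized result and drop the middle."""
    if len(text) <= max_chars:
        return text
    keep = max(max_chars - len(marker), 0)  # characters of original text to keep
    tail = keep // 2
    head = keep - tail  # the head gets the extra character when keep is odd
    return text[:head] + marker + (text[-tail:] if tail else "")
```

Keeping both ends is useful for tool output, where the command or header sits at the top and the final status or error usually sits at the bottom.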

Tool iterations

max_tool_iterations caps how many reasoning-then-tool-call cycles a single message may take. A simple Q&A might use one iteration; a multi-step task that reads files, runs a build, and reports results uses several. When the cap is exceeded on a channel message, the runtime returns:

Agent exceeded maximum tool iterations (60)

Raise it for deep autonomous tasks; lower it to fail fast and contain cost on untrusted input. Note that the channel message timeout budget scales with this value: it is message_timeout_secs * min(max_tool_iterations, message_timeout_scale_max), so a higher iteration cap also grants more wall-clock time (up to the [pacing] message_timeout_scale_max cap, default 4).
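
The timeout formula above is easy to check by hand. The sketch below only restates it as code; `message_timeout_budget` is a hypothetical helper name, and the 300-second message_timeout_secs used in the note underneath is an assumed example value, not a documented default.

```python
def message_timeout_budget(message_timeout_secs: int,
                           max_tool_iterations: int = 60,
                           message_timeout_scale_max: int = 4) -> int:
    """Wall-clock budget for one channel message, per the formula above."""
    return message_timeout_secs * min(max_tool_iterations, message_timeout_scale_max)
```

With an assumed message_timeout_secs of 300 and the defaults, the budget is 300 * min(60, 4) = 1200 seconds. Because the scale factor is capped at message_timeout_scale_max, raising max_tool_iterations beyond that cap adds iterations but no extra wall-clock time unless the cap is raised too.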

Parallel tools

When parallel_tools = true (the default), Revka dispatches independent tool calls from a single iteration concurrently instead of one at a time — for example, reading three files at once, or fetching two URLs in parallel. Results are reassembled in their original order, so the LLM sees a stable, deterministic result list.

Calls that require approval gating are not parallelised; they run through the normal supervised-approval path. Setting parallel_tools = false forces strictly sequential execution, which is occasionally useful when tools share fragile external state.
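
The stable-ordering guarantee can be illustrated with Python's asyncio, where `asyncio.gather` returns results in the order the awaitables were passed in rather than the order they finished. This is a sketch of the general technique, not Revka's dispatcher; `run_tool_calls` and `fake_tool` are hypothetical names, and approval gating is not modelled.

```python
import asyncio


async def run_tool_calls(calls, parallel_tools=True):
    """Run tool calls; `calls` is a list of zero-argument coroutine functions."""
    if parallel_tools:
        # gather() preserves input order even if later calls finish first.
        return list(await asyncio.gather(*(call() for call in calls)))
    # Strictly sequential execution, one call at a time.
    return [await call() for call in calls]


async def fake_tool(name, delay):
    """Stand-in for a real tool: sleeps, then returns its own name."""
    await asyncio.sleep(delay)
    return name
```

In both modes the result list lines up with the order of the original calls, so the only observable difference is elapsed time.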

Context compression

Revka keeps the prompt within the model’s context window using a deterministic, zero-LLM compression layer plus an optional summarisation pass for older history. Configure it under [agent.context_compression].

[agent.context_compression]
enabled = true
threshold_ratio = 0.5
protect_first_n = 3
protect_last_n = 4
max_passes = 3
compact_tool_schemas = true
terse_internal_outputs = true

| Key | Default | Meaning |
| --- | --- | --- |
| `enabled` | `true` | Enable automatic context compression. |
| `threshold_ratio` | `0.5` | Fraction of the context window that triggers a compression pass. |
| `protect_first_n` | `3` | Messages protected at the start of history (the framing of the task). |
| `protect_last_n` | `4` | Recent messages protected from compression. |
| `max_passes` | `3` | Maximum compression passes before failing loud. |
| `summary_max_chars` | `4000` | Maximum characters retained in stored compaction summaries. |
| `source_max_chars` | `50000` | Safety cap on transcript text passed to the summariser. |
| `timeout_secs` | `60` | Timeout for the summarisation provider call. |
| `live_tool_result_max_chars` | `12000` | Max characters retained for a live tool result before content-aware compression. |
| `tool_result_retrim_chars` | `2000` | Max characters retained for older tool results during fast trim. |
| `input_max_chars` | `24000` | Max characters retained for a single large user input before content-aware compression. |
| `compact_tool_schemas` | `true` | Shorten native tool descriptions and JSON-schema metadata before each LLM call. |
| `compact_system_tool_docs` | `true` | Render compact tool docs in the system prompt when schemas are sent separately. |
| `tool_description_max_chars` | `180` | Max characters per tool description after schema compaction. |
| `schema_description_max_chars` | `120` | Max characters per JSON-schema `description` after compaction. |
| `terse_internal_outputs` | `true` | Use concise output contracts for internal operator/agent handoffs. |
| `tool_result_trim_exempt` | `[]` | Tool names (`[string]`) exempt from tool-result trimming. |
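
As a rough illustration of how threshold_ratio, protect_first_n, and protect_last_n interact, the sketch below checks the trigger and selects the messages that remain eligible for compression. It assumes the "context window" in question equals max_context_tokens and that the threshold is a strict greater-than comparison; both are assumptions, and `should_compress` and `compressible_span` are hypothetical names.

```python
def should_compress(running_tokens: int, context_window: int = 1_050_000,
                    threshold_ratio: float = 0.5) -> bool:
    """True once the running token total crosses the configured fraction."""
    return running_tokens > threshold_ratio * context_window


def compressible_span(messages: list, protect_first_n: int = 3,
                      protect_last_n: int = 4) -> list:
    """Messages between the protected head and the protected tail."""
    end = len(messages) - protect_last_n
    if end <= protect_first_n:
        return []  # everything is protected; nothing to compress
    return messages[protect_first_n:end]
```

Under these assumptions the default trigger point is 0.5 * 1,050,000 = 525,000 tokens, and a ten-message history leaves only messages 4 to 6 open to compression.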

The four content-aware axes

The content-aware layer is deterministic — no LLM call, no token cost — and applies type-specific reduction on four token-heavy axes:

| Axis | Source | Reduced to |
| --- | --- | --- |
| Large pasted input | Big user messages | Schema-and-samples for structured data; bounded text otherwise |
| CLI / shell output | `shell` and command tool results | Failure lines plus a tail of the output |
| General tool output | Any large tool result (JSON, diffs, …) | JSON → schema + samples; diffs → file/hunk/change summaries |
| Code-search output | `content_search`, `semantic_code_search` | Grouped file hits |

For semantic code search, semantic_code_search uses Semble when installed, falls back to bounded local ripgrep, and finally to a built-in literal scan, so code search stays available in zero-install environments. See the Tools overview for the full tool catalog.
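
To show what a deterministic, zero-LLM reduction can look like, here is a sketch of the CLI / shell output axis from the table above: keep lines that look like failures, plus a tail of the output. The failure pattern, the tail length, the omission marker, and the name `reduce_shell_output` are all assumptions made for illustration; Revka's actual heuristics are not described on this page.

```python
import re

# Hypothetical failure heuristic; the real matcher is not documented here.
FAILURE_PATTERN = re.compile(r"error|failed|fatal|panic|traceback", re.IGNORECASE)


def reduce_shell_output(output: str, tail_lines: int = 5) -> str:
    """Reduce long command output to failure lines plus the last few lines."""
    lines = output.splitlines()
    if len(lines) <= tail_lines:
        return output  # already short enough
    head, tail = lines[:-tail_lines], lines[-tail_lines:]
    failures = [line for line in head if FAILURE_PATTERN.search(line)]
    omitted = len(head) - len(failures)
    return "\n".join(failures + [f"[{omitted} lines omitted]"] + tail)
```

Because the reduction is plain string processing, the same input always yields the same output and costs no tokens to produce.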

Schema and handoff compaction

Two further reductions shrink the base context that every call carries:

- Tool-schema compaction (compact_tool_schemas) trims native tool descriptions and JSON-schema metadata before each provider call, and the system prompt omits parameter schemas that are already supplied through the provider’s tool interface.
- Terse internal outputs (terse_internal_outputs) apply a concise handoff contract to operator and sub-agent prompts. Set REVKA_TERSE_INTERNAL_OUTPUTS=0 to disable the Python operator side.

Other environment overrides for compression budgets: REVKA_AGENT_RESULT_MAX_CHARS (the operator-side budget for an agent's last_message), REVKA_WORKFLOW_SKILL_MAX_CHARS, and REVKA_WORKFLOW_SKILL_CONTEXT_MODE (set it to pointer to send only krefs/paths, or to full to restore the legacy full-inline skill context).

History pruning

Distinct from per-turn compression, [agent.history_pruning] is a token-efficiency pass over the message list itself. It is disabled by default — turn it on for long-lived sessions or small-context models.

[agent.history_pruning]
enabled = false
max_tokens = 8192
keep_recent = 4
collapse_tool_results = true

| Key | Default | Meaning |
| --- | --- | --- |
| `enabled` | `false` | Enable history pruning. |
| `max_tokens` | `8192` | Maximum estimated tokens for message history. |
| `keep_recent` | `4` | Keep the N most recent messages untouched. |
| `collapse_tool_results` | `true` | Collapse old assistant tool-call / tool-result pairs into short summaries. |

System messages and the keep_recent most recent messages are always protected. When enabled, pruning first collapses old tool-call/result pairs, then drops older messages until the estimated token total is under max_tokens.
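
The two-stage behaviour described above (collapse first, then drop) can be sketched as follows. This is a simplified illustration under stated assumptions: messages are plain dicts with `role` and `content`, tokens are estimated at roughly four characters each, and collapsing is reduced to replacing an old tool result with a fixed placeholder rather than summarising call/result pairs. `estimate_tokens` and `prune_history` are hypothetical names.

```python
def estimate_tokens(message: dict) -> int:
    # Hypothetical estimator: roughly four characters per token.
    return max(1, len(message["content"]) // 4)


def prune_history(messages, max_tokens=8192, keep_recent=4,
                  collapse_tool_results=True):
    """Collapse old tool results, then drop the oldest unprotected messages."""
    protected_from = max(len(messages) - keep_recent, 0)
    staged = []
    for i, msg in enumerate(messages):
        old = i < protected_from
        if collapse_tool_results and old and msg["role"] == "tool":
            msg = {"role": "tool", "content": "[tool result collapsed]"}
        # System messages and the keep_recent newest messages are never dropped.
        staged.append((old and msg["role"] != "system", msg))

    total = sum(estimate_tokens(m) for _, m in staged)
    kept = []
    for droppable, msg in staged:
        if droppable and total > max_tokens:
            total -= estimate_tokens(msg)  # drop oldest first
            continue
        kept.append(msg)
    return kept
```

Note that if the protected messages alone exceed max_tokens, this sketch returns them anyway, since nothing else is eligible for removal.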

The related keep_tool_context_turns (default 2, in the top-level [agent] table) controls how many recent turns keep their full tool-call and tool-result messages in channel history — older turns keep the conversational text but shed the verbose tool payloads.

Tool filtering (`tool_filter_groups`)

When you connect many external MCP servers, sending every tool schema on every turn is expensive. tool_filter_groups limits which MCP tool schemas are sent to the LLM per turn. Built-in (non-MCP) tools always pass through unchanged, and when the list is empty the feature is inactive — all tools pass through (the backward-compatible default).

Each group is a table:

| Field | Type | Purpose |
| --- | --- | --- |
| `mode` | `"always"` \| `"dynamic"` | `always`: include the tool unconditionally. `dynamic`: include it only when the last user message contains a keyword. |
| `tools` | `[string]` | Tool name patterns. A single `*` wildcard is supported (prefix, suffix, or infix), e.g. `"mcp_vikunja_*"`. |
| `keywords` | `[string]` | Dynamic mode only. Case-insensitive substrings matched against the last user message. |

[agent]
# Vikunja task-management MCP tools are always available.
[[agent.tool_filter_groups]]
mode = "always"
tools = ["mcp_vikunja_*"]

# Browser MCP tools are only included when the user mentions browsing.
[[agent.tool_filter_groups]]
mode = "dynamic"
tools = ["mcp_browser_*"]
keywords = ["browse", "navigate", "open url", "screenshot"]
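
The matching rules can be sketched in a few lines. This illustration makes two assumptions that the text above does not confirm: that MCP tools can be recognised by an `mcp_` name prefix, and that an MCP tool matching no group is withheld once any group is configured. `matches` and `filter_mcp_tools` are hypothetical names.

```python
def matches(pattern: str, name: str) -> bool:
    """Match a tool name against a pattern with at most one '*' wildcard."""
    if "*" not in pattern:
        return pattern == name
    prefix, suffix = pattern.split("*", 1)
    return (len(name) >= len(prefix) + len(suffix)
            and name.startswith(prefix) and name.endswith(suffix))


def filter_mcp_tools(tool_names, groups, last_user_message):
    """Return the tool names whose schemas would be sent this turn."""
    if not groups:
        return list(tool_names)  # empty list: feature inactive, all tools pass
    text = last_user_message.lower()
    allowed = []
    for name in tool_names:
        if not name.startswith("mcp_"):  # assumption: MCP tools share this prefix
            allowed.append(name)  # built-in tools always pass through
            continue
        for group in groups:
            if not any(matches(p, name) for p in group["tools"]):
                continue
            keywords = group.get("keywords", [])
            if group["mode"] == "always" or any(k.lower() in text for k in keywords):
                allowed.append(name)
                break
    return allowed
```

With the two groups configured above, a message such as "take a screenshot" would bring the browser tools in for that turn, while the Vikunja tools are present on every turn.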

Tuning recipes

| Goal | Change |
| --- | --- |
| Longer autonomous tasks | Raise `max_tool_iterations`; raise `[pacing] message_timeout_scale_max`. |
| Faster multi-tool turns | Keep `parallel_tools = true` (default). |
| Lower token cost on big outputs | Lower `max_tool_result_chars` and `live_tool_result_max_chars`; enable `tool_filter_groups`. |
| Small-context / local model | Set `compact_context = true`, enable `[agent.history_pruning]`, set `[agent.model_context_windows]`, and tune `[pacing]`. |
| Tame a runaway loop | Lower `max_tool_iterations`; see emergency stop. |

For slow or local LLM deployments (Ollama, llama.cpp, vLLM), pair these with the [pacing] controls — step timeouts and loop detection — described in Custom providers & local LLMs.

Related pages

Autonomy levels & approvals: the policy checks applied to every tool call.
Sessions & conversation state: how history is scoped and persisted.
Agents, teams & swarms: multi-agent orchestration above the loop.
Config: provider, agent & routing: the full [agent] config reference.
Tools overview: the catalog of callable tools.
