Skip to content

The agent loop

How Revka runs an agentic turn: tool iterations, parallel tools, context compression, and history management.

Every message Revka handles — from a channel, the dashboard chat, the /webhook endpoint, a cron job, or revka agent — runs through the same tool-using agent loop. The LLM reasons, issues tool calls, reads the results, and reasons again, repeating until the task is done or a limit is reached. This page explains how that loop is bounded and tuned: how many iterations it can take, when tools run in parallel, how Revka keeps the context window from overflowing, and how conversation history is pruned over time.

Read this when you need to tune agent behaviour — extending long-running tasks, speeding up multi-tool turns, cutting token cost on big tool outputs, or running on a small-context local model. Every knob below lives in the [agent] section of ~/.revka/config.toml. For the surrounding architecture, see How Revka works; for the policy checks applied to each tool call, see Autonomy levels & approvals.

  1. Ingress. The message enters the loop from a channel, WebSocket (/ws/chat), SSE, a webhook, or the CLI.

  2. Reason and call tools. The LLM produces reasoning and zero or more tool calls. Each tool call is checked against the security policy before it executes.

  3. Execute. Independent tool calls run — concurrently when parallel_tools = true and no call needs approval gating. Results are returned in stable order regardless of completion order.

  4. Iterate. Tool results feed back into the prompt and the LLM reasons again. Each cycle counts as one tool iteration.

  5. Stop. The loop ends when the LLM returns a final answer with no tool calls, or when max_tool_iterations is hit. The reply streams back to the originating surface.

Between iterations, Revka transparently manages the context window: it compacts tool schemas, trims oversized tool results, and — when the running token total crosses a threshold — compresses older history. None of this requires action from the agent or the user; the defaults are tuned for a 1M-token context model.

The core loop knobs. All keys are optional and fall back to the defaults shown.

[agent]
max_tool_iterations = 60
parallel_tools = true
max_context_tokens = 1050000
max_history_messages = 1000
keep_tool_context_turns = 2
max_tool_result_chars = 50000
context_window_safety_ratio = 0.95
KeyTypeDefaultMeaning
max_tool_iterationsint60Maximum tool-call loop turns per user message, across CLI, gateway, and channels. 0 falls back to 60.
parallel_toolsbooltrueExecute independent tool calls within one iteration concurrently.
max_context_tokensint1050000Token budget used by loop-level context trimming and compression.
max_history_messagesint1000Maximum conversation history messages retained per session.
keep_tool_context_turnsint2Recent turns whose full tool-call/result messages are preserved in channel history.
max_tool_result_charsint50000Maximum characters retained for a single tool result before middle truncation.
context_window_safety_ratiofloat0.95Fraction of the model context window allowed before Revka fails loud. Clamped to 1.0; values <= 0 fall back to 0.95.
tool_call_dedup_exempt[string][]Exact tool names allowed to be called repeatedly with identical arguments in one turn, bypassing duplicate-call suppression.
compact_contextbooltrueUse a compact bootstrap prompt (smaller RAG and bootstrap budgets); intended for 13B or smaller models.

max_tool_iterations caps how many reasoning-then-tool-call cycles a single message may take. A simple Q&A might use one iteration; a multi-step task that reads files, runs a build, and reports results uses several. When the cap is exceeded on a channel message, the runtime returns:

Agent exceeded maximum tool iterations (60)

Raise it for deep autonomous tasks; lower it to fail fast and contain cost on untrusted input. Note that the channel message timeout budget scales with this value: it is message_timeout_secs * min(max_tool_iterations, message_timeout_scale_max), so a higher iteration cap also grants more wall-clock time (up to the [pacing] message_timeout_scale_max cap, default 4).

When parallel_tools = true (the default), Revka dispatches independent tool calls from a single iteration concurrently instead of one at a time — for example, reading three files at once, or fetching two URLs in parallel. Results are reassembled in their original order, so the LLM sees a stable, deterministic result list.

Calls that require approval gating are not parallelised; they run through the normal supervised-approval path. Setting parallel_tools = false forces strictly sequential execution, which is occasionally useful when tools share fragile external state.

Revka keeps the prompt within the model’s context window using a deterministic, zero-LLM compression layer plus an optional summarisation pass for older history. Configure it under [agent.context_compression].

[agent.context_compression]
enabled = true
threshold_ratio = 0.5
protect_first_n = 3
protect_last_n = 4
max_passes = 3
compact_tool_schemas = true
terse_internal_outputs = true
KeyDefaultMeaning
enabledtrueEnable automatic context compression.
threshold_ratio0.5Fraction of the context window that triggers a compression pass.
protect_first_n3Messages protected at the start of history (the framing of the task).
protect_last_n4Recent messages protected from compression.
max_passes3Maximum compression passes before failing loud.
summary_max_chars4000Maximum characters retained in stored compaction summaries.
source_max_chars50000Safety cap on transcript text passed to the summariser.
timeout_secs60Timeout for the summarisation provider call.
live_tool_result_max_chars12000Max characters retained for a live tool result before content-aware compression.
tool_result_retrim_chars2000Max characters retained for older tool results during fast trim.
input_max_chars24000Max characters retained for a single large user input before content-aware compression.
compact_tool_schemastrueShorten native tool descriptions and JSON-schema metadata before each LLM call.
compact_system_tool_docstrueRender compact tool docs in the system prompt when schemas are sent separately.
tool_description_max_chars180Max characters per tool description after schema compaction.
schema_description_max_chars120Max characters per JSON-schema description after compaction.
terse_internal_outputstrueUse concise output contracts for internal operator/agent handoffs.
tool_result_trim_exempt[string][] — tool names exempt from tool-result trimming.

The content-aware layer is deterministic — no LLM call, no token cost — and applies type-specific reduction on four token-heavy axes:

AxisSourceReduced to
Large pasted inputBig user messagesSchema-and-samples for structured data; bounded text otherwise
CLI / shell outputshell and command tool resultsFailure lines plus a tail of the output
General tool outputAny large tool result (JSON, diffs, …)JSON → schema + samples; diffs → file/hunk/change summaries
Code-search outputcontent_search, semantic_code_searchGrouped file hits

For semantic code search, semantic_code_search uses Semble when installed, falls back to bounded local ripgrep, and finally to a built-in literal scan, so code search stays available in zero-install environments. See the Tools overview for the full tool catalog.

Two further reductions shrink the base context that every call carries:

  • Tool-schema compaction (compact_tool_schemas) trims native tool descriptions and JSON-schema metadata before each provider call, and the system prompt omits parameter schemas that are already supplied through the provider’s tool interface.
  • Terse internal outputs (terse_internal_outputs) apply a concise handoff contract to operator and sub-agent prompts. Set REVKA_TERSE_INTERNAL_OUTPUTS=0 to disable the Python operator side.

Other environment overrides for compression budgets: REVKA_AGENT_RESULT_MAX_CHARS (operator-side agent last_message budget), REVKA_WORKFLOW_SKILL_MAX_CHARS, and REVKA_WORKFLOW_SKILL_CONTEXT_MODE (pointer to send only krefs/paths, full to restore legacy full-inline skill context).

Distinct from per-turn compression, [agent.history_pruning] is a token-efficiency pass over the message list itself. It is disabled by default — turn it on for long-lived sessions or small-context models.

[agent.history_pruning]
enabled = false
max_tokens = 8192
keep_recent = 4
collapse_tool_results = true
KeyDefaultMeaning
enabledfalseEnable history pruning.
max_tokens8192Maximum estimated tokens for message history.
keep_recent4Keep the N most recent messages untouched.
collapse_tool_resultstrueCollapse old assistant tool-call / tool-result pairs into short summaries.

System messages and the keep_recent most recent messages are always protected. When enabled, pruning first collapses old tool-call/result pairs, then drops older messages until the estimated token total is under max_tokens.

The related keep_tool_context_turns (default 2, in the top-level [agent] table) controls how many recent turns keep their full tool-call and tool-result messages in channel history — older turns keep the conversational text but shed the verbose tool payloads.

When you connect many external MCP servers, sending every tool schema on every turn is expensive. tool_filter_groups limits which MCP tool schemas are sent to the LLM per turn. Built-in (non-MCP) tools always pass through unchanged, and when the list is empty the feature is inactive — all tools pass through (the backward-compatible default).

Each group is a table:

FieldTypePurpose
mode"always" | "dynamic"always: include the tool unconditionally. dynamic: include it only when the last user message contains a keyword.
tools[string]Tool name patterns. A single * wildcard is supported (prefix, suffix, or infix), e.g. "mcp_vikunja_*".
keywords[string]Dynamic mode only. Case-insensitive substrings matched against the last user message.
[agent]
# Vikunja task-management MCP tools are always available.
[[agent.tool_filter_groups]]
mode = "always"
tools = ["mcp_vikunja_*"]
# Browser MCP tools are only included when the user mentions browsing.
[[agent.tool_filter_groups]]
mode = "dynamic"
tools = ["mcp_browser_*"]
keywords = ["browse", "navigate", "open url", "screenshot"]
GoalChange
Longer autonomous tasksRaise max_tool_iterations; raise [pacing] message_timeout_scale_max.
Faster multi-tool turnsKeep parallel_tools = true (default).
Lower token cost on big outputsLower max_tool_result_chars and live_tool_result_max_chars; enable tool_filter_groups.
Small-context / local modelSet compact_context = true, enable [agent.history_pruning], set [agent.model_context_windows], and tune [pacing].
Tame a runaway loopLower max_tool_iterations; see emergency stop.

For slow or local LLM deployments (Ollama, llama.cpp, vLLM), pair these with the [pacing] controls — step timeouts and loop detection — described in Custom providers & local LLMs.