
Cost tracking & budgets

The token-usage ledger, daily/monthly budget enforcement modes, and pricing configuration.

Revka tracks the estimated USD cost of every LLM call, persists it to an append-only ledger, and can enforce daily and monthly spending limits before a request goes out. This page covers the token-usage ledger, the [cost] configuration section and its pricing table, the three enforcement modes (warn, block, route_down), and the gateway endpoints that read and write the ledger.

Reach for this page when you want to put a hard cap on what your agent can spend, get a warning before you blow through a budget, or feed sidecar/operator token usage into a single unified spend total. To view spend rather than configure it, see the read-only Cost dashboard page.

Every time the agent calls a model, Revka computes the call’s cost from the token counts and your configured per-model pricing, then writes a record to a JSONL ledger at <workspace>/state/costs.jsonl. The file is created automatically on first use. A legacy .revka/costs.db from older installs is migrated into the JSONL ledger on first start.

Cost is computed per call as:

cost_usd = (input_tokens / 1_000_000) * input_price
+ (output_tokens / 1_000_000) * output_price

Prices are USD per one million tokens, taken from the [cost.prices] table (below). If no pricing entry matches a model, the call is still recorded — with a cost of 0.0 and a debug log line — so token counts are never lost. Malformed lines in the ledger are skipped with a warning rather than aborting a read.
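As a rough illustration of the formula above, here is a minimal Python sketch. The `PRICES` dict and the `call_cost_usd` helper are hypothetical names, not part of Revka; the sketch uses an exact-key lookup only and returns 0.0 for an unpriced model, as the text describes.

```python
# Hypothetical in-memory copy of [cost.prices]: USD per 1M tokens.
PRICES = {
    "gpt-4o": {"input": 2.5, "output": 10.0},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one call's cost in USD from token counts and per-model prices."""
    price = PRICES.get(model)
    if price is None:
        # No pricing entry: the call is still recorded, at zero cost.
        return 0.0
    return (
        (input_tokens / 1_000_000) * price["input"]
        + (output_tokens / 1_000_000) * price["output"]
    )
```

For example, 1000 input tokens and 250 output tokens on gpt-4o come to 0.0025 + 0.0025 = 0.005 USD.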

The tracker keeps three rolling windows:

  • Session — the current daemon process lifetime. The by_model, by_agent, and by_source breakdowns reflect this window only.
  • Daily — the rolling current day, recomputed from the full ledger.
  • Monthly — the calendar month, recomputed from the full ledger.
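To make the daily and monthly windows concrete, the sketch below recomputes both totals from JSONL lines and skips malformed ones. It is only an approximation: it assumes each ledger record carries `timestamp` (ISO 8601) and `cost_usd` fields, as the usage response on this page does, and it assumes day and month boundaries are taken in UTC. Neither the on-disk schema nor the time zone is confirmed here, and `window_totals` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def window_totals(lines, now):
    """Return (daily_usd, monthly_usd) for the day and month containing `now`."""
    daily = monthly = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
            # Assumed field names: "timestamp" and "cost_usd".
            ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
            cost = float(rec["cost_usd"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # malformed record: skip it rather than abort the read
        if ts.tzinfo is not None:
            ts = ts.astimezone(timezone.utc)  # assumption: UTC boundaries
        if (ts.year, ts.month) == (now.year, now.month):
            monthly += cost
            if ts.day == now.day:
                daily += cost
    return daily, monthly
```

In practice the lines would come from `<workspace>/state/costs.jsonl`, for example by iterating over the open file.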

A single process-global tracker is shared by the gateway, the channels, and ingested sidecar usage, so every path checks and updates the same budget.

Cost tracking is configured entirely in ~/.revka/config.toml under [cost], plus its two sub-tables [cost.enforcement] and [cost.prices].

[cost]
enabled = true
daily_limit_usd = 10.00      # default 10.00
monthly_limit_usd = 100.00   # default 100.00
warn_at_percent = 80         # default 80: warn at 80% of the limit
allow_override = false       # default false

[cost.enforcement]
mode = "warn"                # "warn" | "block" | "route_down"
route_down_model = "fast"    # model hint to fall back to in route_down mode
reserve_percent = 10         # reserve 10% of the budget for critical ops

[cost.prices]
# USD per 1M tokens, { input, output }
"claude-sonnet-4-20250514" = { input = 3.0, output = 15.0 }
"gpt-4o" = { input = 2.5, output = 10.0 }
"gpt-4o-mini" = { input = 0.15, output = 0.60 }

[cost] keys:

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| enabled | bool | true | Master switch. When false, no records are written, no budget is checked, and GET /api/cost returns a zeroed summary. |
| daily_limit_usd | float | 10.00 | Daily spending ceiling in USD. |
| monthly_limit_usd | float | 100.00 | Monthly (calendar-month) spending ceiling in USD. |
| warn_at_percent | int (0–100) | 80 | Emit a warning once projected spend crosses this percentage of a limit. |
| allow_override | bool | false | Permit the --override flag to bypass a hard limit. |

[cost.enforcement] keys:

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| mode | string | "warn" | What happens at the limit: warn, block, or route_down (see below). |
| route_down_model | string? | unset | The model hint to fall back to when mode = "route_down" and the budget is exceeded. |
| reserve_percent | int (0–100) | 10 | Reserve this percentage of the budget for critical operations. |

Each entry maps a model key to its { input, output } price in USD per 1M tokens. Pricing lookup is fuzzy and tries, in order:

  1. An exact match on the model id (gpt-4o).
  2. A provider/model match (openai/gpt-4o).
  3. The suffix after the last / (so anthropic/claude-sonnet-4 matches a claude-sonnet-4 key).
  4. A prefix match after stripping a trailing numeric date segment.

This means you can usually key the table by the bare model name and have it match regardless of how the provider prefixes the id. A model with no matching entry records at zero cost.
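The lookup order above can be sketched roughly as follows. This is one possible reading, not Revka's actual matcher: `find_price` is a hypothetical helper, step 4 assumes that a "numeric date segment" means a trailing `-YYYYMMDD`, and it assumes that when several keys match as a prefix the longest one wins.

```python
import re
from typing import Optional

_DATE_SUFFIX = re.compile(r"-\d{8}$")  # assumption: date segment looks like -20250514


def find_price(prices: dict, model: str, provider: Optional[str] = None):
    """Return the price entry for `model`, or None if nothing matches."""
    # 1. Exact match on the model id.
    if model in prices:
        return prices[model]
    # 2. provider/model match.
    if provider and f"{provider}/{model}" in prices:
        return prices[f"{provider}/{model}"]
    # 3. The suffix after the last "/".
    bare = model.rsplit("/", 1)[-1]
    if bare in prices:
        return prices[bare]
    # 4. Prefix match after stripping a trailing numeric date segment.
    stem = _DATE_SUFFIX.sub("", bare)
    best_key = None
    best_len = 0
    for key in prices:
        key_stem = _DATE_SUFFIX.sub("", key.rsplit("/", 1)[-1])
        if key_stem and stem.startswith(key_stem) and len(key_stem) > best_len:
            best_key, best_len = key, len(key_stem)
    return prices[best_key] if best_key is not None else None
```

Under these assumptions, a table keyed by "gpt-4o" prices both "gpt-4o" and "openai/gpt-4o", while an id with no related key returns None and would be recorded at zero cost.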

Before each LLM call, the tracker projects current_spend + estimated_cost against the daily and monthly limits and returns one of three states: Allowed, Warning (a threshold crossed but the call proceeds), or Exceeded (the limit would be breached). Both the warning and the exceeded checks compare against the projected total, not just the current spend. The mode decides what happens when a limit is exceeded.
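The projection check can be sketched as below. The `BudgetState` enum and `check_budget` function are hypothetical names. Whether the real comparisons are strict or inclusive is not stated on this page, so the sketch assumes "exceeded" means the projected total is strictly greater than the limit and "warning" means it is at or above the warn threshold.

```python
from enum import Enum


class BudgetState(Enum):
    ALLOWED = "allowed"
    WARNING = "warning"
    EXCEEDED = "exceeded"


def check_budget(
    daily_spend: float,
    monthly_spend: float,
    estimated_cost: float,
    daily_limit: float = 10.0,
    monthly_limit: float = 100.0,
    warn_at_percent: int = 80,
) -> BudgetState:
    """Project spend + estimate against both limits and classify the call."""
    result = BudgetState.ALLOWED
    for spend, limit in ((daily_spend, daily_limit), (monthly_spend, monthly_limit)):
        projected = spend + estimated_cost
        if projected > limit:  # assumption: strict comparison
            return BudgetState.EXCEEDED
        if projected >= limit * warn_at_percent / 100:
            result = BudgetState.WARNING
    return result
```

With the default limits, a call estimated at 0.20 USD on top of 7.90 USD of daily spend projects to 8.10 USD, which crosses the 80% threshold (8.00 USD) and yields a warning rather than a block.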

mode = "warn" (the default) never blocks. When projected spend crosses warn_at_percent of a limit, a warning is logged and the request proceeds. When a hard limit would be exceeded, that is logged as well, but the call still goes out. Use this mode to observe spend and tune limits before you start enforcing them.

[cost.enforcement]
mode = "warn"
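As a loose sketch of how the three modes might differ once a limit is exceeded: `apply_mode` is a hypothetical helper, the state strings are illustrative, and the behavior of route_down when no route_down_model is set is an assumption (this sketch falls back to rejecting the call). The allow_override flag is left out.

```python
from typing import Optional, Tuple


def apply_mode(
    state: str,
    mode: str,
    requested_model: str,
    route_down_model: Optional[str] = None,
) -> Tuple[bool, str]:
    """Return (proceed, model_to_use) for a budget state and enforcement mode."""
    if state != "exceeded" or mode == "warn":
        # Allowed and warning states always proceed; warn mode never blocks.
        return True, requested_model
    if mode == "block":
        return False, requested_model
    if mode == "route_down":
        if route_down_model:
            return True, route_down_model
        # Assumption: with no fallback configured, treat the call as blocked.
        return False, requested_model
    raise ValueError(f"unknown enforcement mode: {mode!r}")
```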

The gateway exposes two cost endpoints — one to read the summary, one for sidecars to ingest usage.

GET /api/cost

No auth required — this is read-only operator telemetry. It returns a CostSummary:

{
  "cost": {
    "session_cost_usd": 0.4213,
    "daily_cost_usd": 1.8740,
    "monthly_cost_usd": 22.5106,
    "total_tokens": 1542300,
    "request_count": 318,
    "by_model": {
      "anthropic/claude-sonnet-4-6": {
        "model": "anthropic/claude-sonnet-4-6",
        "cost_usd": 0.4013,
        "total_tokens": 1410200,
        "request_count": 286
      }
    },
    "by_agent": {},
    "by_source": {
      "gateway": { "source": "gateway", "cost_usd": 0.30, "total_tokens": 980000, "request_count": 210 }
    },
    "budget": {
      "enabled": true,
      "daily_limit_usd": 10.0,
      "monthly_limit_usd": 100.0,
      "warn_at_percent": 80,
      "daily_remaining_usd": 8.126,
      "monthly_remaining_usd": 77.489,
      "daily_percent": 18.74,
      "monthly_percent": 22.51,
      "state": "ok"
    }
  }
}

| Field | Meaning |
| --- | --- |
| session_cost_usd / daily_cost_usd / monthly_cost_usd | Spend in each rolling window. |
| total_tokens / request_count | Session-window token and call totals. |
| by_model | Per-model ModelStats (cost, tokens, request count) for the session. |
| by_agent | Per sidecar/operator agent_id breakdown, including a nested by_model. |
| by_source | Per-origin breakdown keyed by gateway, channel, sidecar, or runtime (untagged records). |
| budget | The BudgetStatus: limits, remaining, utilization percent, and state (ok / warning / exceeded / disabled). |

When tracking is disabled, the endpoint returns a zeroed summary with budget.state = "disabled" rather than an error.
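A small client for this endpoint might look like the sketch below. It uses only the Python standard library; the default base URL is the one from the curl example on this page, and `fetch_cost_summary` and `describe_budget` are hypothetical helper names. The field names follow the sample response above.

```python
import json
from urllib.request import urlopen


def fetch_cost_summary(base_url: str = "http://127.0.0.1:42617") -> dict:
    """GET /api/cost and decode the JSON body (needs a running gateway)."""
    with urlopen(f"{base_url}/api/cost", timeout=5) as resp:
        return json.load(resp)


def describe_budget(summary: dict) -> str:
    """Render a one-line view of daily spend from a CostSummary payload."""
    cost = summary["cost"]
    budget = cost["budget"]
    if budget["state"] == "disabled":
        return "cost tracking disabled"
    return (
        f"daily ${cost['daily_cost_usd']:.2f} of ${budget['daily_limit_usd']:.2f} "
        f"({budget['daily_percent']:.1f}%), state={budget['state']}"
    )
```

Calling `describe_budget(fetch_cost_summary())` against a live gateway would print the current daily utilization; checking for the disabled state first avoids treating a zeroed summary as real spend.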

External sidecars (the Operator MCP, agent sub-processes) push their token usage into the same global ledger so spend is unified across the whole system:

POST /api/cost/usage
X-Revka-Service-Token: <service_token>
Content-Type: application/json

{
  "model": "gpt-4o",
  "provider": "openai",
  "input_tokens": 1000,
  "output_tokens": 250,
  "source": "sidecar",
  "agent_id": "my-agent",
  "agent_title": "My Agent"
}

Only model is required; provider, input_tokens, output_tokens, source, agent_id, and agent_title are optional. The cost is computed from [cost.prices] exactly as for an internal call. On success:

{ "recorded": true, "usage": { "model": "gpt-4o", "input_tokens": 1000, "output_tokens": 250, "total_tokens": 1250, "cost_usd": 0.005, "timestamp": "2026-06-18T09:00:00Z" } }

When cost tracking is disabled, it returns { "recorded": false, "reason": "cost tracking disabled" }.
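A sidecar could report usage with a few lines of standard-library Python, sketched below. The helper names are hypothetical, and the base URL is assumed to be the same gateway address used in the curl example on this page. The request is built separately from sending it, so it can be inspected without a running gateway.

```python
import json
from urllib.request import Request, urlopen


def build_usage_request(
    service_token: str,
    model: str,
    input_tokens: int = 0,
    output_tokens: int = 0,
    base_url: str = "http://127.0.0.1:42617",  # assumed gateway address
    **optional_fields,
) -> Request:
    """Build the POST /api/cost/usage request; only `model` is required."""
    payload = {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        **optional_fields,  # e.g. provider, source, agent_id, agent_title
    }
    return Request(
        f"{base_url}/api/cost/usage",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Revka-Service-Token": service_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def report_usage(request: Request) -> dict:
    """Send the request and decode the JSON reply (needs a running gateway)."""
    with urlopen(request, timeout=5) as resp:
        return json.load(resp)
```

A caller would check the `recorded` field of the reply, since a disabled tracker answers with `recorded: false` rather than an error.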

The Operator MCP reads the same [cost] configuration and exposes budget visibility to multi-agent workflows:

  • get_budget_status() — returns session/daily/monthly spend against your configured limits, with a per-model breakdown. The same numbers the gateway endpoint reports.
  • system_dashboard(include_costs=...) — a single-call snapshot combining cost, agent, workflow, and health views.
  • Workflow max_cost_usd — a per-run cost cap on a declarative workflow. The run aborts if its cost exceeds the cap, independent of the global daily/monthly budget. See Workflows & SOP overview.

Use max_cost_usd to bound an individual fan-out or refinement loop, and [cost] limits to bound total spend across everything.

  1. Add pricing and a limit. Populate [cost.prices] for the models you route to and set a low daily_limit_usd to test enforcement quickly.

  2. Run a few requests, then read the ledger summary:

    curl http://127.0.0.1:42617/api/cost
  3. Confirm the numbers move. daily_cost_usd and budget.daily_percent should climb. If they stay at 0 after real calls, your [cost.prices] keys are not matching the model ids in use — check the fuzzy-match rules above.

  4. Test enforcement. Set mode = "block" with a tiny limit and confirm requests are rejected once the limit is projected to be exceeded.