Cost tracking & budgets
The token-usage ledger, daily/monthly budget enforcement modes, and pricing configuration.
Revka tracks the estimated USD cost of every LLM call, persists it to an append-only ledger, and can enforce daily and monthly spending limits before a request goes out. This page covers the token-usage ledger, the [cost] configuration section and its pricing table, the three enforcement modes (warn, block, route_down), and the gateway endpoints that read and write the ledger.
Reach for this page when you want to put a hard cap on what your agent can spend, get a warning before you blow through a budget, or feed sidecar/operator token usage into a single unified spend total. To view spend rather than configure it, see the read-only Cost dashboard page.
How cost tracking works
Every time the agent calls a model, Revka computes the call’s cost from the token counts and your configured per-model pricing, then writes a record to a JSONL ledger at `<workspace>/state/costs.jsonl`. The file is created automatically on first use. A legacy `.revka/costs.db` from older installs is migrated into the JSONL ledger on first start.
Cost is computed per call as:
```
cost_usd = (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price
```

Prices are USD per one million tokens, taken from the `[cost.prices]` table (below). If no pricing entry matches a model, the call is still recorded, with a cost of `0.0` and a debug log line, so token counts are never lost. Malformed lines in the ledger are skipped with a warning rather than aborting a read.
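As a quick sanity check of the per-call formula, here is a minimal Python sketch. The function name `call_cost_usd` is hypothetical and not part of Revka; the prices are the `gpt-4o` entry from the example `[cost.prices]` table.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost of one call; both prices are USD per one million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# gpt-4o at { input = 2.5, output = 10.0 }: 1000 input and 250 output tokens
cost = call_cost_usd(1000, 250, 2.5, 10.0)
print(round(cost, 6))  # 0.005
```

A model with no pricing entry behaves as if both prices were `0.0`, so the record keeps its token counts and a zero cost.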
The tracker keeps three rolling windows:
- Session: the current daemon process lifetime. The `by_model`, `by_agent`, and `by_source` breakdowns reflect this window only.
- Daily: the rolling current day, recomputed from the full ledger.
- Monthly: the calendar month, recomputed from the full ledger.
A single process-global tracker is shared by the gateway, the channels, and ingested sidecar usage, so every path checks and updates the same budget.
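To illustrate how the daily and monthly windows can be recomputed from the ledger, the sketch below sums a handful of JSONL lines and skips malformed ones. It is not Revka's reader: the record field names (`cost_usd`, `timestamp`) are borrowed from the usage response shown later on this page, and using UTC for the day boundary is an assumption.

```python
import json
from datetime import datetime, timezone


def window_totals(lines, now):
    """Return (daily, monthly) spend for the UTC day and month of `now`.

    Assumes each ledger line is a JSON object with "cost_usd" and an
    ISO-8601 "timestamp" (hypothetical field names).
    """
    daily = monthly = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
            ts = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
            cost = float(record["cost_usd"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # malformed line: skip it instead of aborting the read
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        ts = ts.astimezone(timezone.utc)
        if (ts.year, ts.month) == (now.year, now.month):
            monthly += cost
            if ts.day == now.day:
                daily += cost
    return daily, monthly


sample = [
    '{"cost_usd": 0.005, "timestamp": "2026-06-18T09:00:00Z"}',
    '{"cost_usd": 0.010, "timestamp": "2026-06-02T12:00:00Z"}',
    "not json",
]
now = datetime(2026, 6, 18, 15, 0, tzinfo=timezone.utc)
daily, monthly = window_totals(sample, now)
print(f"daily={daily:.3f} monthly={monthly:.3f}")
```

Reading the real file would look like `window_totals(open(path, encoding="utf-8"), datetime.now(timezone.utc))`, with `path` pointing at your workspace's `state/costs.jsonl`.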
Configure the [cost] section
Cost tracking is configured entirely in `~/.revka/config.toml` under `[cost]`, plus its two sub-tables `[cost.enforcement]` and `[cost.prices]`.
```toml
[cost]
enabled = true
daily_limit_usd = 10.00     # default 10.00
monthly_limit_usd = 100.00  # default 100.00
warn_at_percent = 80        # default 80: warn at 80% of the limit
allow_override = false      # default false

[cost.enforcement]
mode = "warn"               # "warn" | "block" | "route_down"
route_down_model = "fast"   # model hint to fall back to in route_down mode
reserve_percent = 10        # reserve 10% of the budget for critical ops

[cost.prices]
# USD per 1M tokens, { input, output }
"claude-sonnet-4-20250514" = { input = 3.0, output = 15.0 }
"gpt-4o" = { input = 2.5, output = 10.0 }
"gpt-4o-mini" = { input = 0.15, output = 0.60 }
```

[cost] keys
| Key | Type | Default | Meaning |
|---|---|---|---|
| `enabled` | bool | `true` | Master switch. When `false`, no records are written, no budget is checked, and `GET /api/cost` returns a zeroed summary. |
| `daily_limit_usd` | float | `10.00` | Daily spending ceiling in USD. |
| `monthly_limit_usd` | float | `100.00` | Monthly (calendar-month) spending ceiling in USD. |
| `warn_at_percent` | int (0–100) | `80` | Emit a warning once projected spend crosses this percentage of a limit. |
| `allow_override` | bool | `false` | Permit the `--override` flag to bypass a hard limit. |
[cost.enforcement] keys
| Key | Type | Default | Meaning |
|---|---|---|---|
| `mode` | string | `"warn"` | What happens at the limit: `warn`, `block`, or `route_down` (see below). |
| `route_down_model` | string? | unset | The model hint to fall back to when `mode = "route_down"` and the budget is exceeded. |
| `reserve_percent` | int (0–100) | `10` | Reserve this percentage of the budget for critical operations. |
[cost.prices] — the pricing table
Each entry maps a model key to its `{ input, output }` price in USD per 1M tokens. Pricing lookup is fuzzy and tries, in order:

1. An exact match on the model id (`gpt-4o`).
2. A `provider/model` match (`openai/gpt-4o`).
3. The suffix after the last `/` (so `anthropic/claude-sonnet-4` matches a `claude-sonnet-4` key).
4. A prefix match after stripping a trailing numeric date segment.
This means you can usually key the table by the bare model name and have it match regardless of how the provider prefixes the id. A model with no matching entry records at zero cost.
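The lookup order can be pictured with the Python sketch below. It is one possible reading of the four rules, not Revka's implementation. In particular, the 8-digit date pattern, stripping the date from both the model id and the key, and preferring the longest key in the final prefix step are all assumptions, and `lookup_price` is a hypothetical name.

```python
import re

# Assumed shape of a "trailing numeric date segment", e.g. -20250514
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def lookup_price(prices, model_id, provider=None):
    """Return the { input, output } entry for model_id, or None if unmatched."""
    # 1. Exact match on the model id.
    if model_id in prices:
        return prices[model_id]
    # 2. provider/model match.
    if provider and f"{provider}/{model_id}" in prices:
        return prices[f"{provider}/{model_id}"]
    # 3. Suffix after the last "/".
    bare = model_id.rsplit("/", 1)[-1]
    if bare in prices:
        return prices[bare]
    # 4. Prefix match after stripping a trailing date segment
    #    (assumption: longest key wins when several keys match).
    base = _DATE_SUFFIX.sub("", bare)
    for key in sorted(prices, key=len, reverse=True):
        if base.startswith(_DATE_SUFFIX.sub("", key)):
            return prices[key]
    return None  # the caller records the call at zero cost


prices = {
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}
print(lookup_price(prices, "openai/gpt-4o-mini"))         # matched by rule 3
print(lookup_price(prices, "anthropic/claude-sonnet-4"))  # matched by rule 4
print(lookup_price(prices, "mistral-large"))              # None
```

Because a loose prefix rule can match a neighbouring model, listing every model you actually route to under its exact id is the safest way to get the price you expect.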
Enforcement modes
Before each LLM call, the tracker projects `current_spend + estimated_cost` against the daily and monthly limits and returns one of three states: `Allowed`, `Warning` (a threshold crossed but the call proceeds), or `Exceeded` (the limit would be breached). Both the warning and the exceeded checks compare against the projected total, not just the current spend. The mode decides what happens when a limit is exceeded.
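The projection check can be sketched in a few lines of Python. The names `BudgetState`, `check_budget`, and `check_call` are hypothetical, and whether Revka uses strict or inclusive comparisons at the exact threshold is an assumption here.

```python
from enum import Enum


class BudgetState(Enum):
    ALLOWED = "allowed"
    WARNING = "warning"
    EXCEEDED = "exceeded"


# Ordered from least to most severe.
_SEVERITY = [BudgetState.ALLOWED, BudgetState.WARNING, BudgetState.EXCEEDED]


def check_budget(current_spend, estimated_cost, limit_usd, warn_at_percent=80):
    """Classify one limit using the projected total, not the current spend."""
    projected = current_spend + estimated_cost
    if projected > limit_usd:  # assumption: strictly over the limit
        return BudgetState.EXCEEDED
    if projected >= limit_usd * warn_at_percent / 100:
        return BudgetState.WARNING
    return BudgetState.ALLOWED


def check_call(daily_spend, monthly_spend, estimated_cost,
               daily_limit_usd=10.0, monthly_limit_usd=100.0, warn_at_percent=80):
    """Check both limits and report the more severe state."""
    states = (
        check_budget(daily_spend, estimated_cost, daily_limit_usd, warn_at_percent),
        check_budget(monthly_spend, estimated_cost, monthly_limit_usd, warn_at_percent),
    )
    return max(states, key=_SEVERITY.index)


print(check_call(daily_spend=1.0, monthly_spend=20.0, estimated_cost=0.05))
print(check_call(daily_spend=7.9, monthly_spend=20.0, estimated_cost=0.2))
print(check_call(daily_spend=1.0, monthly_spend=99.95, estimated_cost=0.1))
```

The third call shows why both limits matter: the day is nearly untouched, but the projected monthly total passes the monthly limit, so the call is classified as exceeded.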
`mode = "warn"` (the default) never blocks. When projected spend crosses `warn_at_percent` of a limit, a warning is logged and the request proceeds. When a hard limit would be exceeded, the breach is logged as well, but the call still goes out. Use this to observe spend and tune limits before you start enforcing them.
```toml
[cost.enforcement]
mode = "warn"
```

`mode = "block"` rejects any request that would push spend over the daily or monthly limit. The agent gets a budget-exceeded error instead of making the call. This is the strict guardrail: nothing runs once you are out of budget.

```toml
[cost.enforcement]
mode = "block"
```

Pair with `allow_override = true` if you want an operator escape hatch via the `--override` flag for one-off critical work.
mode = "route_down" keeps the agent running but downgrades it: when the budget is exceeded, the call is switched to the cheaper model named by route_down_model instead of being blocked. This trades quality for continuity — your agent stays responsive on a budget model rather than going dark.
```toml
[cost.enforcement]
mode = "route_down"
route_down_model = "fast"   # a model hint defined in [[model_routes]]
```

Define the target as a model route hint so the fallback resolves to a concrete provider + model.
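Put side by side, the three modes differ only in what they do with an exceeded budget. The Python sketch below is a hypothetical dispatcher, not Revka's code. The action names are made up for illustration, and raising an error when `route_down_model` is unset is an assumption, since this page does not say how that case is handled.

```python
def resolve_call(mode, exceeded, requested_model, route_down_model=None):
    """Return (action, model) for one call under the given enforcement mode."""
    if mode not in ("warn", "block", "route_down"):
        raise ValueError(f"unknown enforcement mode: {mode!r}")
    if not exceeded or mode == "warn":
        # warn never blocks: an exceeded limit is only logged.
        return ("proceed", requested_model)
    if mode == "block":
        # The agent gets a budget-exceeded error instead of a model call.
        return ("reject", None)
    # mode == "route_down": switch to the cheaper model hint.
    if route_down_model is None:
        raise ValueError("route_down mode needs route_down_model")  # assumption
    return ("route_down", route_down_model)


print(resolve_call("warn", True, "gpt-4o"))                # ('proceed', 'gpt-4o')
print(resolve_call("block", True, "gpt-4o"))               # ('reject', None)
print(resolve_call("route_down", True, "gpt-4o", "fast"))  # ('route_down', 'fast')
```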
Gateway cost endpoint
The gateway exposes two cost endpoints: one to read the summary, one for sidecars to ingest usage.
Read the cost summary
```http
GET /api/cost
```

No auth required; this is read-only operator telemetry. It returns a `CostSummary`:
```json
{
  "cost": {
    "session_cost_usd": 0.4213,
    "daily_cost_usd": 1.8740,
    "monthly_cost_usd": 22.5106,
    "total_tokens": 1542300,
    "request_count": 318,
    "by_model": {
      "anthropic/claude-sonnet-4-6": {
        "model": "anthropic/claude-sonnet-4-6",
        "cost_usd": 0.4013,
        "total_tokens": 1410200,
        "request_count": 286
      }
    },
    "by_agent": {},
    "by_source": {
      "gateway": {
        "source": "gateway",
        "cost_usd": 0.30,
        "total_tokens": 980000,
        "request_count": 210
      }
    },
    "budget": {
      "enabled": true,
      "daily_limit_usd": 10.0,
      "monthly_limit_usd": 100.0,
      "warn_at_percent": 80,
      "daily_remaining_usd": 8.126,
      "monthly_remaining_usd": 77.489,
      "daily_percent": 18.74,
      "monthly_percent": 22.51,
      "state": "ok"
    }
  }
}
```

| Field | Meaning |
|---|---|
| `session_cost_usd` / `daily_cost_usd` / `monthly_cost_usd` | Spend in each rolling window. |
| `total_tokens` / `request_count` | Session-window token and call totals. |
| `by_model` | Per-model `ModelStats` (cost, tokens, request count) for the session. |
| `by_agent` | Per sidecar/operator `agent_id` breakdown, including a nested `by_model`. |
| `by_source` | Per-origin breakdown keyed by `gateway`, `channel`, `sidecar`, or `runtime` (untagged records). |
| `budget` | The `BudgetStatus`: limits, remaining, utilization percent, and `state` (`ok` / `warning` / `exceeded` / `disabled`). |
When tracking is disabled, the endpoint returns a zeroed summary with budget.state = "disabled" rather than an error.
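For scripted checks, the summary can be fetched and condensed with the Python standard library. This is a minimal sketch: the helper names are hypothetical, and the default base URL is the local gateway address used in the verification steps on this page, so adjust it for your deployment.

```python
import json
import urllib.request


def fetch_cost_summary(base_url="http://127.0.0.1:42617"):
    """GET /api/cost and return the object under the top-level "cost" key."""
    with urllib.request.urlopen(f"{base_url}/api/cost", timeout=5) as resp:
        return json.load(resp)["cost"]


def budget_line(cost):
    """One-line view of daily spend against the daily limit."""
    budget = cost["budget"]
    if budget["state"] == "disabled":
        return "cost tracking disabled"
    return (
        f"daily ${cost['daily_cost_usd']:.2f} of ${budget['daily_limit_usd']:.2f} "
        f"({budget['daily_percent']:.1f}%), state={budget['state']}"
    )


# With a running gateway:
#   print(budget_line(fetch_cost_summary()))
```

With the sample response above, `budget_line` would produce `daily $1.87 of $10.00 (18.7%), state=ok`.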
Ingest sidecar usage
External sidecars (the Operator MCP, agent sub-processes) push their token usage into the same global ledger so spend is unified across the whole system:
```http
POST /api/cost/usage
X-Revka-Service-Token: <service_token>
Content-Type: application/json

{
  "model": "gpt-4o",
  "provider": "openai",
  "input_tokens": 1000,
  "output_tokens": 250,
  "source": "sidecar",
  "agent_id": "my-agent",
  "agent_title": "My Agent"
}
```

Only `model` is required; `provider`, `input_tokens`, `output_tokens`, `source`, `agent_id`, and `agent_title` are optional. The cost is computed from `[cost.prices]` exactly as for an internal call. On success:

```json
{
  "recorded": true,
  "usage": {
    "model": "gpt-4o",
    "input_tokens": 1000,
    "output_tokens": 250,
    "total_tokens": 1250,
    "cost_usd": 0.005,
    "timestamp": "2026-06-18T09:00:00Z"
  }
}
```

When cost tracking is disabled, it returns `{ "recorded": false, "reason": "cost tracking disabled" }`.
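A sidecar written in Python could build the ingest request with the standard library, as in the sketch below. The helper name `build_usage_request` is hypothetical, the default base URL is the local gateway address from the verification steps on this page, and how you obtain the service token is outside the scope of this sketch.

```python
import json
import urllib.request


def build_usage_request(service_token, model,
                        base_url="http://127.0.0.1:42617", **optional):
    """Build the POST /api/cost/usage request; only `model` is required.

    `optional` may carry provider, input_tokens, output_tokens, source,
    agent_id, and agent_title.
    """
    body = {"model": model, **optional}
    return urllib.request.Request(
        f"{base_url}/api/cost/usage",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Revka-Service-Token": service_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_usage_request(
    "<service_token>", "gpt-4o",
    provider="openai", input_tokens=1000, output_tokens=250,
    source="sidecar", agent_id="my-agent",
)
# With a running gateway:
#   with urllib.request.urlopen(req, timeout=5) as resp:
#       print(json.load(resp)["recorded"])
```

Building the request separately from sending it keeps the payload easy to inspect or log before it reaches the ledger.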
Operator-side budget tools
The Operator MCP reads the same `[cost]` configuration and exposes budget visibility to multi-agent workflows:

- `get_budget_status()`: returns session/daily/monthly spend against your configured limits, with a per-model breakdown. These are the same numbers the gateway endpoint reports.
- `system_dashboard(include_costs=...)`: a single-call snapshot combining cost, agent, workflow, and health views.
- Workflow `max_cost_usd`: a per-run cost cap on a declarative workflow. The run aborts if its cost exceeds the cap, independent of the global daily/monthly budget. See Workflows & SOP overview.
Use max_cost_usd to bound an individual fan-out or refinement loop, and [cost] limits to bound total spend across everything.
Verify your setup
1. Add pricing and a limit. Populate `[cost.prices]` for the models you route to and set a low `daily_limit_usd` to test enforcement quickly.
2. Run a few requests, then read the ledger summary:

   ```sh
   curl http://127.0.0.1:42617/api/cost
   ```

3. Confirm the numbers move. `daily_cost_usd` and `budget.daily_percent` should climb. If they stay at `0` after real calls, your `[cost.prices]` keys are not matching the model ids in use; check the fuzzy-match rules above.
4. Test enforcement. Set `mode = "block"` with a tiny limit and confirm requests are rejected once the limit is projected to be exceeded.