Cost tracking & budgets

The token-usage ledger, daily/monthly budget enforcement modes, and pricing configuration.

Revka tracks the estimated USD cost of every LLM call, persists it to an append-only ledger, and can enforce daily and monthly spending limits before a request goes out. This page covers the token-usage ledger, the [cost] configuration section and its pricing table, the three enforcement modes (warn, block, route_down), and the gateway endpoints that read and write the ledger.

Reach for this page when you want to put a hard cap on what your agent can spend, get a warning before you blow through a budget, or feed sidecar/operator token usage into a single unified spend total. To view spend rather than configure it, see the read-only Cost dashboard page.

How cost tracking works

Every time the agent calls a model, Revka computes the call’s cost from the token counts and your configured per-model pricing, then writes a record to a JSONL ledger at <workspace>/state/costs.jsonl. The file is created automatically on first use. A legacy .revka/costs.db from older installs is migrated into the JSONL ledger on first start.

Cost is computed per call as:

cost_usd = (input_tokens / 1_000_000) * input_price
         + (output_tokens / 1_000_000) * output_price

Prices are USD per one million tokens, taken from the [cost.prices] table (below). If no pricing entry matches a model, the call is still recorded — with a cost of 0.0 and a debug log line — so token counts are never lost. Malformed lines in the ledger are skipped with a warning rather than aborting a read.
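
As a worked instance of the formula, here is a minimal Python sketch. The function name `call_cost_usd` is hypothetical and not part of Revka. The prices are the `gpt-4o` entry from the sample `[cost.prices]` table on this page, and an unmatched model yields 0.0 as described above.

```python
from typing import Optional


def call_cost_usd(input_tokens: int, output_tokens: int,
                  price: Optional[dict]) -> float:
    """Cost of one call; `price` holds USD per 1M tokens, or None if unmatched."""
    if price is None:
        # No pricing entry: the call is still recorded, at zero cost.
        return 0.0
    return ((input_tokens / 1_000_000) * price["input"]
            + (output_tokens / 1_000_000) * price["output"])


gpt_4o = {"input": 2.5, "output": 10.0}
cost = call_cost_usd(1000, 250, gpt_4o)
print(f"{cost:.4f}")  # 0.0025 for input + 0.0025 for output, prints 0.0050
```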

The tracker keeps three rolling windows:

Session — the current daemon process lifetime. The by_model, by_agent, and by_source breakdowns reflect this window only.
Daily — the rolling current day, recomputed from the full ledger.
Monthly — the calendar month, recomputed from the full ledger.

A single process-global tracker is shared by the gateway, the channels, and ingested sidecar usage, so every path checks and updates the same budget.
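
The daily and monthly windows can be approximated offline by replaying the ledger. The sketch below is not Revka's implementation: it assumes each ledger line is a JSON object carrying at least the `cost_usd` and `timestamp` fields shown in the usage response later on this page, and it assumes days and months are cut in UTC. The function name `summarize` is hypothetical. Malformed lines are skipped, mirroring the behavior described above.

```python
import json
from datetime import datetime, timezone


def summarize(lines, now):
    """Return (daily_usd, monthly_usd) for the UTC day and month of `now`."""
    daily = monthly = 0.0
    for line in lines:
        try:
            rec = json.loads(line)
            # Assumed field names: "timestamp" (ISO 8601) and "cost_usd".
            ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
            cost = float(rec["cost_usd"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # malformed line: skip it instead of aborting the read
        ts = ts.astimezone(timezone.utc)
        if (ts.year, ts.month) == (now.year, now.month):
            monthly += cost
            if ts.date() == now.date():
                daily += cost
    return daily, monthly


sample = [
    '{"model": "gpt-4o", "cost_usd": 0.005, "timestamp": "2026-06-18T09:00:00Z"}',
    '{"model": "gpt-4o", "cost_usd": 0.010, "timestamp": "2026-06-02T09:00:00Z"}',
    'not json at all',
]
now = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
daily, monthly = summarize(sample, now)
print(f"daily={daily:.3f} monthly={monthly:.3f}")
```

To run it against a real ledger, pass an open file handle for `<workspace>/state/costs.jsonl` as `lines`.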

Configure the `[cost]` section

Cost tracking is configured entirely in ~/.revka/config.toml under [cost], plus its two sub-tables [cost.enforcement] and [cost.prices].

[cost]
enabled = true
daily_limit_usd = 10.00      # default 10.00
monthly_limit_usd = 100.00   # default 100.00
warn_at_percent = 80         # default 80 — warn at 80% of the limit
allow_override = false       # default false

[cost.enforcement]
mode = "warn"                # "warn" | "block" | "route_down"
route_down_model = "fast"    # model hint to fall back to in route_down mode
reserve_percent = 10         # reserve 10% of the budget for critical ops

[cost.prices]
# USD per 1M tokens, { input, output }
"claude-sonnet-4-20250514" = { input = 3.0, output = 15.0 }
"gpt-4o" = { input = 2.5, output = 10.0 }
"gpt-4o-mini" = { input = 0.15, output = 0.60 }

`[cost]` keys

Key	Type	Default	Meaning
`enabled`	bool	`true`	Master switch. When `false`, no records are written, no budget is checked, and `GET /api/cost` returns a zeroed summary.
`daily_limit_usd`	float	`10.00`	Daily spending ceiling in USD.
`monthly_limit_usd`	float	`100.00`	Monthly (calendar-month) spending ceiling in USD.
`warn_at_percent`	int (0–100)	`80`	Emit a warning once projected spend crosses this percentage of a limit.
`allow_override`	bool	`false`	Permit the `--override` flag to bypass a hard limit.

`[cost.enforcement]` keys

Key	Type	Default	Meaning
`mode`	string	`"warn"`	What happens at the limit: `warn`, `block`, or `route_down` (see below).
`route_down_model`	string?	unset	The model hint to fall back to when `mode = "route_down"` and the budget is exceeded.
`reserve_percent`	int (0–100)	`10`	Reserve this percentage of the budget for critical operations.

`[cost.prices]` — the pricing table

Each entry maps a model key to its { input, output } price in USD per 1M tokens. Pricing lookup is fuzzy and tries, in order:

An exact match on the model id (gpt-4o).
A provider/model match (openai/gpt-4o).
The suffix after the last / (so anthropic/claude-sonnet-4 matches a claude-sonnet-4 key).
A prefix match after stripping a trailing numeric date segment.

This means you can usually key the table by the bare model name and have it match regardless of how the provider prefixes the id. A model with no matching entry records at zero cost.
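
One way to read the four lookup steps is sketched below in Python. This is an interpretation, not Revka's code: `lookup_price` and `strip_date` are hypothetical names, the "date segment" is assumed to be a trailing `-YYYYMMDD`, and when several keys match by prefix the sketch picks the longest one, which is a design choice of this example.

```python
import re
from typing import Optional


def strip_date(name: str) -> str:
    # Assumption: a date segment looks like a trailing "-20250514".
    return re.sub(r"-\d{8}$", "", name)


def lookup_price(prices: dict, model: str,
                 provider: Optional[str] = None) -> Optional[dict]:
    if model in prices:                          # 1. exact model id
        return prices[model]
    qualified = f"{provider}/{model}" if provider else None
    if qualified and qualified in prices:        # 2. provider/model
        return prices[qualified]
    bare = model.rsplit("/", 1)[-1]              # 3. suffix after the last "/"
    if bare in prices:
        return prices[bare]
    base = strip_date(bare)                      # 4. prefix match, dates stripped
    candidates = [k for k in prices
                  if base.startswith(strip_date(k.rsplit("/", 1)[-1]))]
    if candidates:
        return prices[max(candidates, key=len)]  # prefer the most specific key
    return None                                  # caller records the call at 0.0


prices = {
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}
print(lookup_price(prices, "openai/gpt-4o"))
print(lookup_price(prices, "anthropic/claude-sonnet-4"))
print(lookup_price(prices, "mystery-model"))
```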

Enforcement modes

Before each LLM call, the tracker projects current_spend + estimated_cost against the daily and monthly limits and returns one of three states: Allowed, Warning (a threshold crossed but the call proceeds), or Exceeded (the limit would be breached). Both the warning and the exceeded checks compare against the projected total, not just the current spend. The mode decides what happens when a limit is exceeded.
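
The projection check can be sketched in Python as follows. The names `BudgetState`, `check_window`, and `check_budget` are hypothetical, and the exact boundary comparisons (`>` for exceeded, `>=` for the warning threshold) are assumptions; the page only states that both checks use the projected total.

```python
from enum import Enum


class BudgetState(Enum):
    ALLOWED = "allowed"
    WARNING = "warning"
    EXCEEDED = "exceeded"


def check_window(spent: float, estimated: float, limit: float,
                 warn_at_percent: int) -> BudgetState:
    projected = spent + estimated  # both checks use the projected total
    if projected > limit:
        return BudgetState.EXCEEDED
    if projected >= limit * warn_at_percent / 100:
        return BudgetState.WARNING
    return BudgetState.ALLOWED


def check_budget(daily_spent: float, monthly_spent: float, estimated: float,
                 daily_limit: float = 10.0, monthly_limit: float = 100.0,
                 warn_at_percent: int = 80) -> BudgetState:
    states = [
        check_window(daily_spent, estimated, daily_limit, warn_at_percent),
        check_window(monthly_spent, estimated, monthly_limit, warn_at_percent),
    ]
    # Report the most severe state across the two windows.
    for state in (BudgetState.EXCEEDED, BudgetState.WARNING):
        if state in states:
            return state
    return BudgetState.ALLOWED


print(check_budget(1.874, 22.5106, 0.01))  # well under both limits
print(check_budget(7.99, 20.0, 0.02))      # projected daily spend crosses 80%
print(check_budget(9.99, 20.0, 0.02))      # projected daily spend passes 10.00
```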

mode = "warn" (the default) never blocks. When projected spend crosses warn_at_percent of a limit, a warning is logged and the request proceeds. When the limit itself would be exceeded, that is logged as well, but the call still goes out. Use this to observe spend and tune limits before you start enforcing them.

[cost.enforcement]
mode = "warn"

mode = "block" rejects any request that would push spend over the daily or monthly limit. The agent gets a budget-exceeded error instead of making the call. This is the strict guardrail — nothing runs once you are out of budget.

[cost.enforcement]
mode = "block"

Pair with allow_override = true if you want an operator escape hatch via the --override flag for one-off critical work.

mode = "route_down" keeps the agent running but downgrades it: when the budget is exceeded, the call is switched to the cheaper model named by route_down_model instead of being blocked. This trades quality for continuity — your agent stays responsive on a budget model rather than going dark.

[cost.enforcement]
mode = "route_down"
route_down_model = "fast"   # a model hint defined in [[model_routes]]

Define the target as a model route hint so the fallback resolves to a concrete provider + model.
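
The difference between the three modes comes down to what happens to a call once the budget is exceeded. A minimal Python sketch, with a hypothetical `apply_mode` helper that returns whether the call proceeds and on which model: it ignores `allow_override`, and raising an error when `route_down_model` is unset is an assumption of this example, since that case is not described here.

```python
from typing import Optional, Tuple


def apply_mode(mode: str, exceeded: bool, requested_model: str,
               route_down_model: Optional[str] = None
               ) -> Tuple[bool, Optional[str]]:
    """Return (proceed, model_to_use) for one call."""
    if not exceeded or mode == "warn":
        # Within budget, or warn mode: log only, the call goes out unchanged.
        return True, requested_model
    if mode == "block":
        # The agent receives a budget-exceeded error instead of a response.
        return False, None
    if mode == "route_down":
        if route_down_model is None:
            raise ValueError("route_down mode needs route_down_model")  # assumption
        return True, route_down_model
    raise ValueError(f"unknown enforcement mode: {mode!r}")


print(apply_mode("warn", True, "gpt-4o"))
print(apply_mode("block", True, "gpt-4o"))
print(apply_mode("route_down", True, "gpt-4o", "fast"))
```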

Gateway cost endpoints

The gateway exposes two cost endpoints — one to read the summary, one for sidecars to ingest usage.

Read the cost summary

GET /api/cost

No auth required, since this is read-only operator telemetry. The response wraps a CostSummary in a top-level cost key:

{
  "cost": {
    "session_cost_usd": 0.4213,
    "daily_cost_usd": 1.8740,
    "monthly_cost_usd": 22.5106,
    "total_tokens": 1542300,
    "request_count": 318,
    "by_model": {
      "anthropic/claude-sonnet-4-6": {
        "model": "anthropic/claude-sonnet-4-6",
        "cost_usd": 0.4013,
        "total_tokens": 1410200,
        "request_count": 286
      }
    },
    "by_agent": {},
    "by_source": {
      "gateway": { "source": "gateway", "cost_usd": 0.30, "total_tokens": 980000, "request_count": 210 }
    },
    "budget": {
      "enabled": true,
      "daily_limit_usd": 10.0,
      "monthly_limit_usd": 100.0,
      "warn_at_percent": 80,
      "daily_remaining_usd": 8.126,
      "monthly_remaining_usd": 77.489,
      "daily_percent": 18.74,
      "monthly_percent": 22.51,
      "state": "ok"
    }
  }
}

Field	Meaning
`session_cost_usd` / `daily_cost_usd` / `monthly_cost_usd`	Spend in each rolling window.
`total_tokens` / `request_count`	Session-window token and call totals.
`by_model`	Per-model `ModelStats` (cost, tokens, request count) for the session.
`by_agent`	Per sidecar/operator `agent_id` breakdown, including a nested `by_model`.
`by_source`	Per-origin breakdown keyed by `gateway`, `channel`, `sidecar`, or `runtime` (untagged records).
`budget`	The `BudgetStatus`: limits, remaining, utilization percent, and `state` (`ok` / `warning` / `exceeded` / `disabled`).

When tracking is disabled, the endpoint returns a zeroed summary with budget.state = "disabled" rather than an error.
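
A small Python client for this endpoint might look like the sketch below. The helper names `fetch_summary` and `budget_line` are hypothetical, and the base URL is the address used in the verification step on this page; adjust it to wherever your gateway listens.

```python
import json
from urllib.request import urlopen


def fetch_summary(base_url: str = "http://127.0.0.1:42617") -> dict:
    """GET /api/cost and return the object under the top-level "cost" key."""
    with urlopen(f"{base_url}/api/cost") as resp:
        return json.load(resp)["cost"]


def budget_line(cost: dict) -> str:
    """One-line daily budget status built from a CostSummary dict."""
    b = cost["budget"]
    return (f"{b['state']}: daily ${cost['daily_cost_usd']:.2f} "
            f"of ${b['daily_limit_usd']:.2f} ({b['daily_percent']:.1f}%)")


# Offline example using values from the sample response above.
sample = {
    "daily_cost_usd": 1.8740,
    "budget": {"state": "ok", "daily_limit_usd": 10.0, "daily_percent": 18.74},
}
print(budget_line(sample))  # ok: daily $1.87 of $10.00 (18.7%)
```

With a running gateway, `print(budget_line(fetch_summary()))` gives the live figure.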

Ingest sidecar usage

External sidecars (the Operator MCP, agent sub-processes) push their token usage into the same global ledger so spend is unified across the whole system:

POST /api/cost/usage
X-Revka-Service-Token: <service_token>
Content-Type: application/json

{
  "model": "gpt-4o",
  "provider": "openai",
  "input_tokens": 1000,
  "output_tokens": 250,
  "source": "sidecar",
  "agent_id": "my-agent",
  "agent_title": "My Agent"
}

Only model is required; provider, input_tokens, output_tokens, source, agent_id, and agent_title are optional. The cost is computed from [cost.prices] exactly as for an internal call. On success:

{ "recorded": true, "usage": { "model": "gpt-4o", "input_tokens": 1000, "output_tokens": 250, "total_tokens": 1250, "cost_usd": 0.005, "timestamp": "2026-06-18T09:00:00Z" } }

When cost tracking is disabled, it returns { "recorded": false, "reason": "cost tracking disabled" }.
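
From a sidecar written in Python, the same request could be built with the standard library, as in this sketch. The helper names `build_usage_request` and `report_usage` are hypothetical, the base URL is the address used in the verification step on this page, and the service token is whatever your deployment issues.

```python
import json
from urllib.request import Request, urlopen


def build_usage_request(service_token: str, model: str,
                        input_tokens: int = 0, output_tokens: int = 0,
                        base_url: str = "http://127.0.0.1:42617",
                        **optional_fields) -> Request:
    """Build the POST /api/cost/usage request without sending it."""
    body = {"model": model, "input_tokens": input_tokens,
            "output_tokens": output_tokens, **optional_fields}
    return Request(
        f"{base_url}/api/cost/usage",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Revka-Service-Token": service_token,
                 "Content-Type": "application/json"},
        method="POST",
    )


def report_usage(req: Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urlopen(req) as resp:
        return json.load(resp)


req = build_usage_request("<service_token>", "gpt-4o", 1000, 250,
                          source="sidecar", agent_id="my-agent")
print(req.get_method(), req.full_url)
```

Check the `recorded` field of the response: it is false when cost tracking is disabled.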

Operator-side budget tools

The Operator MCP reads the same [cost] configuration and exposes budget visibility to multi-agent workflows:

get_budget_status() — returns session/daily/monthly spend against your configured limits, with a per-model breakdown. The same numbers the gateway endpoint reports.
system_dashboard(include_costs=...) — a single-call snapshot combining cost, agent, workflow, and health views.
Workflow max_cost_usd — a per-run cost cap on a declarative workflow. The run aborts if its cost exceeds the cap, independent of the global daily/monthly budget. See Workflows & SOP overview.

Use max_cost_usd to bound an individual fan-out or refinement loop, and [cost] limits to bound total spend across everything.

Verify your setup

Add pricing and a limit. Populate [cost.prices] for the models you route to and set a low daily_limit_usd to test enforcement quickly.
Run a few requests, then read the ledger summary:
```
curl http://127.0.0.1:42617/api/cost
```
Confirm the numbers move. daily_cost_usd and budget.daily_percent should climb. If they stay at 0 after real calls, your [cost.prices] keys are not matching the model ids in use — check the fuzzy-match rules above.
Test enforcement. Set mode = "block" with a tiny limit and confirm requests are rejected once the limit is projected to be exceeded.

Related pages

Cron, cost & config pages: The read-only /cost dashboard with per-model bar charts and spend windows.

Cost, audit, ClawHub & credentials API: The /api/cost endpoints in the full Gateway API reference.

Observability & tracing: Prometheus token metrics, OTel cost attributes, and the observer pipeline.

Routing, reliability & tuning: Define the model-route hint that route_down mode falls back to.

Configuration overview: Every config.toml section and key, including [cost].

Spawning & coordinating agents: The Operator MCP budget tools and workflow max_cost_usd caps.
