Skip to content

Routing, reliability & tuning

hint:-based routing, query classification, cost-optimized and embedding routes, fallback chains, retries, warmup, and per-request tuning.

Revka layers two optional wrappers on top of any base provider: a router that dispatches a single logical request to different (provider, model) pairs, and a reliability wrapper that adds retries, fallback chains, and key rotation. On top of that, a handful of per-request knobs let you tune timeouts, token caps, reasoning effort, and headers without touching the provider’s own settings.

Use this page when you want to: stabilize call sites against model deprecations, route different kinds of traffic to different models, fall over to a backup provider when one is down, or squeeze latency and cost out of a deployment. If you are still picking and configuring a single provider, start with the Provider quickstart and the Provider catalog.

The router lets you give a request a symbolic model name instead of a concrete one. A model parameter of hint:reasoning resolves to whichever (provider, model) you mapped to "reasoning" in config. The benefit: your call sites — channels, tools, agent steps — stay stable, and you upgrade models by editing one config entry.

Define routes as [[model_routes]] tables:

[[model_routes]]
hint = "reasoning"
provider = "openrouter"
model = "anthropic/claude-opus-4-5"
api_key = "" # optional per-route key override
[[model_routes]]
hint = "fast"
provider = "groq"
model = "llama-3.3-70b-versatile"

Then anywhere a model is accepted, pass the hint:

hint:reasoning
FieldRequiredMeaning
hintyesUnique symbolic name (the part after hint:).
provideryesA known provider ID.
modelyesThe model ID to use with that provider.
api_keynoPer-route key override.

This is the main reason to use hints. Keep hint:reasoning (and friends) hard-coded at every call site, and when a provider deprecates a model ID, change only the model = value in the route.

  1. Keep call sites stable: hint:reasoning, hint:fast, hint:semantic.

  2. Change only the target under [[model_routes]] (or [[embedding_routes]]).

  3. Validate the new config:

    Terminal window
    revka doctor
    revka status
  4. Smoke-test one representative flow (a chat plus a memory recall) before rolling out.

revka doctor validates that your model routes (and embedding routes) point at known providers, so a typo surfaces before it hits production.

Two special hints don’t look up a fixed route. Instead they score every route in the table by price and pick the cheapest qualifying one:

  • hint:cost-optimized
  • hint:cheapest

Scoring uses the [cost.prices] data (input + output cost per 1M tokens) keyed by model name, which lives under the [cost] section’s prices map. Candidates can be filtered by capability requirements (vision, native tool calling). If no pricing data is available, the router falls back to the default route.

[[model_routes]]
hint = "fast"
provider = "groq"
model = "llama-3.3-70b-versatile"
[[model_routes]]
hint = "reasoning"
provider = "openrouter"
model = "anthropic/claude-opus-4-5"
[cost.prices]
# add per-1M-token pricing so cost-optimized routing has data to score
# e.g. "groq/llama-3.3-70b-versatile" = { input = 0.59, output = 0.79 }

Query classification picks a hint: automatically based on the content of the incoming message — no LLM call is made to classify, it is pure string matching. Configure it under [query_classification]:

[query_classification]
enabled = true
[[query_classification.rules]]
hint = "reasoning"
keywords = ["explain", "analyze", "why"]
min_length = 200
priority = 10
[[query_classification.rules]]
hint = "fast"
keywords = ["hi", "hello", "thanks"]
max_length = 50
priority = 5
KeyDefaultMeaning
enabledfalseMaster switch for classification.
rules[]Rules, evaluated in priority order.

Each rule:

KeyDefaultMeaning
hintrequiredMust match a configured [[model_routes]] hint.
keywords[]Case-insensitive substring matches.
patterns[]Case-sensitive literal matches (e.g. code fences, "fn ").
min_lengthunsetOnly match when the message is at least N characters.
max_lengthunsetOnly match when the message is at most N characters.
priority0Higher priority is checked first.

This is a zero-cost way to send long, analytical questions to a strong reasoning model and short greetings to a cheap, fast one. Because matching is literal, patterns is ideal for routing code-heavy messages (match on a code fence or fn ) to a coding model.

[[embedding_routes]] entries can be defined in config and are validated by revka doctor, but embeddings have been removed from Revka — the routes are currently inert and are never used to perform any embedding at runtime. Only schema validation remains.

[[embedding_routes]]
hint = "semantic"
provider = "openai"
model = "text-embedding-3-small"
dimensions = 1536
[[embedding_routes]]
hint = "archive"
provider = "custom:https://embed.example.com/v1"
model = "your-embedding-model-id"
dimensions = 1024
api_key = "route-specific-key"
FieldMeaning
hintSymbolic name.
providerOne of none, openai, or custom:<url> (an OpenAI-compatible embeddings endpoint).
modelEmbedding model ID.
dimensionsOptional. Would override when the API default differs from your storage schema.
api_keyOptional per-route key override.

For the broader memory setup, see Kumiho graph memory and the Memory overview.

The agent can update routing config for you in natural language. Ask in chat — for example “Set conversation provider to kimi, model moonshot-v1-8k” — and the assistant calls the model_routing_config tool to persist the change to config.toml, no TOML editing required.

The tool can add or change hints, models, and providers in your route table. It cannot change security policies or delete data. This pairs well with the upgrade-safely pattern: keep call sites on stable hints and let the assistant retarget them.

Reliability: the ReliableProvider retry wrapper

Section titled “Reliability: the ReliableProvider retry wrapper”

The ReliableProvider wraps your primary provider — and an optional fallback chain — with a three-tier resilience strategy, applied in order:

  1. Model fallback chain — try each model in the configured chain for the request.
  2. Provider fallback chain — try each provider in fallback_providers after the primary exhausts its retries.
  3. Inner retry loop — exponential backoff per (provider, model) pair.

Errors are classified so retries only happen when they help:

ClassExamplesBehavior
Retryable5xx, timeoutsRetried with backoff.
Non-retryable4xx auth/validation (e.g. 400, 401)Fail fast — no retry.
Rate-limited429 with Retry-AfterRetried; Retry-After is honored (Gemini 429 bodies are parsed).
Business rate-limitedquota exhaustedNo retry.

Context-window errors trigger automatic history truncation (the oldest half of non-system messages is dropped) before retrying.

Configure it under [reliability]:

[reliability]
provider_retries = 2 # attempts per provider before fallover (default: 2)
provider_backoff_ms = 500 # base backoff ms; doubles each retry, capped at 10000
fallback_providers = ["anthropic", "openai"]
api_keys = ["sk-second-key", "sk-third-key"] # round-robin on 429
[reliability.model_fallbacks]
"claude-opus-4-20250514" = ["claude-sonnet-4-20250514", "gpt-4o"]
KeyDefaultMeaning
provider_retries2Attempts per provider before falling over.
provider_backoff_ms500Base backoff in ms; doubles each retry, capped at 10000 ms.
fallback_providersunsetOrdered provider names tried after the primary.
api_keysunsetExtra keys listed for round-robin rotation on 429s (rotation is currently a no-op at request time — the rotated key cannot be applied because the Provider trait has no set_api_key; retries continue with the original key).
model_fallbacksunsetMap of model → [fallback_model, …].

These two settings cover the most common reliability scenarios:

Keep a primary, fall over to a different vendor when it is unreachable:

default_provider = "anthropic"
default_model = "claude-sonnet-4-6-20250514"
[reliability]
fallback_providers = ["openai", "openrouter"]

Validate fallback providers before they matter — revka doctor checks that every name in fallback_providers is a known provider.

Per-request tuning with ProviderRuntimeOptions

Section titled “Per-request tuning with ProviderRuntimeOptions”

A set of runtime knobs is passed to every provider when it is built. Each maps to a config key (or environment variable) and applies to the provider’s outbound requests:

FieldConfig keyDefaultMeaning
API URL overrideapi_urlprovider defaultCustom base URL (proxies, self-hosted endpoints).
Reasoning on/off[runtime] reasoning_enabledunsetToggle extended thinking; env REVKA_REASONING_ENABLED.
Reasoning effort[runtime] reasoning_effortunsetminimal, low, medium, high, or xhigh.
HTTP timeoutprovider_timeout_secs120Per-call timeout in seconds.
Extra headersextra_headers (map){}Additional HTTP headers; env REVKA_EXTRA_HEADERS.
API pathapi_pathprovider defaultOverride the request path suffix for unusual APIs.
Max output tokensprovider_max_tokensprovider defaultCap output tokens.
provider_timeout_secs = 180
provider_max_tokens = 8192
api_path = "/v2/generate"
[runtime]
reasoning_enabled = true
reasoning_effort = "high"
[extra_headers]
"X-Org" = "research"

The REVKA_EXTRA_HEADERS environment variable uses a Key:Value,Key2:Value2 format:

Terminal window
export REVKA_EXTRA_HEADERS="X-Org:research,X-Trace:on"

Every provider exposes a warmup() method that pre-establishes the HTTP connection — DNS, the TLS handshake, and HTTP/2 setup — before the first real request. The ReliableProvider and the router call warmup() on all of their registered sub-providers sequentially at startup.

Warmup is purely a latency optimization: failures are non-fatal and logged at WARN. It mainly helps latency-sensitive deployments where the very first request would otherwise pay the full connection-setup cost.

Providers that support streaming emit StreamEvent values over an async channel: TextDelta, ToolCall, Usage, and Final, plus PreExecutedToolCall and PreExecutedToolResult for OpenAI-compatible custom proxies (e.g. claude-max-api-proxy / Claude Code proxy) that pre-execute tools and forward observability events via custom SSE fields. The gateway surfaces these as Server-Sent Events for the dashboard and API.

When wrappers are in play:

  • The ReliableProvider forwards the stream from the first qualifying provider.
  • The router delegates streaming to whichever provider its hint resolved to.

Streaming tool events are a separate capability (supports_streaming_tool_events). For a streamed tool-call request, a provider is used in streaming mode if it supports streaming and either: (a) it does not use native tool calling (tool calls are then parsed from text afterward), or (b) it also declares supports_streaming_tool_events. The streaming-tool-events capability is only required when native tool calling is in use. For the realtime transport, see Realtime: WebSocket, SSE & Live Canvas.

Several “thinking” models — DeepSeek-R1, GLM-4.7, Kimi K2.5 — return a reasoning_content field alongside the response text. Revka preserves it as an opaque blob in the chat response and re-sends it in subsequent multi-turn requests, because some provider APIs reject history that omits the field.

This is why, with those models, the full reasoning chain is preserved across turns in a conversation. It is handled transparently — you do not configure anything. Note that some providers take the opposite approach: Gemini’s thinking-model reasoning parts are filtered out of the final response rather than preserved.