Routing, reliability & tuning
hint:-based routing, query classification, cost-optimized and embedding routes, fallback chains, retries, warmup, and per-request tuning.
Revka layers two optional wrappers on top of any base provider: a router that dispatches a single logical request to different (provider, model) pairs, and a reliability wrapper that adds retries, fallback chains, and key rotation. On top of that, a handful of per-request knobs let you tune timeouts, token caps, reasoning effort, and headers without touching the provider’s own settings.
Use this page when you want to: stabilize call sites against model deprecations, route different kinds of traffic to different models, fall over to a backup provider when one is down, or squeeze latency and cost out of a deployment. If you are still picking and configuring a single provider, start with the Provider quickstart and the Provider catalog.
Model routing with hint: prefixes
The router lets you give a request a symbolic model name instead of a concrete one. A model parameter of hint:reasoning resolves to whichever (provider, model) pair you mapped to "reasoning" in config. The benefit: your call sites (channels, tools, agent steps) stay stable, and you upgrade models by editing one config entry.
Define routes as [[model_routes]] tables:

    [[model_routes]]
    hint = "reasoning"
    provider = "openrouter"
    model = "anthropic/claude-opus-4-5"
    api_key = "" # optional per-route key override

    [[model_routes]]
    hint = "fast"
    provider = "groq"
    model = "llama-3.3-70b-versatile"

Then anywhere a model is accepted, pass the hint:

    hint:reasoning

| Field | Required | Meaning |
|---|---|---|
| hint | yes | Unique symbolic name (the part after hint:). |
| provider | yes | A known provider ID. |
| model | yes | The model ID to use with that provider. |
| api_key | no | Per-route key override. |
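
To make the lookup concrete, here is a minimal Python sketch of how a hint: prefix could resolve to a (provider, model) pair. It is an illustration, not Revka's implementation: the ROUTES dictionary, the resolve_model function, and the choice to raise on an unknown hint are all assumptions made for this example.

```python
from typing import NamedTuple


class Route(NamedTuple):
    provider: str
    model: str


# Hypothetical in-memory copy of a [[model_routes]] table.
ROUTES = {
    "reasoning": Route("openrouter", "anthropic/claude-opus-4-5"),
    "fast": Route("groq", "llama-3.3-70b-versatile"),
}

HINT_PREFIX = "hint:"


def resolve_model(model: str, default_provider: str) -> Route:
    """Map a model parameter to a concrete (provider, model) pair."""
    if model.startswith(HINT_PREFIX):
        hint = model[len(HINT_PREFIX):]
        if hint not in ROUTES:
            # Assumption: an unknown hint is an error in this sketch.
            raise KeyError(f"no route configured for hint {hint!r}")
        return ROUTES[hint]
    # Not a hint: treat it as a concrete model ID on the default provider.
    return Route(default_provider, model)
```

Call sites only ever pass the symbolic name, so retargeting "reasoning" means changing one entry in the table.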
Upgrade models safely
This is the main reason to use hints. Keep hint:reasoning (and friends) hard-coded at every call site, and when a provider deprecates a model ID, change only the model = value in the route.

- Keep call sites stable: hint:reasoning, hint:fast, hint:semantic.
- Change only the target under [[model_routes]] (or [[embedding_routes]]).
- Validate the new config:

      revka doctor
      revka status

- Smoke-test one representative flow (a chat plus a memory recall) before rolling out.

revka doctor validates that your model routes (and embedding routes) point at known providers, so a typo surfaces before it hits production.
Cost-optimized routing
Two special hints don’t look up a fixed route. Instead, they score every route in the table by price and pick the cheapest qualifying one:

    hint:cost-optimized
    hint:cheapest

Scoring uses the prices map in the [cost] section ([cost.prices]): input plus output cost per 1M tokens, keyed by model name. Candidates can be filtered by capability requirements (vision, native tool calling). If no pricing data is available, the router falls back to the default route.
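
The selection logic can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the router's actual code: the Route class, the capabilities set, the pick_cheapest function, summing input and output price as the score, and skipping routes without pricing are all choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    hint: str
    provider: str
    model: str
    # Hypothetical capability tags, e.g. {"vision", "tools"}.
    capabilities: frozenset = frozenset()


def pick_cheapest(routes, prices, required=frozenset(), default=None):
    """Return the cheapest priced route that has every required capability."""
    candidates = [
        r for r in routes
        if r.model in prices and required <= r.capabilities
    ]
    if not candidates:
        # No usable pricing data: fall back to the default route.
        return default
    # Assumption: score = input price + output price per 1M tokens.
    return min(
        candidates,
        key=lambda r: prices[r.model]["input"] + prices[r.model]["output"],
    )
```

With two priced routes, a plain request picks the cheaper model, while a request that requires vision only considers routes tagged with that capability.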

    [[model_routes]]
    hint = "fast"
    provider = "groq"
    model = "llama-3.3-70b-versatile"

    [[model_routes]]
    hint = "reasoning"
    provider = "openrouter"
    model = "anthropic/claude-opus-4-5"

    [cost.prices]
    # add per-1M-token pricing so cost-optimized routing has data to score
    # e.g. "groq/llama-3.3-70b-versatile" = { input = 0.59, output = 0.79 }

Query classification (automatic routing)
Query classification picks a hint: automatically based on the content of the incoming message. No LLM call is made to classify; it is pure string matching. Configure it under [query_classification]:

    [query_classification]
    enabled = true

    [[query_classification.rules]]
    hint = "reasoning"
    keywords = ["explain", "analyze", "why"]
    min_length = 200
    priority = 10

    [[query_classification.rules]]
    hint = "fast"
    keywords = ["hi", "hello", "thanks"]
    max_length = 50
    priority = 5

| Key | Default | Meaning |
|---|---|---|
| enabled | false | Master switch for classification. |
| rules | [] | Rules, evaluated in priority order. |
Each rule:
| Key | Default | Meaning |
|---|---|---|
| hint | required | Must match a configured [[model_routes]] hint. |
| keywords | [] | Case-insensitive substring matches. |
| patterns | [] | Case-sensitive literal matches (e.g. code fences, "fn "). |
| min_length | unset | Only match when the message is at least N characters. |
| max_length | unset | Only match when the message is at most N characters. |
| priority | 0 | Higher priority is checked first. |
This is a zero-cost way to send long, analytical questions to a strong reasoning model and short greetings to a cheap, fast one. Because matching is literal, patterns is ideal for routing code-heavy messages (match on a code fence or fn ) to a coding model.
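
The rule semantics described above can be sketched in Python as follows. This is an illustration under assumptions, not Revka's matcher: the Rule class and classify function are hypothetical, and the sketch assumes a rule matches when its length bounds hold and at least one keyword or pattern is found (a rule with neither never matches here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    hint: str
    keywords: tuple = ()
    patterns: tuple = ()
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    priority: int = 0


def classify(message: str, rules: list) -> Optional[str]:
    """Return the hint of the first matching rule, highest priority first."""
    lowered = message.lower()
    # sorted() is stable, so rules with equal priority keep config order.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.min_length is not None and len(message) < rule.min_length:
            continue
        if rule.max_length is not None and len(message) > rule.max_length:
            continue
        # Keywords: case-insensitive substring match.
        keyword_hit = any(k.lower() in lowered for k in rule.keywords)
        # Patterns: case-sensitive literal match.
        pattern_hit = any(p in message for p in rule.patterns)
        if keyword_hit or pattern_hit:
            return rule.hint
    return None  # no rule matched; the caller keeps its default model
```

Note that keyword matching is by substring, so a short keyword such as "hi" also matches inside longer words like "this"; length bounds are what keep such rules from firing on long messages.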
Embedding routing
[[embedding_routes]] entries can be defined in config and are validated by revka doctor, but embeddings have been removed from Revka: the routes are currently inert and are never used to perform any embedding at runtime. Only schema validation remains.

    [[embedding_routes]]
    hint = "semantic"
    provider = "openai"
    model = "text-embedding-3-small"
    dimensions = 1536

    [[embedding_routes]]
    hint = "archive"
    provider = "custom:https://embed.example.com/v1"
    model = "your-embedding-model-id"
    dimensions = 1024
    api_key = "route-specific-key"

| Field | Meaning |
|---|---|
| hint | Symbolic name. |
| provider | One of none, openai, or custom:<url> (an OpenAI-compatible embeddings endpoint). |
| model | Embedding model ID. |
| dimensions | Optional. Would override the API default when it differs from your storage schema. |
| api_key | Optional per-route key override. |
For the broader memory setup, see Kumiho graph memory and the Memory overview.
The model_routing_config tool
The agent can update routing config for you in natural language. Ask in chat, for example “Set conversation provider to kimi, model moonshot-v1-8k”, and the assistant calls the model_routing_config tool to persist the change to config.toml; no TOML editing is required.
The tool can add or change hints, models, and providers in your route table. It cannot change security policies or delete data. This pairs well with the upgrade-safely pattern: keep call sites on stable hints and let the assistant retarget them.
Reliability: the ReliableProvider retry wrapper
The ReliableProvider wraps your primary provider, and an optional fallback chain, with a three-tier resilience strategy, applied in order:

- Model fallback chain: try each model in the configured chain for the request.
- Provider fallback chain: try each provider in fallback_providers after the primary exhausts its retries.
- Inner retry loop: exponential backoff per (provider, model) pair.

Errors are classified so retries only happen when they help:
| Class | Examples | Behavior |
|---|---|---|
| Retryable | 5xx, timeouts | Retried with backoff. |
| Non-retryable | 4xx auth/validation (e.g. 400, 401) | Fail fast — no retry. |
| Rate-limited | 429 with Retry-After | Retried; Retry-After is honored (Gemini 429 bodies are parsed). |
| Business rate-limited | quota exhausted | No retry. |
Context-window errors trigger automatic history truncation (the oldest half of non-system messages is dropped) before retrying.
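
As an illustration of that truncation rule, here is a minimal Python sketch. It assumes messages are dictionaries with a "role" key and that an odd count rounds down; the truncate_history name and both of those details are assumptions for this example, not confirmed behavior.

```python
def truncate_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest half of the non-system messages, preserving order."""
    non_system = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    # Assumption: with an odd count, round down (drop the smaller half).
    drop = set(non_system[: len(non_system) // 2])
    return [m for i, m in enumerate(messages) if i not in drop]
```

System messages always survive, so the prompt that defines the agent's behavior is still present when the shortened request is retried.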
Configure it under [reliability]:

    [reliability]
    provider_retries = 2 # attempts per provider before failover (default: 2)
    provider_backoff_ms = 500 # base backoff ms; doubles each retry, capped at 10000
    fallback_providers = ["anthropic", "openai"]
    api_keys = ["sk-second-key", "sk-third-key"] # round-robin on 429

    [reliability.model_fallbacks]
    "claude-opus-4-20250514" = ["claude-sonnet-4-20250514", "gpt-4o"]

| Key | Default | Meaning |
|---|---|---|
| provider_retries | 2 | Attempts per provider before falling over. |
| provider_backoff_ms | 500 | Base backoff in ms; doubles each retry, capped at 10000 ms. |
| fallback_providers | unset | Ordered provider names tried after the primary. |
| api_keys | unset | Extra keys listed for round-robin rotation on 429s. Rotation is currently a no-op at request time: the rotated key cannot be applied because the Provider trait has no set_api_key, so retries continue with the original key. |
| model_fallbacks | unset | Map of model → [fallback_model, …]. |
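
The backoff schedule implied by provider_backoff_ms can be worked out with a short Python sketch. It assumes the first delay equals the base value and that no jitter is added; the backoff_delays_ms function is a hypothetical helper for this example.

```python
def backoff_delays_ms(retries: int, base_ms: int = 500, cap_ms: int = 10_000) -> list[int]:
    """Delay before each retry: the base doubles every time, up to the cap."""
    return [min(base_ms * 2 ** attempt, cap_ms) for attempt in range(retries)]
```

With the defaults, two retries wait 500 ms and then 1000 ms. The 10000 ms cap is first reached on the sixth delay, since the fifth is 8000 ms.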
fallback_providers and model_fallbacks
These two settings cover the most common reliability scenarios:
Keep a primary, fall over to a different vendor when it is unreachable:

    default_provider = "anthropic"
    default_model = "claude-sonnet-4-6-20250514"

    [reliability]
    fallback_providers = ["openai", "openrouter"]

Map a soon-to-be-retired model to live replacements so requests keep working during the transition:

    [reliability.model_fallbacks]
    "claude-opus-4-20250514" = ["claude-sonnet-4-6-20250514", "gpt-4o"]

api_keys lists additional keys for attempted round-robin rotation on 429s. Note: key rotation is currently a no-op at request time. The Provider trait has no set_api_key, so retries continue with the original key regardless of rotation:

    [reliability]
    api_keys = ["sk-key-a", "sk-key-b", "sk-key-c"]

Validate fallback providers before they matter: revka doctor checks that every name in fallback_providers is a known provider.
Per-request tuning with ProviderRuntimeOptions
A set of runtime knobs is passed to every provider when it is built. Each maps to a config key (or environment variable) and applies to the provider’s outbound requests:
| Field | Config key | Default | Meaning |
|---|---|---|---|
| API URL override | api_url | provider default | Custom base URL (proxies, self-hosted endpoints). |
| Reasoning on/off | [runtime] reasoning_enabled | unset | Toggle extended thinking; env REVKA_REASONING_ENABLED. |
| Reasoning effort | [runtime] reasoning_effort | unset | minimal, low, medium, high, or xhigh. |
| HTTP timeout | provider_timeout_secs | 120 | Per-call timeout in seconds. |
| Extra headers | extra_headers (map) | {} | Additional HTTP headers; env REVKA_EXTRA_HEADERS. |
| API path | api_path | provider default | Override the request path suffix for unusual APIs. |
| Max output tokens | provider_max_tokens | provider default | Cap output tokens. |

    provider_timeout_secs = 180
    provider_max_tokens = 8192
    api_path = "/v2/generate"

    [runtime]
    reasoning_enabled = true
    reasoning_effort = "high"

    [extra_headers]
    "X-Org" = "research"

The REVKA_EXTRA_HEADERS environment variable uses a Key:Value,Key2:Value2 format:

    export REVKA_EXTRA_HEADERS="X-Org:research,X-Trace:on"

Provider warmup
Every provider exposes a warmup() method that pre-establishes the HTTP connection (DNS, the TLS handshake, and HTTP/2 setup) before the first real request. The ReliableProvider and the router call warmup() on all of their registered sub-providers sequentially at startup.
Warmup is purely a latency optimization: failures are non-fatal and logged at WARN. It mainly helps latency-sensitive deployments where the very first request would otherwise pay the full connection-setup cost.
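
The non-fatal, sequential behavior can be pictured with a small Python sketch. It is only an illustration: the warmup_all function and the provider objects are hypothetical, and a synchronous loop stands in for whatever async machinery the real wrappers use.

```python
import logging

logger = logging.getLogger("warmup_sketch")


def warmup_all(providers: dict) -> list[str]:
    """Warm up each provider in turn; return the names whose warmup failed."""
    failed = []
    for name, provider in providers.items():
        try:
            provider.warmup()
        except Exception as exc:
            # Non-fatal: log at WARN and move on to the next provider.
            logger.warning("warmup failed for %s: %s", name, exc)
            failed.append(name)
    return failed
```

A provider that fails warmup is still usable; its first real request simply pays the connection-setup cost.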
Streaming support
Providers that support streaming emit StreamEvent values over an async channel: TextDelta, ToolCall, Usage, and Final, plus PreExecutedToolCall and PreExecutedToolResult for OpenAI-compatible custom proxies (e.g. claude-max-api-proxy / Claude Code proxy) that pre-execute tools and forward observability events via custom SSE fields. The gateway surfaces these as Server-Sent Events for the dashboard and API.
When wrappers are in play:
- The ReliableProvider forwards the stream from the first qualifying provider.
- The router delegates streaming to whichever provider its hint resolved to.

Streaming tool events are a separate capability (supports_streaming_tool_events). For a streamed tool-call request, a provider is used in streaming mode if it supports streaming and either: (a) it does not use native tool calling (tool calls are then parsed from text afterward), or (b) it also declares supports_streaming_tool_events. The streaming-tool-events capability is only required when native tool calling is in use. For the realtime transport, see Realtime: WebSocket, SSE & Live Canvas.
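
The streaming decision for a tool-call request reduces to a small boolean rule, sketched here in Python. The use_streaming function and its parameter names are hypothetical; only supports_streaming_tool_events is a name taken from this page.

```python
def use_streaming(
    supports_streaming: bool,
    uses_native_tools: bool,
    supports_streaming_tool_events: bool,
) -> bool:
    """Decide whether a streamed tool-call request can use streaming mode."""
    if not supports_streaming:
        return False
    # Without native tool calling, tool calls are parsed from text afterward,
    # so the streaming-tool-events capability is not needed.
    return (not uses_native_tools) or supports_streaming_tool_events
```

The only combination that blocks streaming on a streaming-capable provider is native tool calling without supports_streaming_tool_events.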
Reasoning content pass-through
Several “thinking” models (DeepSeek-R1, GLM-4.7, Kimi K2.5) return a reasoning_content field alongside the response text. Revka preserves it as an opaque blob in the chat response and re-sends it in subsequent multi-turn requests, because some provider APIs reject history that omits the field.
This is why, with those models, the full reasoning chain is preserved across turns in a conversation. It is handled transparently; you do not configure anything. Note that some providers take the opposite approach: Gemini’s thinking-model reasoning parts are filtered out of the final response rather than preserved.