Routing, reliability & tuning

hint:-based routing, query classification, cost-optimized and embedding routes, fallback chains, retries, warmup, and per-request tuning.

Revka layers two optional wrappers on top of any base provider: a router that dispatches a single logical request to different (provider, model) pairs, and a reliability wrapper that adds retries, fallback chains, and key rotation. On top of that, a handful of per-request knobs let you tune timeouts, token caps, reasoning effort, and headers without touching the provider’s own settings.

Use this page when you want to: stabilize call sites against model deprecations, route different kinds of traffic to different models, fall over to a backup provider when one is down, or squeeze latency and cost out of a deployment. If you are still picking and configuring a single provider, start with the Provider quickstart and the Provider catalog.

Model routing with `hint:` prefixes

The router lets you give a request a symbolic model name instead of a concrete one. A model parameter of hint:reasoning resolves to whichever (provider, model) you mapped to "reasoning" in config. The benefit: your call sites — channels, tools, agent steps — stay stable, and you upgrade models by editing one config entry.

Define routes as [[model_routes]] tables:

[[model_routes]]
hint     = "reasoning"
provider = "openrouter"
model    = "anthropic/claude-opus-4-5"
api_key  = ""   # optional per-route key override

[[model_routes]]
hint     = "fast"
provider = "groq"
model    = "llama-3.3-70b-versatile"

Then anywhere a model is accepted, pass the hint:

hint:reasoning

Field	Required	Meaning
`hint`	yes	Unique symbolic name (the part after `hint:`).
`provider`	yes	A known provider ID.
`model`	yes	The model ID to use with that provider.
`api_key`	no	Per-route key override.

Upgrade models safely

This is the main reason to use hints. Keep hint:reasoning (and friends) hard-coded at every call site, and when a provider deprecates a model ID, change only the model = value in the route.

Keep call sites stable: hint:reasoning, hint:fast, hint:semantic.
Change only the target under [[model_routes]] (or [[embedding_routes]]).
Validate the new config:
Terminal window
```
revka doctor
revka status
```
Smoke-test one representative flow (a chat plus a memory recall) before rolling out.

revka doctor validates that your model routes (and embedding routes) point at known providers, so a typo surfaces before it hits production.

Cost-optimized routing

Two special hints don’t look up a fixed route. Instead they score every route in the table by price and pick the cheapest qualifying one:

hint:cost-optimized
hint:cheapest

Scoring uses the [cost.prices] data (input + output cost per 1M tokens) keyed by model name, which lives under the [cost] section’s prices map. Candidates can be filtered by capability requirements (vision, native tool calling). If no pricing data is available, the router falls back to the default route.

[[model_routes]]
hint     = "fast"
provider = "groq"
model    = "llama-3.3-70b-versatile"

[[model_routes]]
hint     = "reasoning"
provider = "openrouter"
model    = "anthropic/claude-opus-4-5"

[cost.prices]
# add per-1M-token pricing so cost-optimized routing has data to score
# e.g. "groq/llama-3.3-70b-versatile" = { input = 0.59, output = 0.79 }

Query classification (automatic routing)

Query classification picks a hint: automatically based on the content of the incoming message — no LLM call is made to classify, it is pure string matching. Configure it under [query_classification]:

[query_classification]
enabled = true

[[query_classification.rules]]
hint     = "reasoning"
keywords = ["explain", "analyze", "why"]
min_length = 200
priority = 10

[[query_classification.rules]]
hint     = "fast"
keywords = ["hi", "hello", "thanks"]
max_length = 50
priority  = 5

Key	Default	Meaning
`enabled`	`false`	Master switch for classification.
`rules`	`[]`	Rules, evaluated in priority order.

Each rule:

Key	Default	Meaning
`hint`	required	Must match a configured `[[model_routes]]` hint.
`keywords`	`[]`	Case-insensitive substring matches.
`patterns`	`[]`	Case-sensitive literal matches (e.g. code fences, `"fn "`).
`min_length`	unset	Only match when the message is at least N characters.
`max_length`	unset	Only match when the message is at most N characters.
`priority`	`0`	Higher priority is checked first.

This is a zero-cost way to send long, analytical questions to a strong reasoning model and short greetings to a cheap, fast one. Because matching is literal, patterns is ideal for routing code-heavy messages (match on a code fence or fn ) to a coding model.

Embedding routing

[[embedding_routes]] entries can be defined in config and are validated by revka doctor, but embeddings have been removed from Revka — the routes are currently inert and are never used to perform any embedding at runtime. Only schema validation remains.

[[embedding_routes]]
hint       = "semantic"
provider   = "openai"
model      = "text-embedding-3-small"
dimensions = 1536

[[embedding_routes]]
hint       = "archive"
provider   = "custom:https://embed.example.com/v1"
model      = "your-embedding-model-id"
dimensions = 1024
api_key    = "route-specific-key"

Field	Meaning
`hint`	Symbolic name.
`provider`	One of `none`, `openai`, or `custom:<url>` (an OpenAI-compatible embeddings endpoint).
`model`	Embedding model ID.
`dimensions`	Optional. Would override when the API default differs from your storage schema.
`api_key`	Optional per-route key override.

For the broader memory setup, see Kumiho graph memory and the Memory overview.

The `model_routing_config` tool

The agent can update routing config for you in natural language. Ask in chat — for example “Set conversation provider to kimi, model moonshot-v1-8k” — and the assistant calls the model_routing_config tool to persist the change to config.toml, no TOML editing required.

The tool can add or change hints, models, and providers in your route table. It cannot change security policies or delete data. This pairs well with the upgrade-safely pattern: keep call sites on stable hints and let the assistant retarget them.

Reliability: the `ReliableProvider` retry wrapper

The ReliableProvider wraps your primary provider — and an optional fallback chain — with a three-tier resilience strategy, applied in order:

Model fallback chain — try each model in the configured chain for the request.
Provider fallback chain — try each provider in fallback_providers after the primary exhausts its retries.
Inner retry loop — exponential backoff per (provider, model) pair.

Errors are classified so retries only happen when they help:

Class	Examples	Behavior
Retryable	5xx, timeouts	Retried with backoff.
Non-retryable	4xx auth/validation (e.g. 400, 401)	Fail fast — no retry.
Rate-limited	429 with `Retry-After`	Retried; `Retry-After` is honored (Gemini 429 bodies are parsed).
Business rate-limited	quota exhausted	No retry.

Context-window errors trigger automatic history truncation (the oldest half of non-system messages is dropped) before retrying.

Configure it under [reliability]:

[reliability]
provider_retries    = 2        # attempts per provider before fallover (default: 2)
provider_backoff_ms = 500      # base backoff ms; doubles each retry, capped at 10000
fallback_providers  = ["anthropic", "openai"]
api_keys            = ["sk-second-key", "sk-third-key"]  # round-robin on 429

[reliability.model_fallbacks]
"claude-opus-4-20250514" = ["claude-sonnet-4-20250514", "gpt-4o"]

Key	Default	Meaning
`provider_retries`	`2`	Attempts per provider before falling over.
`provider_backoff_ms`	`500`	Base backoff in ms; doubles each retry, capped at 10000 ms.
`fallback_providers`	unset	Ordered provider names tried after the primary.
`api_keys`	unset	Extra keys listed for round-robin rotation on 429s (rotation is currently a no-op at request time — the rotated key cannot be applied because the Provider trait has no `set_api_key`; retries continue with the original key).
`model_fallbacks`	unset	Map of `model → [fallback_model, …]`.

`fallback_providers` and `model_fallbacks`

These two settings cover the most common reliability scenarios:

Keep a primary, fall over to a different vendor when it is unreachable:

default_provider = "anthropic"
default_model    = "claude-sonnet-4-6-20250514"

[reliability]
fallback_providers = ["openai", "openrouter"]

Map a soon-to-be-retired model to live replacements so requests keep working during the transition:

[reliability.model_fallbacks]
"claude-opus-4-20250514" = ["claude-sonnet-4-6-20250514", "gpt-4o"]

api_keys lists additional keys for attempted round-robin rotation on 429s. Note: key rotation is currently a no-op at request time — the Provider trait has no set_api_key, so retries continue with the original key regardless of rotation:

[reliability]
api_keys = ["sk-key-a", "sk-key-b", "sk-key-c"]

Validate fallback providers before they matter — revka doctor checks that every name in fallback_providers is a known provider.

Per-request tuning with `ProviderRuntimeOptions`

A set of runtime knobs is passed to every provider when it is built. Each maps to a config key (or environment variable) and applies to the provider’s outbound requests:

Field	Config key	Default	Meaning
API URL override	`api_url`	provider default	Custom base URL (proxies, self-hosted endpoints).
Reasoning on/off	`[runtime] reasoning_enabled`	unset	Toggle extended thinking; env `REVKA_REASONING_ENABLED`.
Reasoning effort	`[runtime] reasoning_effort`	unset	`minimal`, `low`, `medium`, `high`, or `xhigh`.
HTTP timeout	`provider_timeout_secs`	`120`	Per-call timeout in seconds.
Extra headers	`extra_headers` (map)	`{}`	Additional HTTP headers; env `REVKA_EXTRA_HEADERS`.
API path	`api_path`	provider default	Override the request path suffix for unusual APIs.
Max output tokens	`provider_max_tokens`	provider default	Cap output tokens.

provider_timeout_secs = 180
provider_max_tokens   = 8192
api_path              = "/v2/generate"

[runtime]
reasoning_enabled = true
reasoning_effort  = "high"

[extra_headers]
"X-Org" = "research"

The REVKA_EXTRA_HEADERS environment variable uses a Key:Value,Key2:Value2 format:

export REVKA_EXTRA_HEADERS="X-Org:research,X-Trace:on"

Provider warmup

Every provider exposes a warmup() method that pre-establishes the HTTP connection — DNS, the TLS handshake, and HTTP/2 setup — before the first real request. The ReliableProvider and the router call warmup() on all of their registered sub-providers sequentially at startup.

Warmup is purely a latency optimization: failures are non-fatal and logged at WARN. It mainly helps latency-sensitive deployments where the very first request would otherwise pay the full connection-setup cost.

Streaming support

Providers that support streaming emit StreamEvent values over an async channel: TextDelta, ToolCall, Usage, and Final, plus PreExecutedToolCall and PreExecutedToolResult for OpenAI-compatible custom proxies (e.g. claude-max-api-proxy / Claude Code proxy) that pre-execute tools and forward observability events via custom SSE fields. The gateway surfaces these as Server-Sent Events for the dashboard and API.

When wrappers are in play:

The ReliableProvider forwards the stream from the first qualifying provider.
The router delegates streaming to whichever provider its hint resolved to.

Streaming tool events are a separate capability (supports_streaming_tool_events). For a streamed tool-call request, a provider is used in streaming mode if it supports streaming and either: (a) it does not use native tool calling (tool calls are then parsed from text afterward), or (b) it also declares supports_streaming_tool_events. The streaming-tool-events capability is only required when native tool calling is in use. For the realtime transport, see Realtime: WebSocket, SSE & Live Canvas.

Reasoning content pass-through

Several “thinking” models — DeepSeek-R1, GLM-4.7, Kimi K2.5 — return a reasoning_content field alongside the response text. Revka preserves it as an opaque blob in the chat response and re-sends it in subsequent multi-turn requests, because some provider APIs reject history that omits the field.

This is why, with those models, the full reasoning chain is preserved across turns in a conversation. It is handled transparently — you do not configure anything. Note that some providers take the opposite approach: Gemini’s thinking-model reasoning parts are filtered out of the final response rather than preserved.

Next steps

Provider catalog Every supported provider: IDs, aliases, base URLs, env vars, and capabilities.

Provider quickstart Pick a provider, set a key, and run your first chat.

Config: provider, agent & routing The config reference for routes, reliability, and runtime tuning.

Diagnostics revka doctor validates routes, fallback providers, and keys.

Cost tracking & budgets Pair cost-optimized routing with daily and monthly spend limits.

Local & custom endpoints api_url, api_path, extra_headers, and custom: prefixes.

Routing, reliability & tuning

Model routing with `hint:` prefixes

Upgrade models safely

Cost-optimized routing

Query classification (automatic routing)

Embedding routing

The `model_routing_config` tool

Reliability: the `ReliableProvider` retry wrapper

`fallback_providers` and `model_fallbacks`

Per-request tuning with `ProviderRuntimeOptions`

Provider warmup

Streaming support

Reasoning content pass-through

Next steps

Get started

Core concepts

Guides

CLI reference

Gateway API

Dashboard

Channels

Providers & models

Tools

Memory

Workflows & SOP

Cron & scheduling

Security & audit

Deployment & ops

Hardware

MCP & extensibility

Ecosystem

Reference

Routing, reliability & tuning

Model routing with hint: prefixes

Upgrade models safely

Cost-optimized routing

Query classification (automatic routing)

Embedding routing

The model_routing_config tool

Reliability: the ReliableProvider retry wrapper

fallback_providers and model_fallbacks

Per-request tuning with ProviderRuntimeOptions

Provider warmup

Streaming support

Reasoning content pass-through

Next steps

Get started

Core concepts

Guides

CLI reference

Gateway API

Dashboard

Channels

Providers & models

Tools

Memory

Workflows & SOP

Cron & scheduling

Security & audit

Deployment & ops

Hardware

MCP & extensibility

Ecosystem

Reference

Model routing with `hint:` prefixes

The `model_routing_config` tool

Reliability: the `ReliableProvider` retry wrapper

`fallback_providers` and `model_fallbacks`

Per-request tuning with `ProviderRuntimeOptions`