Custom providers & local LLMs
Point Revka at any OpenAI/Anthropic-compatible endpoint and run local models with pacing tuning.
Revka ships with first-party support for more than 50 named providers, but you are not limited to them. Any service that speaks the OpenAI Chat Completions or Anthropic Messages wire format can be a Revka backend, and the local-inference servers most people run at home — Ollama, llama.cpp, LM Studio, vLLM, SGLang — are already wired in as named providers. This guide shows you how to point Revka at an arbitrary endpoint with the custom: and anthropic-custom: prefixes, how to set up the common local servers, and how to tune the agent loop so a slow local model doesn’t trip timeouts or loop detection.
Use this page when the provider you want isn’t in the catalog, when you run your own gateway or corporate LLM proxy, or when you want everything to stay on your machine. For picking a stock cloud provider, start with the Provider quickstart instead.
Custom providers & endpoints
Section titled “Custom providers & endpoints”A provider’s base URL normally comes from its built-in default. When you need a different one, Revka gives you two mechanisms: a URL-bearing provider prefix, or an explicit api_url override on a named provider.
The custom: and anthropic-custom: prefixes
Section titled “The custom: and anthropic-custom: prefixes”The provider prefix carries the endpoint URL directly in default_provider, so you never register a named provider:
default_provider = "custom:https://your-api.example.com/v1"api_key = "your-key"default_model = "your-model-name"custom: routes through Revka’s shared OpenAI-compatible implementation. It speaks /v1/chat/completions (with an optional fallback to the /v1/responses API on a 404), Bearer auth, vision via base64 images, and native tool calling using the OpenAI JSON-schema format.
default_provider = "anthropic-custom:https://your-api.example.com"api_key = "your-key"default_model = "claude-sonnet-4-6"anthropic-custom: routes to a custom Anthropic Messages API endpoint instead — use it for proxies or gateways that expose the Anthropic wire format rather than OpenAI’s.
Custom providers authenticate with the generic key env vars rather than a provider-specific one:
export API_KEY="your-key" # or: export REVKA_API_KEY="your-key"revka agent -m "hello"api_url, api_path, and extra_headers
Section titled “api_url, api_path, and extra_headers”These three top-level config keys let you bend a named provider toward a non-standard endpoint. They also apply to custom: providers when the prefix alone doesn’t carry everything the request needs.
| Key | Type | Default | Meaning |
|---|---|---|---|
api_url | String? | provider default | Base URL override (e.g. a self-hosted gateway or a remote Ollama at http://10.0.0.1:11434). |
api_path | String? | provider default | Overrides the request path suffix for unusual APIs (e.g. /v2/generate instead of /chat/completions). |
extra_headers | Map<String,String> | {} | Additional HTTP headers sent with every provider request. |
default_provider = "custom:https://gateway.corp.example.com"api_url = "https://gateway.corp.example.com/llm/v1"api_path = "/chat/completions"default_model = "internal-model"
[extra_headers]"X-Org-Id" = "engineering""X-Route-Tag" = "revka"extra_headers can also be set from the environment in Key:Value,Key2:Value2 format:
export REVKA_EXTRA_HEADERS="X-Org-Id:engineering,X-Route-Tag:revka"Local & self-hosted providers
Section titled “Local & self-hosted providers”Revka treats the popular local-inference servers as first-class named providers, each with a sensible default endpoint and optional authentication — you do not need to export a dummy REVKA_API_KEY to use them. Override the endpoint with api_url whenever your server listens on a non-default host or port.
| Provider ID | Aliases | Default endpoint | API key |
|---|---|---|---|
ollama | — | http://localhost:11434 (native /api/chat) | OLLAMA_API_KEY (optional) |
llamacpp | llama.cpp | http://localhost:8080/v1 | LLAMACPP_API_KEY (optional; vision enabled) |
lmstudio | lm-studio | http://localhost:1234/v1 | optional (default: lm-studio) |
vllm | — | http://localhost:8000/v1 | VLLM_API_KEY (optional) |
sglang | — | http://localhost:30000/v1 | SGLANG_API_KEY (optional) |
osaurus | — | http://localhost:1337/v1 | OSAURUS_API_KEY (optional; default: osaurus) |
After configuring any local provider, confirm Revka can reach it and enumerate its models:
revka models refresh --provider llamacpp # swap in your provider idrevka agent -m "hello"revka doctor models --provider <id> also probes a single provider and reports ok, auth/access, or a connectivity error — useful before you wire the provider into a daemon.
Ollama
Section titled “Ollama”Ollama uses its native /api/chat endpoint, not the OpenAI-compatible path, so vision images are passed through Ollama’s own image markers ([IMAGE:<source>]).
default_provider = "ollama"default_model = "llama3.2"A remote or cloud Ollama needs an api_url; a trailing /api in the URL is normalized automatically. The :cloud model suffix is only valid against a remote endpoint:
default_provider = "ollama"api_url = "https://ollama.myserver.com"default_model = "llama3.2:cloud"To toggle a model’s thinking output, set reasoning_enabled in [runtime] — Revka forwards it as Ollama’s think: true/false. It only takes effect on models that support reasoning.
[runtime]reasoning_enabled = truellama.cpp
Section titled “llama.cpp”The llamacpp provider targets llama-server. Start the server, then point Revka’s api_url at it. An API key is needed only if you launched llama-server with --api-key.
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 133000 --host 127.0.0.1 --port 8033default_provider = "llamacpp"api_url = "http://127.0.0.1:8033/v1"default_model = "ggml-org/gpt-oss-20b-GGUF"default_temperature = 0.7LM Studio
Section titled “LM Studio”LM Studio exposes an OpenAI-compatible server on port 1234. No key is required — the provider supplies the placeholder lm-studio automatically.
default_provider = "lmstudio"default_model = "your-loaded-model"vLLM and SGLang
Section titled “vLLM and SGLang”Both are OpenAI-compatible inference servers. Start the server and set the matching provider; auth is optional unless the server enforces it.
vllm serve meta-llama/Llama-3.1-8B-Instructdefault_provider = "vllm"default_model = "meta-llama/Llama-3.1-8B-Instruct"default_temperature = 0.7python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port 30000default_provider = "sglang"default_model = "meta-llama/Llama-3.1-8B-Instruct"default_temperature = 0.7Tuning for slow & local LLMs
Section titled “Tuning for slow & local LLMs”A 13B model on a laptop is far slower than a hosted frontier model, and a slow first token can look like a hang or a runaway loop to the default agent settings. Two config sections exist for exactly this case. Cloud users rarely need either.
[pacing] — timeouts and loop detection
Section titled “[pacing] — timeouts and loop detection”The [pacing] section extends per-step timeout and loop-detection behavior for slow workloads.
[pacing]step_timeout_secs = 120loop_detection_min_elapsed_secs = 60loop_ignore_tools = ["browser_screenshot", "browser_navigate"]message_timeout_scale_max = 8| Key | Default | Meaning |
|---|---|---|
step_timeout_secs | unset | Per-step LLM inference timeout, independent of the overall message budget — firing it does not consume the total budget. |
loop_detection_min_elapsed_secs | unset | Grace period before loop detection starts counting, so a slow start isn’t mistaken for a loop. |
loop_ignore_tools | [] | Tools excluded from identical-output loop detection (e.g. tools that legitimately repeat). |
message_timeout_scale_max | 4 | Cap on how far the base message timeout scales up with tool-loop depth. |
loop_detection_enabled | true | Master toggle for pattern-based loop detection. |
loop_detection_window_size | 20 | Sliding-window size for the loop detector. |
loop_detection_max_repeats | 3 | Consecutive identical tool+args calls before a loop warning fires. |
compact_context for small models
Section titled “compact_context for small models”The compact_context key in [agent] switches the agent to a compact bootstrap prompt suited to 13B-or-smaller models, trimming the system prompt that a small context window can’t afford. It is on by default.
[agent]compact_context = truePair it with a few related [agent] knobs when you’re tight on context:
| Key | Default | Meaning |
|---|---|---|
compact_context | true | Use the compact bootstrap prompt for small (≤13B) models. |
max_system_prompt_chars | 0 (unlimited) | Hard-truncate the system prompt to N characters — useful for tiny context windows. |
max_context_tokens | 1050000 | Token budget before context compression triggers; lower it to match a small model’s window. |
max_tool_result_chars | 50000 | Maximum characters per tool result before middle-truncation. |
Per-route and per-agent providers
Section titled “Per-route and per-agent providers”You don’t have to commit the whole instance to one backend. A custom or local provider can be scoped:
-
Routing hints — map
hint:fast(or any name) to a specific provider+model with[[model_routes]], includingcustom:/anthropic-custom:endpoints. See Routing, reliability & tuning. -
Delegate sub-agents — give a
[agents.<name>]block its ownproviderandmodel, e.g. a local coding model for acodersub-agent:[agents.coder]provider = "ollama"model = "qwen2.5-coder:32b"temperature = 0.2 -
Reliability fallbacks — list a local provider as a fallback so the agent keeps working when a cloud provider is down. See
[reliability].
Verify your setup
Section titled “Verify your setup”-
Probe the endpoint.
revka doctor models --provider <id>should reportokwith a model count. Anauth/accessstatus means the key (if any) is wrong or the plan lacks access; a connectivity error means the URL or server is unreachable. -
Refresh the catalog.
revka models refresh --provider <id>pulls the live model list sodefault_modelvalidates. If the gateway doesn’t implement a models endpoint, setdefault_modelby hand and send a test message. -
Send a message.
revka agent -m "hello"exercises the full path. For local models, watch the first-token latency — if it exceeds your settings, raise[pacing] step_timeout_secs. -
Run the full diagnostic.
revka doctorvalidates the resolved provider, key presence, and model in one pass before you start the daemon.