Custom providers & local LLMs

Point Revka at any OpenAI/Anthropic-compatible endpoint and run local models with pacing tuning.

Revka ships with first-party support for more than 50 named providers, but you are not limited to them. Any service that speaks the OpenAI Chat Completions or Anthropic Messages wire format can be a Revka backend, and the local-inference servers most people run at home — Ollama, llama.cpp, LM Studio, vLLM, SGLang — are already wired in as named providers. This guide shows you how to point Revka at an arbitrary endpoint with the custom: and anthropic-custom: prefixes, how to set up the common local servers, and how to tune the agent loop so a slow local model doesn’t trip timeouts or loop detection.

Use this page when the provider you want isn’t in the catalog, when you run your own gateway or corporate LLM proxy, or when you want everything to stay on your machine. For picking a stock cloud provider, start with the Provider quickstart instead.

Custom providers & endpoints

A provider’s base URL normally comes from its built-in default. When you need a different one, Revka gives you two mechanisms: a URL-bearing provider prefix, or an explicit api_url override on a named provider.

The `custom:` and `anthropic-custom:` prefixes

The provider prefix carries the endpoint URL directly in default_provider, so you never register a named provider:

OpenAI-compatible
Anthropic-compatible

default_provider = "custom:https://your-api.example.com/v1"
api_key          = "your-key"
default_model    = "your-model-name"

custom: routes through Revka’s shared OpenAI-compatible implementation. It speaks /v1/chat/completions (with an optional fallback to the /v1/responses API on a 404), Bearer auth, vision via base64 images, and native tool calling using the OpenAI JSON-schema format.

default_provider = "anthropic-custom:https://your-api.example.com"
api_key          = "your-key"
default_model    = "claude-sonnet-4-6"

anthropic-custom: routes to a custom Anthropic Messages API endpoint instead — use it for proxies or gateways that expose the Anthropic wire format rather than OpenAI’s.

Custom providers authenticate with the generic key env vars rather than a provider-specific one:

export API_KEY="your-key"        # or: export REVKA_API_KEY="your-key"
revka agent -m "hello"

`api_url`, `api_path`, and `extra_headers`

These three top-level config keys let you bend a named provider toward a non-standard endpoint. They also apply to custom: providers when the prefix alone doesn’t carry everything the request needs.

Key	Type	Default	Meaning
`api_url`	`String?`	provider default	Base URL override (e.g. a self-hosted gateway or a remote Ollama at `http://10.0.0.1:11434`).
`api_path`	`String?`	provider default	Overrides the request path suffix for unusual APIs (e.g. `/v2/generate` instead of `/chat/completions`).
`extra_headers`	`Map<String,String>`	`{}`	Additional HTTP headers sent with every provider request.

default_provider = "custom:https://gateway.corp.example.com"
api_url          = "https://gateway.corp.example.com/llm/v1"
api_path         = "/chat/completions"
default_model    = "internal-model"

[extra_headers]
"X-Org-Id"    = "engineering"
"X-Route-Tag" = "revka"

extra_headers can also be set from the environment in Key:Value,Key2:Value2 format:

export REVKA_EXTRA_HEADERS="X-Org-Id:engineering,X-Route-Tag:revka"

Local & self-hosted providers

Revka treats the popular local-inference servers as first-class named providers, each with a sensible default endpoint and optional authentication — you do not need to export a dummy REVKA_API_KEY to use them. Override the endpoint with api_url whenever your server listens on a non-default host or port.

Provider ID	Aliases	Default endpoint	API key
`ollama`	—	`http://localhost:11434` (native `/api/chat`)	`OLLAMA_API_KEY` (optional)
`llamacpp`	`llama.cpp`	`http://localhost:8080/v1`	`LLAMACPP_API_KEY` (optional; vision enabled)
`lmstudio`	`lm-studio`	`http://localhost:1234/v1`	optional (default: `lm-studio`)
`vllm`	—	`http://localhost:8000/v1`	`VLLM_API_KEY` (optional)
`sglang`	—	`http://localhost:30000/v1`	`SGLANG_API_KEY` (optional)
`osaurus`	—	`http://localhost:1337/v1`	`OSAURUS_API_KEY` (optional; default: `osaurus`)

After configuring any local provider, confirm Revka can reach it and enumerate its models:

revka models refresh --provider llamacpp   # swap in your provider id
revka agent -m "hello"

revka doctor models --provider <id> also probes a single provider and reports ok, auth/access, or a connectivity error — useful before you wire the provider into a daemon.

Ollama

Ollama uses its native /api/chat endpoint, not the OpenAI-compatible path, so vision images are passed through Ollama’s own image markers ([IMAGE:<source>]).

default_provider = "ollama"
default_model    = "llama3.2"

A remote or cloud Ollama needs an api_url; a trailing /api in the URL is normalized automatically. The :cloud model suffix is only valid against a remote endpoint:

default_provider = "ollama"
api_url          = "https://ollama.myserver.com"
default_model    = "llama3.2:cloud"

To toggle a model’s thinking output, set reasoning_enabled in [runtime] — Revka forwards it as Ollama’s think: true/false. It only takes effect on models that support reasoning.

[runtime]
reasoning_enabled = true

llama.cpp

The llamacpp provider targets llama-server. Start the server, then point Revka’s api_url at it. An API key is needed only if you launched llama-server with --api-key.

llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 133000 --host 127.0.0.1 --port 8033

default_provider  = "llamacpp"
api_url           = "http://127.0.0.1:8033/v1"
default_model     = "ggml-org/gpt-oss-20b-GGUF"
default_temperature = 0.7

LM Studio

LM Studio exposes an OpenAI-compatible server on port 1234. No key is required — the provider supplies the placeholder lm-studio automatically.

default_provider = "lmstudio"
default_model    = "your-loaded-model"

vLLM and SGLang

Both are OpenAI-compatible inference servers. Start the server and set the matching provider; auth is optional unless the server enforces it.

vLLM
SGLang

vllm serve meta-llama/Llama-3.1-8B-Instruct

default_provider = "vllm"
default_model    = "meta-llama/Llama-3.1-8B-Instruct"
default_temperature = 0.7

python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port 30000

default_provider = "sglang"
default_model    = "meta-llama/Llama-3.1-8B-Instruct"
default_temperature = 0.7

Tuning for slow & local LLMs

A 13B model on a laptop is far slower than a hosted frontier model, and a slow first token can look like a hang or a runaway loop to the default agent settings. Two config sections exist for exactly this case. Cloud users rarely need either.

`[pacing]` — timeouts and loop detection

The [pacing] section extends per-step timeout and loop-detection behavior for slow workloads.

[pacing]
step_timeout_secs               = 120
loop_detection_min_elapsed_secs = 60
loop_ignore_tools               = ["browser_screenshot", "browser_navigate"]
message_timeout_scale_max       = 8

Key	Default	Meaning
`step_timeout_secs`	unset	Per-step LLM inference timeout, independent of the overall message budget — firing it does not consume the total budget.
`loop_detection_min_elapsed_secs`	unset	Grace period before loop detection starts counting, so a slow start isn’t mistaken for a loop.
`loop_ignore_tools`	`[]`	Tools excluded from identical-output loop detection (e.g. tools that legitimately repeat).
`message_timeout_scale_max`	`4`	Cap on how far the base message timeout scales up with tool-loop depth.
`loop_detection_enabled`	`true`	Master toggle for pattern-based loop detection.
`loop_detection_window_size`	`20`	Sliding-window size for the loop detector.
`loop_detection_max_repeats`	`3`	Consecutive identical tool+args calls before a loop warning fires.

`compact_context` for small models

The compact_context key in [agent] switches the agent to a compact bootstrap prompt suited to 13B-or-smaller models, trimming the system prompt that a small context window can’t afford. It is on by default.

[agent]
compact_context = true

Pair it with a few related [agent] knobs when you’re tight on context:

Key	Default	Meaning
`compact_context`	`true`	Use the compact bootstrap prompt for small (≤13B) models.
`max_system_prompt_chars`	`0` (unlimited)	Hard-truncate the system prompt to N characters — useful for tiny context windows.
`max_context_tokens`	`1050000`	Token budget before context compression triggers; lower it to match a small model’s window.
`max_tool_result_chars`	`50000`	Maximum characters per tool result before middle-truncation.

Per-route and per-agent providers

You don’t have to commit the whole instance to one backend. A custom or local provider can be scoped:

Routing hints — map hint:fast (or any name) to a specific provider+model with [[model_routes]], including custom:/anthropic-custom: endpoints. See Routing, reliability & tuning.
Delegate sub-agents — give a [agents.<name>] block its own provider and model, e.g. a local coding model for a coder sub-agent:
```
[agents.coder]
provider    = "ollama"
model       = "qwen2.5-coder:32b"
temperature = 0.2
```
Reliability fallbacks — list a local provider as a fallback so the agent keeps working when a cloud provider is down. See [reliability].

Verify your setup

Probe the endpoint. revka doctor models --provider <id> should report ok with a model count. An auth/access status means the key (if any) is wrong or the plan lacks access; a connectivity error means the URL or server is unreachable.
Refresh the catalog. revka models refresh --provider <id> pulls the live model list so default_model validates. If the gateway doesn’t implement a models endpoint, set default_model by hand and send a test message.
Send a message. revka agent -m "hello" exercises the full path. For local models, watch the first-token latency — if it exceeds your settings, raise [pacing] step_timeout_secs.
Run the full diagnostic. revka doctor validates the resolved provider, key presence, and model in one pass before you start the daemon.

Next steps

Local, self-hosted & custom endpoints Full provider reference for local servers and custom endpoints.

Provider catalog Every provider ID, alias, base URL, and credential env var.

Routing, reliability & tuning Model routes, fallback chains, timeouts, and key-mismatch detection.

Config: provider, agent & routing Reference for [agent], [pacing], and the core provider keys.

Custom providers & local LLMs

Custom providers & endpoints

The `custom:` and `anthropic-custom:` prefixes

`api_url`, `api_path`, and `extra_headers`

Local & self-hosted providers

Ollama

llama.cpp

LM Studio

vLLM and SGLang

Tuning for slow & local LLMs

`[pacing]` — timeouts and loop detection

`compact_context` for small models

Per-route and per-agent providers

Verify your setup

Next steps

Get started

Core concepts

Guides

CLI reference

Gateway API

Dashboard

Channels

Providers & models

Tools

Memory

Workflows & SOP

Cron & scheduling

Security & audit

Deployment & ops

Hardware

MCP & extensibility

Ecosystem

Reference

Custom providers & local LLMs

Custom providers & endpoints

The custom: and anthropic-custom: prefixes

api_url, api_path, and extra_headers

Local & self-hosted providers

Ollama

llama.cpp

LM Studio

vLLM and SGLang

Tuning for slow & local LLMs

[pacing] — timeouts and loop detection

compact_context for small models

Per-route and per-agent providers

Verify your setup

Next steps

Get started

Core concepts

Guides

CLI reference

Gateway API

Dashboard

Channels

Providers & models

Tools

Memory

Workflows & SOP

Cron & scheduling

Security & audit

Deployment & ops

Hardware

MCP & extensibility

Ecosystem

Reference

The `custom:` and `anthropic-custom:` prefixes

`api_url`, `api_path`, and `extra_headers`

`[pacing]` — timeouts and loop detection

`compact_context` for small models