
Custom providers & local LLMs

Point Revka at any OpenAI/Anthropic-compatible endpoint and run local models with pacing tuning.

Revka ships with first-party support for more than 50 named providers, but you are not limited to them. Any service that speaks the OpenAI Chat Completions or Anthropic Messages wire format can be a Revka backend, and the local-inference servers most people run at home — Ollama, llama.cpp, LM Studio, vLLM, SGLang — are already wired in as named providers. This guide shows you how to point Revka at an arbitrary endpoint with the custom: and anthropic-custom: prefixes, how to set up the common local servers, and how to tune the agent loop so a slow local model doesn’t trip timeouts or loop detection.

Use this page when the provider you want isn’t in the catalog, when you run your own gateway or corporate LLM proxy, or when you want everything to stay on your machine. For picking a stock cloud provider, start with the Provider quickstart instead.

A provider’s base URL normally comes from its built-in default. When you need a different one, Revka gives you two mechanisms: a URL-bearing provider prefix, or an explicit api_url override on a named provider.

The custom: and anthropic-custom: prefixes


The provider prefix carries the endpoint URL directly in default_provider, so you never register a named provider:

~/.revka/config.toml
default_provider = "custom:https://your-api.example.com/v1"
api_key = "your-key"
default_model = "your-model-name"

custom: routes through Revka’s shared OpenAI-compatible implementation. It speaks /v1/chat/completions (with an optional fallback to the /v1/responses API on a 404), Bearer auth, vision via base64 images, and native tool calling using the OpenAI JSON-schema format.
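
The anthropic-custom: prefix is named above but not shown. A minimal sketch, assuming it takes the endpoint URL the same way custom: does and speaks the Anthropic Messages format; the URL and model name are placeholders, and whether your endpoint expects a /v1 suffix depends on the service:

```toml
# ~/.revka/config.toml
# Assumption: anthropic-custom: accepts a URL exactly like custom: does.
default_provider = "anthropic-custom:https://your-api.example.com"
api_key = "your-key"
default_model = "your-model-name"
```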

Custom providers authenticate with the generic key env vars rather than a provider-specific one:

export API_KEY="your-key" # or: export REVKA_API_KEY="your-key"
revka agent -m "hello"

These three top-level config keys let you bend a named provider toward a non-standard endpoint. They also apply to custom: providers when the prefix alone doesn’t carry everything the request needs.

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| api_url | String? | provider default | Base URL override (e.g. a self-hosted gateway or a remote Ollama at http://10.0.0.1:11434). |
| api_path | String? | provider default | Overrides the request path suffix for unusual APIs (e.g. /v2/generate instead of /chat/completions). |
| extra_headers | Map<String,String> | {} | Additional HTTP headers sent with every provider request. |

default_provider = "custom:https://gateway.corp.example.com"
api_url = "https://gateway.corp.example.com/llm/v1"
api_path = "/chat/completions"
default_model = "internal-model"
[extra_headers]
"X-Org-Id" = "engineering"
"X-Route-Tag" = "revka"

extra_headers can also be set from the environment in Key:Value,Key2:Value2 format:

export REVKA_EXTRA_HEADERS="X-Org-Id:engineering,X-Route-Tag:revka"

Revka treats the popular local-inference servers as first-class named providers, each with a sensible default endpoint and optional authentication — you do not need to export a dummy REVKA_API_KEY to use them. Override the endpoint with api_url whenever your server listens on a non-default host or port.

| Provider ID | Aliases | Default endpoint | API key |
| --- | --- | --- | --- |
| ollama | | http://localhost:11434 (native /api/chat) | OLLAMA_API_KEY (optional) |
| llamacpp | llama.cpp | http://localhost:8080/v1 | LLAMACPP_API_KEY (optional; vision enabled) |
| lmstudio | lm-studio | http://localhost:1234/v1 | optional (default: lm-studio) |
| vllm | | http://localhost:8000/v1 | VLLM_API_KEY (optional) |
| sglang | | http://localhost:30000/v1 | SGLANG_API_KEY (optional) |
| osaurus | | http://localhost:1337/v1 | OSAURUS_API_KEY (optional; default: osaurus) |
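
As an illustration of the api_url override on a named local provider, here is a sketch for a vLLM server running on another machine. The host, port, and model name are hypothetical, and the /v1 suffix is kept on the assumption that the override mirrors the shape of the default endpoint:

```toml
default_provider = "vllm"
# Hypothetical LAN address; replace with wherever your server actually listens.
api_url = "http://192.168.1.50:9000/v1"
default_model = "meta-llama/Llama-3.1-8B-Instruct"
```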

After configuring any local provider, confirm Revka can reach it and enumerate its models:

revka models refresh --provider llamacpp # swap in your provider id
revka agent -m "hello"

revka doctor models --provider <id> also probes a single provider and reports ok, auth/access, or a connectivity error — useful before you wire the provider into a daemon.

Ollama uses its native /api/chat endpoint, not the OpenAI-compatible path, so vision images are passed through Ollama’s own image markers ([IMAGE:<source>]).

default_provider = "ollama"
default_model = "llama3.2"

A remote or cloud Ollama needs an api_url; a trailing /api in the URL is normalized automatically. The :cloud model suffix is only valid against a remote endpoint:

default_provider = "ollama"
api_url = "https://ollama.myserver.com"
default_model = "llama3.2:cloud"

To toggle a model’s thinking output, set reasoning_enabled in [runtime] — Revka forwards it as Ollama’s think: true/false. It only takes effect on models that support reasoning.

[runtime]
reasoning_enabled = true
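
Putting the provider and runtime settings together, a sketch that turns thinking off for a reasoning-capable model, for example to cut latency. The model tag is only an example; use whichever reasoning model you have pulled:

```toml
default_provider = "ollama"
default_model = "deepseek-r1:8b"   # example tag; any reasoning-capable model works

[runtime]
reasoning_enabled = false          # forwarded to Ollama as think: false
```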

The llamacpp provider targets llama-server. Start the server, then point Revka’s api_url at it. An API key is needed only if you launched llama-server with --api-key.

llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 133000 --host 127.0.0.1 --port 8033

default_provider = "llamacpp"
api_url = "http://127.0.0.1:8033/v1"
default_model = "ggml-org/gpt-oss-20b-GGUF"
default_temperature = 0.7

LM Studio exposes an OpenAI-compatible server on port 1234. No key is required — the provider supplies the placeholder lm-studio automatically.

default_provider = "lmstudio"
default_model = "your-loaded-model"

vLLM and SGLang are both OpenAI-compatible inference servers. Start the server and set the matching provider; auth is optional unless the server enforces it.

vllm serve meta-llama/Llama-3.1-8B-Instruct

default_provider = "vllm"
default_model = "meta-llama/Llama-3.1-8B-Instruct"
default_temperature = 0.7
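
The SGLang side looks much the same. A minimal sketch, assuming an SGLang server is already serving this model on its default endpoint (http://localhost:30000/v1); the model name is reused from the vLLM example purely as a placeholder:

```toml
default_provider = "sglang"
# Placeholder; use the model your SGLang server was launched with.
default_model = "meta-llama/Llama-3.1-8B-Instruct"
default_temperature = 0.7
```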

A 13B model on a laptop is far slower than a hosted frontier model, and a slow first token can look like a hang or a runaway loop to the default agent settings. Two config sections exist for exactly this case. Cloud users rarely need either.

The [pacing] section extends per-step timeout and loop-detection behavior for slow workloads.

[pacing]
step_timeout_secs = 120
loop_detection_min_elapsed_secs = 60
loop_ignore_tools = ["browser_screenshot", "browser_navigate"]
message_timeout_scale_max = 8

| Key | Default | Meaning |
| --- | --- | --- |
| step_timeout_secs | unset | Per-step LLM inference timeout, independent of the overall message budget; firing it does not consume the total budget. |
| loop_detection_min_elapsed_secs | unset | Grace period before loop detection starts counting, so a slow start isn’t mistaken for a loop. |
| loop_ignore_tools | [] | Tools excluded from identical-output loop detection (e.g. tools that legitimately repeat). |
| message_timeout_scale_max | 4 | Cap on how far the base message timeout scales up with tool-loop depth. |
| loop_detection_enabled | true | Master toggle for pattern-based loop detection. |
| loop_detection_window_size | 20 | Sliding-window size for the loop detector. |
| loop_detection_max_repeats | 3 | Consecutive identical tool+args calls before a loop warning fires. |
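
For a particularly slow machine, the loop-detector keys can be loosened alongside the timeouts. A sketch with illustrative values only; none of these numbers are recommendations from Revka, so tune them against your own hardware:

```toml
[pacing]
step_timeout_secs = 300                 # allow up to five minutes per inference step
loop_detection_min_elapsed_secs = 120   # grace period before loop detection starts counting
loop_detection_window_size = 20         # unchanged from the default
loop_detection_max_repeats = 5          # tolerate more identical calls than the default of 3
```

Leaving loop_detection_enabled at its default of true keeps the detector active; these settings only make it slower to fire.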

The compact_context key in [agent] switches the agent to a compact bootstrap prompt suited to 13B-or-smaller models, trimming the system prompt that a small context window can’t afford. It is on by default.

[agent]
compact_context = true

Pair it with a few related [agent] knobs when you’re tight on context:

| Key | Default | Meaning |
| --- | --- | --- |
| compact_context | true | Use the compact bootstrap prompt for small (≤13B) models. |
| max_system_prompt_chars | 0 (unlimited) | Hard-truncate the system prompt to N characters; useful for tiny context windows. |
| max_context_tokens | 1050000 | Token budget before context compression triggers; lower it to match a small model’s window. |
| max_tool_result_chars | 50000 | Maximum characters per tool result before middle-truncation. |
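
Combined, these knobs might look like the following for a model with roughly an 8K-token context window. The specific numbers are assumptions chosen for illustration, not tested defaults:

```toml
[agent]
compact_context = true            # compact bootstrap prompt (already the default)
max_system_prompt_chars = 6000    # hard-truncate the system prompt; 0 means unlimited
max_context_tokens = 8192         # trigger compression near the model's window
max_tool_result_chars = 4000      # middle-truncate tool output sooner than the 50000 default
```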

You don’t have to commit the whole instance to one backend. A custom or local provider can be scoped to a single route, sub-agent, or fallback:

  • Routing hints — map hint:fast (or any name) to a specific provider+model with [[model_routes]], including custom:/anthropic-custom: endpoints. See Routing, reliability & tuning.

  • Delegate sub-agents — give a [agents.<name>] block its own provider and model, e.g. a local coding model for a coder sub-agent:

    [agents.coder]
    provider = "ollama"
    model = "qwen2.5-coder:32b"
    temperature = 0.2
  • Reliability fallbacks — list a local provider as a fallback so the agent keeps working when a cloud provider is down. See [reliability].

  1. Probe the endpoint. revka doctor models --provider <id> should report ok with a model count. An auth/access status means the key (if any) is wrong or the plan lacks access; a connectivity error means the URL or server is unreachable.

  2. Refresh the catalog. revka models refresh --provider <id> pulls the live model list so default_model validates. If the gateway doesn’t implement a models endpoint, set default_model by hand and send a test message.

  3. Send a message. revka agent -m "hello" exercises the full path. For local models, watch the first-token latency; if the request times out before the first token arrives, raise step_timeout_secs in [pacing].

  4. Run the full diagnostic. revka doctor validates the resolved provider, key presence, and model in one pass before you start the daemon.