Local, self-hosted & custom endpoints
Ollama, llama.cpp, LM Studio, vLLM, SGLang, Osaurus, and the custom:/anthropic-custom: endpoint prefixes.
This page covers running Revka against a model server you host yourself — Ollama, llama.cpp, LM Studio, vLLM, SGLang, and Osaurus — plus the two “bring your own endpoint” prefixes (custom: and anthropic-custom:) for any OpenAI- or Anthropic-compatible API. Use these when you want local inference, an air-gapped deployment, a self-hosted GPU box on your network, or a gateway/proxy that the named providers don’t cover.
For the full provider table and credential resolution order, see the Provider catalog. For slow-LLM tuning (timeouts, loop detection), see Routing, reliability & tuning and the [pacing] section in Config: provider, agent & routing.
Ollama
Ollama is a first-party provider. It talks to Ollama’s native /api/chat endpoint (not /v1/chat/completions), and supports vision, streaming, and an optional thinking toggle.

    default_provider = "ollama"
    default_model = "llama3.2"

The default base URL is http://localhost:11434. Override it for a remote instance with api_url (or the REVKA_PROVIDER_URL environment variable):

    default_provider = "ollama"
    api_url = "http://10.0.0.1:11434"
    default_model = "llama3.2"

A trailing /api or /api/chat in api_url is automatically normalized away, so https://ollama.example.com/api/ and https://ollama.example.com behave identically.
| Field | Type | Default | Meaning |
|---|---|---|---|
| api_url | string | http://localhost:11434 | Ollama base URL; also REVKA_PROVIDER_URL env var |
| OLLAMA_API_KEY | env var | none | Bearer token for protected/remote endpoints |
| [runtime] reasoning_enabled | bool | unset | Forwarded to Ollama as think: true/false |
Authentication
For a local endpoint (localhost, 127.0.0.1, ::1), Revka never sends an Authorization header even if a key is set. For a remote endpoint, if OLLAMA_API_KEY (or api_key) is present, it is sent as a Bearer token.
Cloud models
A model name ending in :cloud (for example llama3.2:cloud) requests Ollama’s cloud routing. This requires both a remote api_url and an API key — Revka fails fast if you request a :cloud model against a local endpoint or with no key configured.
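As a rough sketch of a configuration that meets both requirements, the fragment below points the Ollama provider at a remote host and supplies a key. The hostname and key are placeholders. Placing api_key at the top level mirrors the custom: example later on this page and is an assumption for Ollama; setting OLLAMA_API_KEY in the environment is the documented alternative.

```toml
# Illustrative only: the hostname and key are placeholders.
default_provider = "ollama"
api_url = "https://ollama.example.com"  # remote endpoint, so the key is sent as a Bearer token
api_key = "your-key"                    # assumption: top-level api_key; OLLAMA_API_KEY is the env alternative
default_model = "llama3.2:cloud"        # the :cloud suffix requests cloud routing
```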
Reasoning / thinking
Set reasoning_enabled under [runtime] to forward Ollama’s think flag:

    [runtime]
    reasoning_enabled = true

It only takes effect on models that support reasoning. If a request with think: true fails (the model doesn’t support it), Revka automatically retries once with think omitted so the call still succeeds. Revka also strips <think>...</think> blocks from model output before returning text.
Vision
Ollama supports vision. Embed an image in a user message with an image marker (see Vision support below); Revka extracts it and sends it in Ollama’s images array.
Local & self-hosted OpenAI-compatible servers
LM Studio, llama.cpp, vLLM, SGLang, and Osaurus all speak the OpenAI /v1/chat/completions format and share Revka’s OpenAiCompatibleProvider implementation. Select one by its canonical ID:
| Provider | ID (aliases) | Default base URL | Auth |
|---|---|---|---|
| LM Studio | lmstudio (lm-studio) | http://localhost:1234/v1 | optional; default key lm-studio |
| llama.cpp | llamacpp (llama.cpp) | http://localhost:8080/v1 | optional (LLAMACPP_API_KEY); default key llama.cpp; vision enabled |
| vLLM | vllm | http://localhost:8000/v1 | optional (VLLM_API_KEY) |
| SGLang | sglang | http://localhost:30000/v1 | optional (SGLANG_API_KEY) |
| Osaurus | osaurus | http://localhost:1337/v1 | optional; default key osaurus |
A minimal config for any of them:

    default_provider = "vllm"
    default_model = "Qwen/Qwen2.5-7B-Instruct"

All five accept an api_url override when your server runs on a non-default host or port — for example pointing at another machine on your LAN, or at a Docker host:

    default_provider = "lmstudio"
    api_url = "http://host.docker.internal:1234/v1"
    default_model = "your-loaded-model"

lmstudio, osaurus, and llamacpp fall back to placeholder keys (lm-studio, osaurus, and llama.cpp respectively) when you don’t set one, which satisfies servers that require some Bearer value but don’t validate it. For vllm and sglang, the key is optional and only sent when configured.
Vision support
| Backend | Vision |
|---|---|
| Ollama | Yes |
| llama.cpp (llamacpp) | Yes |
| custom: endpoints | Yes |
| LM Studio, vLLM, SGLang, Osaurus | Not enabled by default |
To send an image, embed an image marker in a user message. Two forms are supported:

    Describe this chart [IMAGE:/path/to/file.png]
    Describe this chart [IMAGE:data:image/png;base64,iVBORw0KG...]

Remote image URLs require allow_remote_fetch = true under [multimodal]. Image count and size are clamped (max_images 1–16, max_image_size_mb 1–20). See [multimodal] in Config: provider, agent & routing.
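A possible [multimodal] block is sketched below. It assumes max_images and max_image_size_mb live in the same [multimodal] table as allow_remote_fetch, which the text above implies but does not state outright, and the values are arbitrary picks inside the documented ranges.

```toml
[multimodal]
allow_remote_fetch = true  # needed before remote image URLs are fetched
max_images = 4             # illustrative value; clamped to the 1-16 range
max_image_size_mb = 5      # illustrative value; clamped to the 1-20 range
```

Check the [multimodal] reference in Config: provider, agent & routing for the authoritative key names and defaults before relying on this.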
Model discovery
Revka caches each provider’s model catalog on disk and refreshes it from the provider’s models endpoint.

    revka models refresh                   # refresh the default provider
    revka models refresh --provider vllm   # refresh one provider
    revka models list --provider vllm      # print the cached catalog

Live discovery is supported for ollama, llamacpp, sglang, vllm, and osaurus. lmstudio does not support live discovery — set its model manually:

    revka models set your-loaded-model

To probe connectivity and auth across every configured provider at once, use revka doctor models. Full command reference: revka models, providers & auth.
Custom endpoint (OpenAI-compatible)
When no named provider fits, point Revka at any OpenAI-compatible endpoint with the custom: prefix. The URL is part of the provider ID:

    default_provider = "custom:https://your-api.example.com/v1"
    api_key = "your-key"

The URL must use http:// or https://; an empty or invalid URL fails at startup with a clear error. Custom endpoints have vision enabled. Because the provider carries no known key prefix, the API key mismatch check is skipped for custom: providers.
You can also use a custom: URL anywhere a provider name is accepted, including reliability fallback chains:

    [reliability]
    fallback_providers = ["custom:http://host.docker.internal:1234/v1", "anthropic"]

anthropic-custom:
To target a custom Anthropic Messages API endpoint (for example a self-hosted Anthropic-compatible gateway), use the anthropic-custom: prefix. This routes through Revka’s native Anthropic provider against your base URL:

    default_provider = "anthropic-custom:https://your-anthropic-compat.example.com"
    api_key = "your-key"

The same URL validation applies, and the key-prefix mismatch check is skipped.
api_path and extra_headers
Two tuning knobs help with non-standard or gateway endpoints. They apply to OpenAI-compatible providers, including custom: and the local servers above.
api_path
By default, Revka appends /chat/completions to the base URL (unless the base URL already ends in /chat/completions, in which case it is used as-is). For an API that uses a different path, override it with api_path:

    default_provider = "custom:https://your-gateway.example.com"
    api_path = "/v2/generate"

When api_path is set it replaces the default /chat/completions suffix. A leading slash is optional — Revka inserts a separator if needed.
extra_headers
Add HTTP headers sent with every provider request — useful for gateway routing, tenant identifiers, or headers like HTTP-Referer / X-Title. These augment and override Revka’s default headers.

    [extra_headers]
    "X-Title" = "revka"
    "X-Tenant-Id" = "team-42"

REVKA_EXTRA_HEADERS
The same headers can be set via the REVKA_EXTRA_HEADERS environment variable using Key:Value,Key2:Value2 format. Env var headers override config-file headers of the same name.

    export REVKA_EXTRA_HEADERS="X-Title:revka,X-Tenant-Id:team-42"

Entries without a colon, or with an empty key, are skipped with a warning. Other provider-tuning keys you may pair with these (provider_timeout_secs, provider_max_tokens, [runtime] reasoning_effort) are documented in Routing, reliability & tuning.
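Putting these knobs together, a gateway setup might look like the sketch below. The gateway URL, path, key, and header values are placeholders; the key names are the ones described in this section.

```toml
# Illustrative only: URL, path, key, and header values are placeholders.
default_provider = "custom:https://your-gateway.example.com"
api_key = "your-key"
api_path = "v2/generate"  # leading slash omitted; Revka inserts the separator

[extra_headers]
"X-Title" = "revka"
"X-Tenant-Id" = "team-42"
```

With this in place, requests should go to https://your-gateway.example.com/v2/generate with both headers attached. Exporting REVKA_EXTRA_HEADERS="X-Tenant-Id:team-7" would then replace the configured X-Tenant-Id, since env var headers win when names clash.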
Related pages
- Provider quickstart — pick a provider and run your first chat
- Provider catalog — every provider with base URLs, env vars, and capabilities
- Custom providers & local LLMs — task-oriented guide
- revka models, providers & auth — model catalog and provider CLI
- Config: provider, agent & routing — full config key reference