Local, self-hosted & custom endpoints

Ollama, llama.cpp, LM Studio, vLLM, SGLang, Osaurus, and the custom:/anthropic-custom: endpoint prefixes.

This page covers running Revka against a model server you host yourself — Ollama, llama.cpp, LM Studio, vLLM, SGLang, and Osaurus — plus the two “bring your own endpoint” prefixes (custom: and anthropic-custom:) for any OpenAI- or Anthropic-compatible API. Use these when you want local inference, an air-gapped deployment, a self-hosted GPU box on your network, or a gateway/proxy that the named providers don’t cover.

For the full provider table and credential resolution order, see the Provider catalog. For slow-LLM tuning (timeouts, loop detection), see Routing, reliability & tuning and the [pacing] section in Config: provider, agent & routing.

Ollama is a first-party provider. It talks to Ollama’s native /api/chat endpoint (not /v1/chat/completions), and supports vision, streaming, and an optional thinking toggle.

default_provider = "ollama"
default_model = "llama3.2"

The default base URL is http://localhost:11434. Override it for a remote instance with api_url (or the REVKA_PROVIDER_URL environment variable):

default_provider = "ollama"
api_url = "http://10.0.0.1:11434"
default_model = "llama3.2"

A trailing /api or /api/chat in api_url is automatically normalized away, so https://ollama.example.com/api/ and https://ollama.example.com behave identically.
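The normalization described above can be sketched roughly as follows. This is an illustrative approximation, not Revka's actual implementation; the function name `normalize_ollama_base_url` is hypothetical.

```python
def normalize_ollama_base_url(api_url: str) -> str:
    """Strip trailing slashes and a trailing /api or /api/chat segment.

    Hypothetical helper: mirrors the documented behavior only.
    """
    url = api_url.strip().rstrip("/")
    # Check the longer suffix first so /api/chat is removed as a whole.
    for suffix in ("/api/chat", "/api"):
        if url.endswith(suffix):
            url = url[: -len(suffix)]
            break
    return url.rstrip("/")


if __name__ == "__main__":
    for candidate in (
        "https://ollama.example.com/api/",
        "https://ollama.example.com",
        "http://10.0.0.1:11434/api/chat",
    ):
        print(candidate, "->", normalize_ollama_base_url(candidate))
```

Under this sketch, all three inputs reduce to a bare scheme-and-host base URL, which is why the two forms in the paragraph above behave identically.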

| Field | Type | Default | Meaning |
| --- | --- | --- | --- |
| api_url | string | http://localhost:11434 | Ollama base URL; also REVKA_PROVIDER_URL env var |
| OLLAMA_API_KEY | env var | none | Bearer token for protected/remote endpoints |
| [runtime] reasoning_enabled | bool | unset | Forwarded to Ollama as think: true/false |

For a local endpoint (localhost, 127.0.0.1, ::1), Revka never sends an Authorization header even if a key is set. For a remote endpoint, if OLLAMA_API_KEY (or api_key) is present, it is sent as a Bearer token.

A model name ending in :cloud (for example llama3.2:cloud) requests Ollama’s cloud routing. This requires both a remote api_url and an API key — Revka fails fast if you request a :cloud model against a local endpoint or with no key configured.
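The local/remote auth rule and the :cloud precondition can be expressed as a short sketch. The helper names (`is_local_endpoint`, `build_auth_headers`, `check_cloud_model`) and the error messages are hypothetical; only the rules themselves come from the text above.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hosts the documentation treats as local.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_endpoint(api_url: str) -> bool:
    # urlsplit().hostname drops the port and the brackets around IPv6 literals.
    return urlsplit(api_url).hostname in LOCAL_HOSTS


def build_auth_headers(api_url: str, api_key: Optional[str]) -> Dict[str, str]:
    # Local endpoints never get an Authorization header, even with a key set.
    if is_local_endpoint(api_url) or not api_key:
        return {}
    return {"Authorization": f"Bearer {api_key}"}


def check_cloud_model(model: str, api_url: str, api_key: Optional[str]) -> None:
    # A :cloud model needs both a remote endpoint and a key; otherwise fail fast.
    if not model.endswith(":cloud"):
        return
    if is_local_endpoint(api_url):
        raise ValueError(f"{model} requires a remote api_url")
    if not api_key:
        raise ValueError(f"{model} requires an API key")


if __name__ == "__main__":
    print(build_auth_headers("http://localhost:11434", "secret"))
    print(build_auth_headers("https://ollama.example.com", "secret"))
```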

Set reasoning_enabled under [runtime] to forward Ollama’s think flag:

[runtime]
reasoning_enabled = true

The flag only takes effect on models that support reasoning. If a request with think: true fails (for example, because the model doesn’t support it), Revka automatically retries once with think omitted so the call still succeeds. Revka also strips <think>...</think> blocks from model output before returning text.
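The retry-once and tag-stripping behavior might look like the sketch below. It assumes a hypothetical `send` callable that raises `RuntimeError` on a failed request; Revka's real transport and error types are not described here.

```python
import re
from typing import Callable

# Non-greedy match across newlines so each <think>...</think> block is removed.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_blocks(text: str) -> str:
    return THINK_BLOCK.sub("", text).strip()


def chat_with_think_fallback(send: Callable[[dict], str], payload: dict) -> str:
    """Send once; if think=True fails, retry a single time without the flag."""
    try:
        raw = send(payload)
    except RuntimeError:
        if payload.get("think") is not True:
            raise  # Only think=True requests get the fallback retry.
        retry_payload = {k: v for k, v in payload.items() if k != "think"}
        raw = send(retry_payload)
    return strip_think_blocks(raw)


if __name__ == "__main__":
    def fake_send(payload: dict) -> str:
        # Stand-in for a model server that rejects the think flag.
        if payload.get("think"):
            raise RuntimeError("think unsupported")
        return "<think>plan</think>Hello"

    print(chat_with_think_fallback(fake_send, {"model": "llama3.2", "think": True}))
```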

Ollama supports vision. Embed an image in a user message with an image marker (see Vision support below); Revka extracts it and sends it in Ollama’s images array.

Local & self-hosted OpenAI-compatible servers

LM Studio, llama.cpp, vLLM, SGLang, and Osaurus all speak the OpenAI /v1/chat/completions format and share Revka’s OpenAiCompatibleProvider implementation. Select one by its canonical ID:

| Provider | ID (aliases) | Default base URL | Auth |
| --- | --- | --- | --- |
| LM Studio | lmstudio (lm-studio) | http://localhost:1234/v1 | optional; default key lm-studio |
| llama.cpp | llamacpp (llama.cpp) | http://localhost:8080/v1 | optional (LLAMACPP_API_KEY); vision enabled |
| vLLM | vllm | http://localhost:8000/v1 | optional (VLLM_API_KEY) |
| SGLang | sglang | http://localhost:30000/v1 | optional (SGLANG_API_KEY) |
| Osaurus | osaurus | http://localhost:1337/v1 | optional; default key osaurus |

A minimal config for any of them:

default_provider = "vllm"
default_model = "Qwen/Qwen2.5-7B-Instruct"

All five accept an api_url override when your server runs on a non-default host or port — for example pointing at another machine on your LAN, or at a Docker host:

default_provider = "lmstudio"
api_url = "http://host.docker.internal:1234/v1"
default_model = "your-loaded-model"

lmstudio, osaurus, and llamacpp fall back to placeholder keys (lm-studio, osaurus, and llama.cpp respectively) when you don’t set one, which satisfies servers that require some Bearer value but don’t validate it. For vllm and sglang, the key is optional and only sent when configured.
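As a rough illustration of the fallback rule above, key resolution could be modeled like this. The `resolve_api_key` helper and the lookup table are hypothetical; the placeholder values are the ones listed in the paragraph.

```python
from typing import Optional

# Placeholder keys named in the documentation for servers that want
# some Bearer value but do not validate it.
PLACEHOLDER_KEYS = {
    "lmstudio": "lm-studio",
    "osaurus": "osaurus",
    "llamacpp": "llama.cpp",
}


def resolve_api_key(provider: str, configured: Optional[str]) -> Optional[str]:
    """Return the configured key, else a placeholder, else None (no header)."""
    if configured:
        return configured
    return PLACEHOLDER_KEYS.get(provider)


if __name__ == "__main__":
    for name in ("lmstudio", "llamacpp", "vllm", "sglang"):
        print(name, "->", resolve_api_key(name, None))
```

For vllm and sglang the sketch returns None when nothing is configured, matching the statement that their key is only sent when set.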

Vision support

| Backend | Vision |
| --- | --- |
| Ollama | Yes |
| llama.cpp (llamacpp) | Yes |
| custom: endpoints | Yes |
| LM Studio, vLLM, SGLang, Osaurus | Not enabled by default |

To send an image, embed an image marker in a user message. Two forms are supported:

Describe this chart [IMAGE:/path/to/file.png]
Describe this chart [IMAGE:data:image/png;base64,iVBORw0KG...]

Remote image URLs require allow_remote_fetch = true under [multimodal]. Image count and size are clamped (max_images 1–16, max_image_size_mb 1–20). See [multimodal] in Config: provider, agent & routing.
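To make the marker syntax and the clamping concrete, here is a small sketch. The regular expression is inferred from the two examples above, and both helper names are hypothetical; whether Revka removes the marker from the message text is also an assumption.

```python
import re
from typing import List, Tuple

# Assumed marker shape: [IMAGE:<path or data URI>], taken from the examples.
IMAGE_MARKER = re.compile(r"\[IMAGE:([^\]]+)\]")


def extract_image_markers(message: str) -> Tuple[str, List[str]]:
    """Return the message without markers plus the referenced image sources."""
    refs = IMAGE_MARKER.findall(message)
    text = IMAGE_MARKER.sub("", message).strip()
    return text, refs


def clamp(value: int, low: int, high: int) -> int:
    # Out-of-range settings are pulled back into the documented bounds.
    return max(low, min(high, value))


if __name__ == "__main__":
    print(extract_image_markers("Describe this chart [IMAGE:/path/to/file.png]"))
    print(clamp(32, 1, 16))  # max_images range is 1-16
    print(clamp(0, 1, 20))   # max_image_size_mb range is 1-20
```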

Revka caches each provider’s model catalog on disk and refreshes it from the provider’s models endpoint.

revka models refresh # refresh the default provider
revka models refresh --provider vllm # refresh one provider
revka models list --provider vllm # print the cached catalog

Live discovery is supported for ollama, llamacpp, sglang, vllm, and osaurus. lmstudio does not support live discovery — set its model manually:

revka models set your-loaded-model

To probe connectivity and auth across every configured provider at once, use revka doctor models. Full command reference: revka models, providers & auth.

When no named provider fits, point Revka at any OpenAI-compatible endpoint with the custom: prefix. The URL is part of the provider ID:

default_provider = "custom:https://your-api.example.com/v1"
api_key = "your-key"

The URL must use http:// or https://; an empty or invalid URL fails at startup with a clear error. Custom endpoints have vision enabled. Because the provider carries no known key prefix, the API key mismatch check is skipped for custom: providers.

You can also use a custom: URL anywhere a provider name is accepted, including reliability fallback chains:

[reliability]
fallback_providers = ["custom:http://host.docker.internal:1234/v1", "anthropic"]

To target a custom Anthropic Messages API endpoint (for example a self-hosted Anthropic-compatible gateway), use the anthropic-custom: prefix. This routes through Revka’s native Anthropic provider against your base URL:

default_provider = "anthropic-custom:https://your-anthropic-compat.example.com"
api_key = "your-key"

The same URL validation applies, and the key-prefix mismatch check is skipped.
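The prefix handling and URL validation for both custom prefixes could be approximated as follows. The function name, the returned labels, and the error text are hypothetical; the sketch only encodes the rules stated above (http:// or https:// required, empty or invalid URLs rejected).

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Prefix -> illustrative label for the wire format the endpoint speaks.
PREFIXES = {
    "anthropic-custom:": "anthropic",
    "custom:": "openai-compatible",
}


def parse_custom_provider(provider_id: str) -> Optional[Tuple[str, str]]:
    """Return (kind, base_url) for a custom provider ID, or None for named ones."""
    for prefix, kind in PREFIXES.items():
        if provider_id.startswith(prefix):
            url = provider_id[len(prefix):].strip()
            parts = urlsplit(url)
            if parts.scheme not in ("http", "https") or not parts.hostname:
                raise ValueError(f"invalid URL in provider ID: {provider_id!r}")
            return kind, url
    return None


if __name__ == "__main__":
    print(parse_custom_provider("custom:https://your-api.example.com/v1"))
    print(parse_custom_provider("anthropic-custom:https://your-anthropic-compat.example.com"))
    print(parse_custom_provider("anthropic"))
```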

Two tuning knobs help with non-standard or gateway endpoints. They apply to OpenAI-compatible providers, including custom: and the local servers above.

By default, Revka appends /chat/completions to the base URL (unless the base URL already ends in /chat/completions, in which case it is used as-is). For an API that uses a different path, override it with api_path:

default_provider = "custom:https://your-gateway.example.com"
api_path = "/v2/generate"

When api_path is set it replaces the default /chat/completions suffix. A leading slash is optional — Revka inserts a separator if needed.
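The path rules can be summarized in a short sketch. `build_request_url` is a hypothetical helper; in particular, what happens when api_path is set and the base URL already ends in /chat/completions is not specified above, so the sketch simply appends api_path in that case.

```python
from typing import Optional

DEFAULT_SUFFIX = "/chat/completions"


def build_request_url(base_url: str, api_path: Optional[str] = None) -> str:
    base = base_url.rstrip("/")
    if api_path:
        # api_path replaces the default suffix; the leading slash is optional.
        return base + "/" + api_path.lstrip("/")
    if base.endswith(DEFAULT_SUFFIX):
        # Base URL already points at the chat endpoint: use it as-is.
        return base
    return base + DEFAULT_SUFFIX


if __name__ == "__main__":
    print(build_request_url("http://localhost:8000/v1"))
    print(build_request_url("https://your-gateway.example.com", "/v2/generate"))
    print(build_request_url("https://your-gateway.example.com", "v2/generate"))
```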

Add HTTP headers sent with every provider request — useful for gateway routing, tenant identifiers, or headers like HTTP-Referer / X-Title. They are sent in addition to Revka’s default headers and take precedence over any default header with the same name.

[extra_headers]
"X-Title" = "revka"
"X-Tenant-Id" = "team-42"

The same headers can be set via the REVKA_EXTRA_HEADERS environment variable using Key:Value,Key2:Value2 format. Env var headers override config-file headers of the same name.

export REVKA_EXTRA_HEADERS="X-Title:revka,X-Tenant-Id:team-42"

Entries without a colon, or with an empty key, are skipped with a warning. Other provider-tuning keys you may pair with these (provider_timeout_secs, provider_max_tokens, [runtime] reasoning_effort) are documented in Routing, reliability & tuning.
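The REVKA_EXTRA_HEADERS format and its precedence over config-file headers can be sketched as below. Both function names are hypothetical, and two simplifications are assumed: entries are split on the first colon only (so values such as URLs keep their own colons), and header names are compared by exact spelling rather than case-insensitively.

```python
import warnings
from typing import Dict


def parse_extra_headers(raw: str) -> Dict[str, str]:
    """Parse 'Key:Value,Key2:Value2'; malformed entries are skipped with a warning."""
    headers: Dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so 'HTTP-Referer:https://...' survives.
        key, sep, value = entry.partition(":")
        key = key.strip()
        if not sep or not key:
            warnings.warn(f"skipping malformed header entry: {entry!r}")
            continue
        headers[key] = value.strip()
    return headers


def merge_headers(
    defaults: Dict[str, str],
    config_headers: Dict[str, str],
    env_headers: Dict[str, str],
) -> Dict[str, str]:
    # Later sources win: defaults < [extra_headers] < REVKA_EXTRA_HEADERS.
    merged = dict(defaults)
    merged.update(config_headers)
    merged.update(env_headers)
    return merged


if __name__ == "__main__":
    env = parse_extra_headers("X-Title:revka,X-Tenant-Id:team-42")
    print(merge_headers({"X-Title": "default"}, {"X-Tenant-Id": "team-1"}, env))
```

Because the format uses commas as separators, a header value that itself contains a comma cannot be expressed in this sketch; the [extra_headers] table would be the safer place for such values.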