Local, self-hosted & custom endpoints

Ollama, llama.cpp, LM Studio, vLLM, SGLang, Osaurus, and the custom:/anthropic-custom: endpoint prefixes.

This page covers running Revka against a model server you host yourself — Ollama, llama.cpp, LM Studio, vLLM, SGLang, and Osaurus — plus the two “bring your own endpoint” prefixes (custom: and anthropic-custom:) for any OpenAI- or Anthropic-compatible API. Use these when you want local inference, an air-gapped deployment, a self-hosted GPU box on your network, or a gateway/proxy that the named providers don’t cover.

For the full provider table and credential resolution order, see the Provider catalog. For slow-LLM tuning (timeouts, loop detection), see Routing, reliability & tuning and the [pacing] section in Config: provider, agent & routing.

Ollama

Ollama is a first-party provider. It talks to Ollama’s native /api/chat endpoint (not /v1/chat/completions), and supports vision, streaming, and an optional thinking toggle.

default_provider = "ollama"
default_model    = "llama3.2"

The default base URL is http://localhost:11434. Override it for a remote instance with api_url (or the REVKA_PROVIDER_URL environment variable):

default_provider = "ollama"
api_url          = "http://10.0.0.1:11434"
default_model    = "llama3.2"

A trailing /api or /api/chat in api_url is automatically normalized away, so https://ollama.example.com/api/ and https://ollama.example.com behave identically.
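
The normalization rule above can be modeled in a few lines. This is an illustrative sketch, not Revka's actual code: the function name `normalize_ollama_base_url` is hypothetical, and stripping surrounding whitespace and trailing slashes is an assumption about how edge cases are handled.

```python
def normalize_ollama_base_url(api_url: str) -> str:
    """Drop a trailing /api or /api/chat so only the server base URL remains."""
    # Assumption: trailing slashes and surrounding whitespace are ignored.
    url = api_url.strip().rstrip("/")
    # Check the longer suffix first so "/api/chat" is removed in one step.
    for suffix in ("/api/chat", "/api"):
        if url.endswith(suffix):
            url = url[: -len(suffix)]
            break
    return url.rstrip("/")


if __name__ == "__main__":
    for candidate in (
        "https://ollama.example.com/api/",
        "https://ollama.example.com",
        "http://10.0.0.1:11434/api/chat",
    ):
        print(candidate, "->", normalize_ollama_base_url(candidate))
```

Under this model, both forms from the paragraph above resolve to the same base URL.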

| Field | Type | Default | Meaning |
| --- | --- | --- | --- |
| `api_url` | string | `http://localhost:11434` | Ollama base URL; also `REVKA_PROVIDER_URL` env var |
| `OLLAMA_API_KEY` | env var | none | Bearer token for protected/remote endpoints |
| `[runtime] reasoning_enabled` | bool | unset | Forwarded to Ollama as `think: true`/`false` |

Authentication

For a local endpoint (localhost, 127.0.0.1, ::1), Revka never sends an Authorization header even if a key is set. For a remote endpoint, if OLLAMA_API_KEY (or api_key) is present, it is sent as a Bearer token.
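
The decision described above can be sketched as follows. This is a simplified model under stated assumptions, not Revka's implementation: `is_local_endpoint` and `auth_headers` are hypothetical names, and the local check is limited to the three hosts listed in the paragraph.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# The three hosts the paragraph above names as local.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_endpoint(api_url: str) -> bool:
    # urlsplit lowercases the hostname and strips IPv6 brackets ("[::1]" -> "::1").
    return urlsplit(api_url).hostname in LOCAL_HOSTS


def auth_headers(api_url: str, api_key: Optional[str]) -> Dict[str, str]:
    """Return the Authorization header to send, or no headers at all."""
    if is_local_endpoint(api_url) or not api_key:
        return {}  # local endpoints never receive a key, even when one is set
    return {"Authorization": f"Bearer {api_key}"}


if __name__ == "__main__":
    print(auth_headers("http://localhost:11434", "secret"))
    print(auth_headers("http://10.0.0.1:11434", "secret"))
```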

Cloud models

A model name ending in :cloud (for example llama3.2:cloud) requests Ollama’s cloud routing. This requires both a remote api_url and an API key — Revka fails fast if you request a :cloud model against a local endpoint or with no key configured.
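
As a rough illustration of the fail-fast rule, the check might look like the sketch below. The function name `validate_cloud_model`, the error messages, and the use of `ValueError` are all assumptions made for this example; only the two conditions come from the text above.

```python
from typing import Optional
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def validate_cloud_model(model: str, api_url: str, api_key: Optional[str]) -> None:
    """Raise early when a :cloud model cannot possibly be served."""
    if not model.endswith(":cloud"):
        return  # ordinary models have no extra requirements
    if urlsplit(api_url).hostname in LOCAL_HOSTS:
        raise ValueError(f"{model} needs a remote api_url, got {api_url}")
    if not api_key:
        raise ValueError(f"{model} needs an API key (OLLAMA_API_KEY or api_key)")


if __name__ == "__main__":
    validate_cloud_model("llama3.2:cloud", "https://ollama.example.com", "secret")
    try:
        validate_cloud_model("llama3.2:cloud", "http://localhost:11434", "secret")
    except ValueError as err:
        print("rejected:", err)
```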

Reasoning / thinking

Set reasoning_enabled under [runtime] to forward Ollama’s think flag:

[runtime]
reasoning_enabled = true

It only takes effect on models that support reasoning. If a request with think: true fails (the model doesn’t support it), Revka automatically retries once with think omitted so the call still succeeds. Revka also strips <think>...</think> blocks from model output before returning text.
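
The retry-once and stripping behavior can be pictured with the sketch below. It is a model of the described behavior only: `send` is a hypothetical callable standing in for the HTTP request, `RuntimeError` stands in for whatever failure the server reports, and trimming whitespace after stripping is an assumption.

```python
import re
from typing import Any, Callable, Dict

# Non-greedy match across newlines so each <think>...</think> block is removed whole.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    return THINK_BLOCK.sub("", text).strip()


def chat_with_think_fallback(
    send: Callable[[Dict[str, Any]], str], payload: Dict[str, Any]
) -> str:
    """Send once; if a think=True request fails, retry once with think omitted."""
    try:
        return strip_think(send(payload))
    except RuntimeError:
        if payload.get("think") is not True:
            raise  # nothing to fall back to
        retry = {k: v for k, v in payload.items() if k != "think"}
        return strip_think(send(retry))


if __name__ == "__main__":
    def fake_send(payload: Dict[str, Any]) -> str:
        # Pretend the model rejects the think flag.
        if payload.get("think"):
            raise RuntimeError("model does not support thinking")
        return "<think>scratch work</think>Hello"

    print(chat_with_think_fallback(fake_send, {"model": "llama3.2", "think": True}))
```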

Vision

Ollama supports vision. Embed an image in a user message with an image marker (see Vision support below); Revka extracts it and sends it in Ollama’s images array.

Local & self-hosted OpenAI-compatible servers

LM Studio, llama.cpp, vLLM, SGLang, and Osaurus all speak the OpenAI /v1/chat/completions format and share Revka’s OpenAiCompatibleProvider implementation. Select one by its canonical ID:

| Provider | ID (aliases) | Default base URL | Auth |
| --- | --- | --- | --- |
| LM Studio | `lmstudio` (`lm-studio`) | `http://localhost:1234/v1` | optional; default key `lm-studio` |
| llama.cpp | `llamacpp` (`llama.cpp`) | `http://localhost:8080/v1` | optional (`LLAMACPP_API_KEY`); vision enabled |
| vLLM | `vllm` | `http://localhost:8000/v1` | optional (`VLLM_API_KEY`) |
| SGLang | `sglang` | `http://localhost:30000/v1` | optional (`SGLANG_API_KEY`) |
| Osaurus | `osaurus` | `http://localhost:1337/v1` | optional; default key `osaurus` |

A minimal config for any of them:

default_provider = "vllm"
default_model    = "Qwen/Qwen2.5-7B-Instruct"

All five accept an api_url override when your server runs on a non-default host or port, for example on another machine on your LAN or on a Docker host:

default_provider = "lmstudio"
api_url          = "http://host.docker.internal:1234/v1"
default_model    = "your-loaded-model"

lmstudio, osaurus, and llamacpp fall back to placeholder keys (lm-studio, osaurus, and llama.cpp respectively) when you don’t set one, which satisfies servers that require some Bearer value but don’t validate it. For vllm and sglang, the key is optional and only sent when configured.
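
The key fallback can be summarized in a short sketch. The placeholder values come from the paragraph above; the function name `bearer_for` and the way the configured key is passed in are assumptions made for illustration.

```python
from typing import Optional

# Placeholder keys used when no key is configured (values from the text above).
PLACEHOLDER_KEYS = {
    "lmstudio": "lm-studio",
    "osaurus": "osaurus",
    "llamacpp": "llama.cpp",
}


def bearer_for(provider: str, configured_key: Optional[str] = None) -> Optional[str]:
    """Return the Authorization value to send, or None to send no header."""
    key = configured_key or PLACEHOLDER_KEYS.get(provider)
    return f"Bearer {key}" if key else None


if __name__ == "__main__":
    for provider in ("lmstudio", "osaurus", "llamacpp", "vllm", "sglang"):
        print(provider, "->", bearer_for(provider))
```

In this model, vllm and sglang produce no header unless a key is configured, while the other three always send some Bearer value.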

Vision support

| Backend | Vision |
| --- | --- |
| Ollama | Yes |
| llama.cpp (`llamacpp`) | Yes |
| `custom:` endpoints | Yes |
| LM Studio, vLLM, SGLang, Osaurus | Not enabled by default |

To send an image, embed an image marker in a user message. Two forms are supported:

Describe this chart [IMAGE:/path/to/file.png]
Describe this chart [IMAGE:data:image/png;base64,iVBORw0KG...]

Remote image URLs require allow_remote_fetch = true under [multimodal]. Image count and size are clamped (max_images 1–16, max_image_size_mb 1–20). See [multimodal] in Config: provider, agent & routing.
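
To make the marker syntax concrete, here is a minimal sketch of how image markers could be pulled out of a message and how the two limits could be clamped. The regular expression, the function names, and the reading that the configured values themselves are clamped into the stated ranges are all assumptions, not a description of Revka's parser.

```python
import re
from typing import List, Tuple

# Matches [IMAGE:<source>] where <source> is a file path or a data: URI.
IMAGE_MARKER = re.compile(r"\[IMAGE:([^\]]+)\]")


def extract_image_markers(message: str) -> Tuple[str, List[str]]:
    """Split a user message into its text and the image sources it references."""
    sources = IMAGE_MARKER.findall(message)
    text = IMAGE_MARKER.sub("", message).strip()
    return text, sources


def clamp_multimodal(max_images: int, max_image_size_mb: int) -> Tuple[int, int]:
    """Clamp both settings into the ranges 1-16 and 1-20."""
    return min(max(max_images, 1), 16), min(max(max_image_size_mb, 1), 20)


if __name__ == "__main__":
    print(extract_image_markers("Describe this chart [IMAGE:/path/to/file.png]"))
    print(clamp_multimodal(0, 50))
```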

Model discovery

Revka caches each provider’s model catalog on disk and refreshes it from the provider’s models endpoint.

revka models refresh                       # refresh the default provider
revka models refresh --provider vllm       # refresh one provider
revka models list --provider vllm          # print the cached catalog

Live discovery is supported for ollama, llamacpp, sglang, vllm, and osaurus. lmstudio does not support live discovery — set its model manually:

revka models set your-loaded-model

To probe connectivity and auth across every configured provider at once, use revka doctor models. Full command reference: revka models, providers & auth.

Custom endpoint (OpenAI-compatible)

When no named provider fits, point Revka at any OpenAI-compatible endpoint with the custom: prefix. The URL is part of the provider ID:

default_provider = "custom:https://your-api.example.com/v1"
api_key          = "your-key"

The URL must use http:// or https://; an empty or invalid URL fails at startup with a clear error. Custom endpoints have vision enabled. Because the provider carries no known key prefix, the API key mismatch check is skipped for custom: providers.

You can also use a custom: URL anywhere a provider name is accepted, including reliability fallback chains:

[reliability]
fallback_providers = ["custom:http://host.docker.internal:1234/v1", "anthropic"]

`anthropic-custom:`

To target a custom Anthropic Messages API endpoint (for example a self-hosted Anthropic-compatible gateway), use the anthropic-custom: prefix. This routes through Revka’s native Anthropic provider against your base URL:

default_provider = "anthropic-custom:https://your-anthropic-compat.example.com"
api_key          = "your-key"

The same URL validation applies, and the key-prefix mismatch check is skipped.
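
Both prefixes embed the URL in the provider ID, so resolving one amounts to splitting off the prefix and validating what remains. The sketch below illustrates that idea; the function name `parse_custom_provider`, the returned tuple, and the error text are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit

PREFIXES = ("anthropic-custom:", "custom:")


def parse_custom_provider(provider_id: str) -> Tuple[str, str]:
    """Return (kind, base_url) for a custom provider ID, or raise ValueError."""
    for prefix in PREFIXES:
        if provider_id.startswith(prefix):
            url = provider_id[len(prefix):].strip()
            parts = urlsplit(url)
            # Only http:// and https:// URLs with a host are accepted.
            if parts.scheme not in ("http", "https") or not parts.netloc:
                raise ValueError(f"invalid URL for {prefix} provider: {url!r}")
            return prefix.rstrip(":"), url
    raise ValueError(f"not a custom provider ID: {provider_id!r}")


if __name__ == "__main__":
    print(parse_custom_provider("custom:https://your-api.example.com/v1"))
    print(parse_custom_provider("anthropic-custom:https://your-anthropic-compat.example.com"))
```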

`api_path` and `extra_headers`

Two tuning knobs help with non-standard or gateway endpoints. They apply to OpenAI-compatible providers, including custom: and the local servers above.

`api_path`

By default, Revka appends /chat/completions to the base URL (unless the base URL already ends in /chat/completions, in which case it is used as-is). For an API that uses a different path, override it with api_path:

default_provider = "custom:https://your-gateway.example.com"
api_path         = "/v2/generate"

When api_path is set it replaces the default /chat/completions suffix. A leading slash is optional — Revka inserts a separator if needed.
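
The URL construction rules above fit in one small function. This sketch is illustrative rather than Revka's code: `build_request_url` is a hypothetical name, and ignoring trailing slashes on the base URL is an assumption. How a base URL that already ends in /chat/completions combines with an explicit api_path is not specified in the text, so the sketch simply joins the two.

```python
from typing import Optional

DEFAULT_SUFFIX = "/chat/completions"


def build_request_url(base_url: str, api_path: Optional[str] = None) -> str:
    """Combine a provider base URL with the default or overridden path."""
    base = base_url.rstrip("/")
    if api_path:
        # A leading slash is optional: exactly one separator is inserted.
        return base + "/" + api_path.lstrip("/")
    if base.endswith(DEFAULT_SUFFIX):
        return base  # already a full chat-completions URL, use as-is
    return base + DEFAULT_SUFFIX


if __name__ == "__main__":
    print(build_request_url("http://localhost:8000/v1"))
    print(build_request_url("https://your-gateway.example.com", "/v2/generate"))
    print(build_request_url("https://your-gateway.example.com", "v2/generate"))
```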

`extra_headers`

Use extra_headers to add HTTP headers to every provider request. This is useful for gateway routing, tenant identifiers, or headers like HTTP-Referer / X-Title. They are sent in addition to Revka’s default headers and override any default header with the same name.

[extra_headers]
"X-Title" = "revka"
"X-Tenant-Id" = "team-42"

`REVKA_EXTRA_HEADERS`

The same headers can be set via the REVKA_EXTRA_HEADERS environment variable using Key:Value,Key2:Value2 format. Env var headers override config-file headers of the same name.

export REVKA_EXTRA_HEADERS="X-Title:revka,X-Tenant-Id:team-42"

Entries without a colon, or with an empty key, are skipped with a warning. Other provider-tuning keys you may pair with these (provider_timeout_secs, provider_max_tokens, [runtime] reasoning_effort) are documented in Routing, reliability & tuning.
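
The environment variable format and its precedence over the config file can be modeled as below. This is a sketch under assumptions: the function names are hypothetical, each entry is split on its first colon, whitespace around keys and values is trimmed, and header names are compared exactly.

```python
import warnings
from typing import Dict


def parse_extra_headers(raw: str) -> Dict[str, str]:
    """Parse 'Key:Value,Key2:Value2' into a dict, skipping malformed entries."""
    headers: Dict[str, str] = {}
    if not raw.strip():
        return headers  # assumption: an empty variable means no extra headers
    for entry in raw.split(","):
        # Split on the first colon only, so values may contain colons themselves.
        key, sep, value = entry.partition(":")
        key = key.strip()
        if not sep or not key:
            warnings.warn(f"skipping malformed header entry: {entry!r}")
            continue
        headers[key] = value.strip()
    return headers


def merge_headers(config_headers: Dict[str, str], env_raw: str) -> Dict[str, str]:
    """Environment headers override config-file headers of the same name."""
    merged = dict(config_headers)
    merged.update(parse_extra_headers(env_raw))
    return merged


if __name__ == "__main__":
    config = {"X-Title": "from-config", "X-Tenant-Id": "team-42"}
    print(merge_headers(config, "X-Title:revka"))
```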

Related pages

Provider quickstart — pick a provider and run your first chat
Provider catalog — every provider with base URLs, env vars, and capabilities
Custom providers & local LLMs — task-oriented guide
revka models, providers & auth — model catalog and provider CLI
Config: provider, agent & routing — full config key reference
