Updating, runbook & troubleshooting

Self-update and other update paths, day-2 operations, safe config rollout, health signals, and common failure fixes.

This page is the day-2 operator’s guide to a running Revka instance. It covers how to update the binary or container, the operations runbook for changing configuration safely, the health and state signals you watch in production, and a troubleshooting reference for the failures you are most likely to hit. Reach for it after the initial install — for first-time setup see Installation, and for the diagnostic commands referenced throughout, see revka doctor, status & self-test.

Updating Revka

How you update depends on how you installed. Pick the path that matches your deployment.

For binaries installed from source or a prebuilt release, the built-in self-update command is the simplest path:

revka update                  # download and install the latest release
revka update --check          # only report current vs latest, do not install
revka update --force          # skip the confirmation prompt
revka update --version 0.6.0  # install a specific release

Flag	Meaning
`--check`	Check only — prints the current and latest versions and exits.
`--force`	Install without the interactive confirmation prompt.
`--version <X.Y.Z>`	Target a specific release instead of the latest.
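The `--check` comparison comes down to ordering two X.Y.Z version strings. Here is a minimal Python sketch of that ordering, assuming plain three-part numeric versions. `parse_version` and `update_available` are hypothetical helper names, not part of Revka, and this page does not document how `revka update` compares versions internally.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints."""
    parts = text.strip().removeprefix("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return (major, minor, patch)


def update_available(current: str, latest: str) -> bool:
    """True when `latest` is strictly newer than `current`."""
    return parse_version(latest) > parse_version(current)
```

Comparing integer tuples rather than raw strings avoids the trap where "0.10.0" sorts before "0.9.0".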

If you installed with Homebrew (macOS or Linuxbrew):

brew upgrade revka

Homebrew keeps runtime data under its own var/revka/ directory and propagates REVKA_CONFIG_DIR into the launchd service, so config and workspace data survive the upgrade. See macOS update & uninstall.

For a source checkout managed by the bootstrap installer, pull and re-run:

git pull
./install.sh --prefer-prebuilt     # try a release binary, fall back to source

After upgrading, seed any new bundled workflow templates without clobbering your own:

revka workflows sync               # non-destructive; existing files untouched

See Installation for the full install.sh flag set.

For containers, update the image rather than the binary. Do not re-run install.sh to update a container.

# Compose
docker compose pull
docker compose up -d

# Plain docker
docker pull ghcr.io/kumihoio/revka:latest
docker stop revka && docker rm revka
docker run -d ... ghcr.io/kumihoio/revka:latest

Pin a CalVer tag (e.g. ghcr.io/kumihoio/revka:2026.4.21) instead of :latest when you need reproducible deployments. See Docker, Compose & one-click PaaS.

The `revka update` pipeline

revka update is not a naïve overwrite. It runs a six-phase pipeline that backs up the current binary and rolls back automatically if anything goes wrong, so a failed update never leaves you without a working binary.

1. Preflight: resolve the target version and confirm an update is needed.
2. Download: fetch the release binary for your platform from GitHub Releases.
3. Backup: save the currently installed binary so it can be restored.
4. Validate: verify the download’s SHA256 checksum and the optional cosign signature (the same verification chain install.sh uses).
5. Swap: atomically replace the running binary with the new one.
6. Smoke test: run the new binary to confirm it executes.

If the smoke test (or any earlier phase) fails, the pipeline rolls back to the backed-up binary automatically. --check runs the preflight comparison only and prints current versus latest without touching anything.
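The backup, validate, swap, smoke-test, and rollback steps can be illustrated with a short Python sketch. It shows the general technique only and is not Revka's implementation: the function names, the `.bak` suffix, and the injected `smoke_test` callable are all assumptions, and cosign verification is left out.

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def install_with_rollback(
    installed: Path,
    downloaded: Path,
    expected_sha256: str,
    smoke_test: Callable[[Path], bool],
) -> bool:
    """Replace `installed` with `downloaded`; restore the backup on failure.

    Returns True if the new binary is in place, False if the old binary
    was kept or restored.
    """
    backup = installed.with_name(installed.name + ".bak")  # hypothetical backup name
    shutil.copy2(installed, backup)                        # Backup

    if sha256_of(downloaded) != expected_sha256:           # Validate
        backup.unlink()
        return False                                       # installed binary untouched

    os.replace(downloaded, installed)                      # Swap (atomic on one filesystem)

    if smoke_test(installed):                              # Smoke test
        backup.unlink()
        return True

    os.replace(backup, installed)                          # Roll back
    return False
```

A real smoke test might run the new binary with `--version` and check the exit status. Here it is passed in as a callable so the control flow can be exercised without a real binary.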

Whatever path you used, confirm the upgrade landed and the install is still sound:

revka --version
revka self-test --quick     # offline checks: config, registries, local storage

Run the full revka self-test once the daemon is back up to add the gateway-health and memory round-trip checks. See revka doctor, status & self-test.

Operations runbook

Revka runs in one of a few modes. Match your operational commands to how it is running.

Mode	Start	Manage
Foreground autonomous runtime	`revka daemon`	`Ctrl-C` (SIGINT/SIGTERM)
Gateway only (dashboard + API)	`revka gateway`	`revka gateway restart`
OS background service	`revka service install && revka service start`	`revka service status` / `logs` / `restart`
Container	`docker compose up -d`	`docker compose logs -f` / `restart`

The full command reference lives in revka gateway, daemon & service; running as an OS service is covered in Run as a background service; containers in Docker, Compose & one-click PaaS.

Safe config rollout

Configuration lives in ~/.revka/config.toml. Treat every edit as a deployment and follow the validate → change → restart → verify → rollback loop.

1. Validate first. Run revka doctor before you touch anything so you know the baseline is clean. The config category does real validation: it constructs the provider, resolves model and embedding routes, and checks the temperature range, so most mistakes surface here, not at runtime.
2. Change one thing. Edit a single key (or one logical group) in config.toml. Smaller changes make rollback unambiguous.
3. Restart to apply. Most config takes effect on restart: revka service restart (service), revka gateway restart (gateway), or docker compose restart (container).
4. Verify. Run revka doctor again, then revka status to confirm the new value is in effect and no component went stale.
5. Roll back if needed. Restore the previous config.toml and restart. Keep a backup before editing, or use revka onboard --reinit (which timestamps a backup of the whole config directory before starting fresh).
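The "keep a backup before editing" advice can be scripted. The following Python sketch takes a timestamped copy of config.toml and restores it on demand. The helper names and the backup file naming are assumptions for illustration; Revka does not ship these helpers.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_config(config_path: Path) -> Path:
    """Copy config.toml to a timestamped sibling and return the backup path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    backup = config_path.with_name(f"{config_path.name}.{stamp}.bak")  # hypothetical naming
    shutil.copy2(config_path, backup)
    return backup


def restore_config(config_path: Path, backup: Path) -> None:
    """Put the backed-up file back in place; restart Revka afterwards."""
    shutil.copy2(backup, config_path)
```

In practice you would call `backup_config(Path.home() / ".revka" / "config.toml")` before the edit, then call `restore_config` with the returned path and restart if the verify step fails.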

Health and state signals

Revka exposes several independent liveness signals. Use the cheapest one that answers your question.

Component health registry

A process-global registry tracks the health of each subsystem. Components start in starting and are stamped ok or error as they run, along with the last-OK time, last error, and a restart count. Known component names follow a convention: gateway, scheduler, heartbeat, and channel:<name> (e.g. channel:telegram).
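To make the shape of this registry concrete, here is a toy Python model of the lifecycle described above. It is a sketch under assumptions, not Revka's code: the class and method names are hypothetical, and only the field names (status, updated_at, last_ok, last_error, restart_count) come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ComponentHealth:
    status: str = "starting"            # every component begins here
    updated_at: datetime = field(default_factory=_now)
    last_ok: Optional[datetime] = None
    last_error: Optional[str] = None
    restart_count: int = 0


class HealthRegistry:
    """Toy in-memory registry keyed by component name, e.g. 'channel:telegram'."""

    def __init__(self) -> None:
        self.components: dict[str, ComponentHealth] = {}

    def _get(self, name: str) -> ComponentHealth:
        return self.components.setdefault(name, ComponentHealth())

    def mark_ok(self, name: str) -> None:
        c = self._get(name)
        c.status, c.updated_at = "ok", _now()
        c.last_ok = c.updated_at

    def mark_error(self, name: str, message: str) -> None:
        c = self._get(name)
        c.status, c.last_error, c.updated_at = "error", message, _now()

    def bump_restart(self, name: str) -> None:
        self._get(name).restart_count += 1
```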

Gateway health endpoint

GET /health returns the full health snapshot from that registry. It needs no authentication and is the primary liveness signal for Docker HEALTHCHECK, load balancers, and monitoring systems.

GET http://localhost:42617/health

{
  "pid": 12345,
  "updated_at": "2026-06-18T12:00:00Z",
  "uptime_seconds": 3600,
  "components": {
    "gateway":          { "status": "ok",    "updated_at": "…", "last_ok": "…", "restart_count": 0 },
    "scheduler":        { "status": "ok",    "updated_at": "…", "last_ok": "…", "restart_count": 0 },
    "channel:telegram": { "status": "error", "updated_at": "…", "last_error": "timeout", "restart_count": 2 }
  }
}
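A monitor can reduce a snapshot like the one above to the list of unhealthy components. The Python sketch below assumes the response shape shown in the example and treats any status other than "ok" (including "starting") as unhealthy, which is a choice made for illustration. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_health(url: str = "http://localhost:42617/health", timeout: float = 5.0) -> dict:
    """GET the health snapshot and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def failing_components(snapshot: dict) -> dict[str, dict]:
    """Return the components whose status is not 'ok', keyed by name."""
    return {
        name: info
        for name, info in snapshot.get("components", {}).items()
        if info.get("status") != "ok"
    }


def exit_code(snapshot: dict) -> int:
    """0 when every component is 'ok', 1 otherwise."""
    return 1 if failing_components(snapshot) else 0
```

Calling `failing_components(fetch_health())` against the example response would return only the `channel:telegram` entry, with its `last_error` and `restart_count` intact.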

For a scripted up/down check that maps health to an exit code (ideal for Docker HEALTHCHECK), use the CLI wrapper, which does a GET /health under the hood:

revka status --format exit-code   # exit 0 if healthy, 1 otherwise

Daemon state file and freshness

The daemon writes daemon_state.json next to config.toml every few seconds. The file holds a health snapshot of all components and a written_at timestamp. External monitors can watch this file: if it is older than 45 seconds, treat the daemon as dead or hung. On Windows it also doubles as the running-process check, since Task Scheduler cannot be queried reliably.
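An external monitor for the state file might look like the following Python sketch. It assumes that written_at is a top-level field holding an ISO 8601 / RFC 3339 UTC timestamp. The page names the field but not its format, so adjust the parsing to match your file. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 45  # freshness limit stated on this page


def state_age_seconds(state_path: Path, now: Optional[datetime] = None) -> Optional[float]:
    """Age of daemon_state.json based on written_at, or None if the file is absent."""
    if not state_path.exists():
        return None
    data = json.loads(state_path.read_text(encoding="utf-8"))
    # Assumption: written_at looks like "2026-06-18T12:00:00Z".
    written_at = datetime.fromisoformat(data["written_at"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - written_at).total_seconds()


def daemon_looks_alive(state_path: Path, now: Optional[datetime] = None) -> bool:
    """True when the state file exists and is no older than MAX_AGE_SECONDS."""
    age = state_age_seconds(state_path, now)
    return age is not None and age <= MAX_AGE_SECONDS
```

The `now` parameter exists so the check can be exercised with a fixed clock. A cron job or monitoring agent would omit it.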

revka doctor reads this file and reports a component as stale when the time since its last update exceeds a fixed threshold:

Component	Stale after
Heartbeat	30 s
Scheduler	120 s
Channel	300 s

If the state file is absent, the daemon is assumed not running. For metrics-based monitoring (GET /metrics, OTLP, runtime traces) see Observability & tracing.
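The staleness thresholds in the table above can be expressed as a small lookup. This Python sketch assumes that any `channel:<name>` component uses the Channel row and that components without a listed threshold are never reported stale. Both are assumptions for illustration, as are the function names.

```python
from typing import Optional

# Thresholds in seconds, taken from the table on this page.
STALE_AFTER_SECONDS = {"heartbeat": 30, "scheduler": 120, "channel": 300}


def stale_threshold(component: str) -> Optional[int]:
    """Threshold for a component name; 'channel:telegram' maps to 'channel'."""
    kind = component.split(":", 1)[0]
    return STALE_AFTER_SECONDS.get(kind)


def is_stale(component: str, seconds_since_update: float) -> bool:
    """True when the component's last update is older than its threshold."""
    threshold = stale_threshold(component)
    return threshold is not None and seconds_since_update > threshold
```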

Troubleshooting

Start every investigation with revka doctor — it validates config, probes the workspace, and checks daemon, scheduler, and channel freshness in one pass, so most problems surface there before you read a single log line.

Triage flow

1. revka doctor: config validity, daemon/scheduler/channel freshness, sidecar and Kumiho backend status. This is the first step in any incident.
2. revka status: confirm the instance is configured the way you think it is (provider, model, runtime kind, service state, cost, OTP/e-stop).
3. GET /health: check which specific component is in error and its restart count.
4. Logs: read the service logs for the failing component (see keyword triage below).
5. revka doctor traces: when a specific turn or tool call misbehaves, inspect the runtime trace (requires tracing enabled).

Reading the logs

For an OS-managed service, tail the logs directly:

revka service logs                  # last 50 lines (default)
revka service logs -n 100 --follow  # tail and stream

Daemon logs live in the Homebrew var/revka/logs/ directory or <config_dir>/logs/, and rotate at 20 MB with 5 retained copies (daemon.stdout.log.1 … .5). Rotation runs on revka service start, not while the daemon is running. For a noisier picture during diagnosis, raise the log level:

RUST_LOG=debug revka daemon          # or revka=trace for Revka-only tracing
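The size-based rotation scheme described above (rotate at 20 MB, keep copies .1 through .5) follows a common pattern that can be sketched in Python. This is an illustration of the pattern, not Revka's rotation code. The function name is hypothetical and the exact byte value of "20 MB" is an assumption.

```python
from pathlib import Path

MAX_BYTES = 20 * 1024 * 1024  # "20 MB", assuming binary megabytes
KEEP = 5


def rotate(log: Path, max_bytes: int = MAX_BYTES, keep: int = KEEP) -> bool:
    """Rotate `log` to log.1 … log.<keep> if it has reached max_bytes."""
    if not log.exists() or log.stat().st_size < max_bytes:
        return False
    oldest = log.with_name(f"{log.name}.{keep}")
    if oldest.exists():
        oldest.unlink()                      # drop the oldest copy
    for i in range(keep - 1, 0, -1):         # shift .4 -> .5, ..., .1 -> .2
        src = log.with_name(f"{log.name}.{i}")
        if src.exists():
            src.rename(log.with_name(f"{log.name}.{i + 1}"))
    log.rename(log.with_name(f"{log.name}.1"))
    return True
```

Because rotation runs only on revka service start, a long-lived daemon can grow its active log well past the threshold before the next rotation.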

Common failures

Symptom	Likely cause	Fix
Gateway refuses to bind `0.0.0.0` / `[::]`	Public bind is guarded	Set `[gateway] allow_public_bind = true` (or `REVKA_ALLOW_PUBLIC_BIND=true`), or front it with a tunnel. See Expose your gateway with a tunnel.
A component shows `error` with a rising `restart_count`	Supervised component is crash-looping	Read its logs; the component supervisor backs off exponentially (`reliability.channel_initial_backoff_secs` → `channel_max_backoff_secs`) before each restart.
`revka doctor` reports a channel, heartbeat, or scheduler component as stale	The component stopped ticking	Check logs for that component; restart the daemon if the state file is stale (> 45 s).
Distroless container exits immediately or shell tools fail	The `:latest` image has no shell	Use `ghcr.io/kumihoio/revka:debian`. See Docker, Compose & one-click PaaS.
“Too many open files” with many MCP servers	File-descriptor limit	The service units set `LimitNOFILE = 4096:8192`; reinstall the service (`revka service install`) so the limit is applied.
`doctor` warns `kumiho_memory` is missing	Sidecar predates the `kumiho_memory` package	`~/.revka/kumiho/venv/bin/python -m pip install 'kumiho_memory>=0.5.0'`, then re-run `revka doctor`. See Kumiho setup.
Cargo / build failures during a source update	Missing toolchain or build deps	`./install.sh --install-rust` and/or `--install-system-deps`; on low-RAM hosts use `--prefer-prebuilt` or `CARGO_BUILD_JOBS=1`.
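The exponential backoff mentioned in the crash-loop row can be pictured with a short Python sketch. It assumes a doubling multiplier, which this page does not confirm. Only the existence of an initial and a maximum backoff setting comes from the table, and the function name is hypothetical.

```python
def backoff_schedule(initial_secs: float, max_secs: float, attempts: int) -> list[float]:
    """Delays before each restart: start at initial, double each time, cap at max."""
    delays = []
    delay = initial_secs
    for _ in range(attempts):
        delays.append(min(delay, max_secs))
        delay *= 2  # assumed multiplier; not confirmed by the Revka docs
    return delays
```

With an initial backoff of 2 seconds and a maximum of 60, six restarts would wait 2, 4, 8, 16, 32, and 60 seconds. This explains why a crash-looping component's restart_count rises more slowly over time.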

Log keyword triage

When you do read the logs, grep for the keyword that matches your subsystem. Structured log fields follow a consistent naming pattern:

Keyword	What it indicates
`error` / `metric.errors_total`	A component recorded a failure; pair with the `component` field.
`tool.call`	Tool execution start/finish — useful for tracing a stuck turn.
`cache.hit` / `cache.miss`	Response-cache behavior (`hot` in-memory vs `warm` SQLite).
`agent.start` / `agent.end`	Agent invocation boundaries.
`heartbeat`	Heartbeat ticks and the dead-man’s switch.
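For a quick overview before grepping by hand, you can count how often each keyword from the table appears in a log excerpt. The Python sketch below uses plain substring matching and makes no assumption about Revka's log line format. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Keywords from the table above.
KEYWORDS = (
    "error", "tool.call", "cache.hit", "cache.miss",
    "agent.start", "agent.end", "heartbeat",
)


def keyword_counts(lines: Iterable[str], keywords: Iterable[str] = KEYWORDS) -> Counter:
    """Count how many lines mention each keyword (case-sensitive substring match)."""
    keywords = tuple(keywords)
    counts: Counter = Counter()
    for line in lines:
        for kw in keywords:
            if kw in line:
                counts[kw] += 1
    return counts
```

One way to use it is to pipe `revka service logs -n 100` into a script that calls `keyword_counts(sys.stdin)`. Note that "error" also matches lines containing `metric.errors_total`.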

For deeper, per-event inspection, enable rolling traces and query them. Note that trace payloads can include model output, so keep tracing off on shared hosts:

revka doctor traces --limit 50
revka doctor traces --event tool_call --contains "timeout"

See revka doctor, status & self-test for the full doctor reference and Observability & tracing for enabling traces.

Related pages

revka doctor, status & self-test: The diagnostic commands behind every triage step on this page.

Observability & tracing: Metrics backends, the /metrics endpoint, and runtime trace storage.

Run as a background service: Install and manage Revka as an OS service across launchd, systemd, OpenRC, and Windows.

Docker, Compose & one-click PaaS: Container images, Compose, HEALTHCHECK, and the PaaS templates.

revka install, update, migrate, completions & ACP: The lifecycle commands, including revka update flags.

Status, health, config & tools endpoints: The GET /health and GET /api/health endpoints in full.
