
Updating, runbook & troubleshooting

Self-update and other update paths, day-2 operations, safe config rollout, health signals, and common failure fixes.

This page is the day-2 operator’s guide to a running Revka instance. It covers how to update the binary or container, the operations runbook for changing configuration safely, the health and state signals you watch in production, and a troubleshooting reference for the failures you are most likely to hit. Reach for it after the initial install — for first-time setup see Installation, and for the diagnostic commands referenced throughout, see revka doctor, status & self-test.

How you update depends on how you installed. Pick the path that matches your deployment.

For binaries installed from source or a prebuilt release, the built-in self-update command is the simplest path:

revka update # download and install the latest release
revka update --check # only report current vs latest, do not install
revka update --force # skip the confirmation prompt
revka update --version 0.6.0 # install a specific release

| Flag | Meaning |
| --- | --- |
| --check | Check only: prints the current and latest versions and exits. |
| --force | Install without the interactive confirmation prompt. |
| --version <X.Y.Z> | Target a specific release instead of the latest. |

revka update is not a naïve overwrite. It runs a six-phase pipeline that backs up the current binary and rolls back automatically if anything goes wrong, so a failed update never leaves you without a working binary.

  1. Preflight — resolve the target version and confirm an update is needed.
  2. Download — fetch the release binary for your platform from GitHub Releases.
  3. Backup — save the currently installed binary so it can be restored.
  4. Validate — verify the download’s SHA256 checksum and the optional cosign signature (the same verification chain install.sh uses).
  5. Swap — atomically replace the running binary with the new one.
  6. Smoke test — run the new binary to confirm it executes.

If the smoke test (or any earlier phase) fails, the pipeline rolls back to the backed-up binary automatically. --check runs the preflight comparison only and prints current versus latest without touching anything.
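The backup, validate, swap, and rollback phases follow a general pattern that can be sketched in a few lines. The sketch below is an illustration of that pattern only, not Revka's actual implementation: the function names, the `.bak` and `.new` suffixes, and the default smoke test (running the binary with `--version`) are all assumptions made for the example.

```python
import hashlib
import os
import shutil
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def default_smoke_test(binary: Path) -> bool:
    # ASSUMPTION: a working binary exits 0 when asked for its version.
    return subprocess.run([str(binary), "--version"],
                          capture_output=True).returncode == 0


def swap_with_rollback(installed: Path, downloaded: Path,
                       expected_sha256: str,
                       smoke_test=default_smoke_test) -> bool:
    """Replace `installed` with `downloaded`, restoring the backup on failure.

    Returns True if the new file is in place, False if the old one was kept.
    """
    backup = installed.with_name(installed.name + ".bak")
    staged = installed.with_name(installed.name + ".new")

    shutil.copy2(installed, backup)                  # Backup
    if sha256_of(downloaded) != expected_sha256.lower():
        return False                                 # Validate failed, nothing swapped
    try:
        shutil.copy2(downloaded, staged)             # stage next to the target
        os.replace(staged, installed)                # Swap: atomic rename
        if not smoke_test(installed):                # Smoke test
            raise RuntimeError("smoke test failed")
        return True
    except Exception:
        os.replace(backup, installed)                # Rollback
        return False
```

Staging the new file in the same directory before calling `os.replace` keeps the rename on one filesystem, which is what makes the swap atomic.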

Revka runs in one of four modes. Match your operational commands to how it is running.

| Mode | Start | Manage |
| --- | --- | --- |
| Foreground autonomous runtime | revka daemon | Ctrl-C (SIGINT) or SIGTERM |
| Gateway only (dashboard + API) | revka gateway | revka gateway restart |
| OS background service | revka service install && revka service start | revka service status / logs / restart |
| Container | docker compose up -d | docker compose logs -f / restart |

The full command reference lives in revka gateway, daemon & service; running as an OS service is covered in Run as a background service; containers in Docker, Compose & one-click PaaS.

Configuration lives in ~/.revka/config.toml. Treat every edit as a deployment and follow the validate → change → restart → verify → rollback loop.

  1. Validate first. Run revka doctor before you touch anything so you know the baseline is clean. The config category does real validation — it constructs the provider, resolves model and embedding routes, and checks the temperature range — so most mistakes surface here, not at runtime.
  2. Change one thing. Edit a single key (or one logical group) in config.toml. Smaller changes make rollback unambiguous.
  3. Restart to apply. Most config takes effect on restart: revka service restart (service), revka gateway restart (gateway), or docker compose restart (container).
  4. Verify. Run revka doctor again, then revka status to confirm the new value is in effect and no component went stale.
  5. Roll back if needed. Restore the previous config.toml and restart. Keep a backup before editing, or use revka onboard --reinit (which timestamps a backup of the whole config directory before starting fresh).
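The five-step loop above lends itself to a small wrapper script. The following is a minimal sketch, assuming that `revka doctor` and the restart command exit non-zero on failure (the source does not state their exit codes). The `rollout` function, the `.bak` suffix, and the injectable `runner` are hypothetical names introduced for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def rollout(config: Path, new_text: str, restart_cmd: Sequence[str],
            runner: Runner = run) -> str:
    """Apply new_text via validate -> change -> restart -> verify -> rollback."""
    doctor = ["revka", "doctor"]

    # 1. Validate the baseline.
    # ASSUMPTION: a non-zero exit code means doctor found a problem.
    if runner(doctor) != 0:
        return "aborted: baseline not clean"

    # 2. Keep a backup, then make the change.
    backup = config.with_name(config.name + ".bak")
    shutil.copy2(config, backup)
    config.write_text(new_text)

    # 3. Restart to apply, 4. verify with a second doctor run.
    if runner(restart_cmd) == 0 and runner(doctor) == 0:
        return "applied"

    # 5. Roll back: restore the previous file and restart on it.
    shutil.copy2(backup, config)
    runner(restart_cmd)
    return "rolled back"
```

For a service install, a call might look like `rollout(Path.home() / ".revka" / "config.toml", new_text, ["revka", "service", "restart"])`. Passing a fake `runner` makes the loop easy to dry-run without touching a live instance.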

Revka exposes several independent liveness signals. Use the cheapest one that answers your question.

A process-global registry tracks the health of each subsystem. Components start in starting and are stamped ok or error as they run, along with the last-OK time, last error, and a restart count. Known component names follow a convention: gateway, scheduler, heartbeat, and channel:<name> (e.g. channel:telegram).

GET /health returns the full health snapshot from that registry. It needs no authentication and is the primary liveness signal for Docker HEALTHCHECK, load balancers, and monitoring systems.

GET http://localhost:42617/health
{
  "pid": 12345,
  "updated_at": "2026-06-18T12:00:00Z",
  "uptime_seconds": 3600,
  "components": {
    "gateway": { "status": "ok", "updated_at": "…", "last_ok": "…", "restart_count": 0 },
    "scheduler": { "status": "ok", "updated_at": "…", "last_ok": "…", "restart_count": 0 },
    "channel:telegram": { "status": "error", "updated_at": "…", "last_error": "timeout", "restart_count": 2 }
  }
}
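A monitoring script usually only needs to know which components are in error. The sketch below extracts them from a snapshot shaped like the example response. The helper names `components_in_error` and `fetch_health` are hypothetical, and the field names are taken from the sample above rather than from a published schema.

```python
import json
import urllib.request


def components_in_error(snapshot: dict) -> dict:
    """Map component name -> (last_error, restart_count) for status == "error"."""
    return {
        name: (info.get("last_error"), info.get("restart_count", 0))
        for name, info in snapshot.get("components", {}).items()
        if info.get("status") == "error"
    }


def fetch_health(url: str = "http://localhost:42617/health",
                 timeout: float = 5.0) -> dict:
    """GET the health snapshot; the endpoint needs no authentication."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Components still in `starting` are deliberately not reported, since they have not failed yet. Whether to alert on a long-lived `starting` state is a policy choice left to the caller.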

For a scripted up/down check that maps health to an exit code (ideal for Docker HEALTHCHECK), use the CLI wrapper, which does a GET /health under the hood:

revka status --format exit-code # exit 0 if healthy, 1 otherwise

The daemon writes daemon_state.json next to config.toml every few seconds with a health snapshot of all components and a written_at timestamp. External monitors can watch this file: if it is older than 45 seconds, treat the daemon as dead or hung. On Windows the file also doubles as the running-process check, since Task Scheduler cannot be queried reliably.

revka doctor reads this file and reports components as stale when their last update exceeds a fixed threshold:

| Component | Stale after |
| --- | --- |
| Heartbeat | 30 s |
| Scheduler | 120 s |
| Channel | 300 s |

If the state file is absent, the daemon is assumed not running. For metrics-based monitoring (GET /metrics, OTLP, runtime traces) see Observability & tracing.
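An external monitor can apply the same thresholds itself. The sketch below is one way to do that, under two stated assumptions: it uses the file's modification time as a stand-in for the written_at field (whose exact format the source does not give), and it maps component names such as channel:telegram to the Channel row by their prefix. All function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

STATE_MAX_AGE_SECS = 45                      # state-file limit from the text
STALE_AFTER_SECS = {"heartbeat": 30, "scheduler": 120, "channel": 300}


def is_stale(component: str, age_seconds: float) -> bool:
    """True if a component's last update is older than its threshold."""
    kind = component.split(":", 1)[0]        # "channel:telegram" -> "channel"
    limit = STALE_AFTER_SECS.get(kind)
    # Components without a documented threshold are never reported stale.
    return limit is not None and age_seconds > limit


def state_file_age(path: Path, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the file was last modified, or None if it is absent."""
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return None
    return (time.time() if now is None else now) - mtime


def daemon_looks_alive(path: Path, now: Optional[float] = None) -> bool:
    """An absent file, or one older than 45 s, counts as not running."""
    age = state_file_age(path, now)
    return age is not None and age <= STATE_MAX_AGE_SECS
```

The optional `now` argument exists only so the check can be exercised with a fixed clock.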

Start every investigation with revka doctor — it validates config, probes the workspace, and checks daemon, scheduler, and channel freshness in one pass, so most problems surface there before you read a single log line.

  1. revka doctor — config validity, daemon/scheduler/channel freshness, sidecar and Kumiho backend status. This is the first step in any incident.
  2. revka status — confirm the instance is configured the way you think it is (provider, model, runtime kind, service state, cost, OTP/e-stop).
  3. GET /health — check which specific component is in error and its restart count.
  4. Logs — read the service logs for the failing component (see keyword triage below).
  5. revka doctor traces — when a specific turn or tool call misbehaves, inspect the runtime trace (requires tracing enabled).

For an OS-managed service, tail the logs directly:

revka service logs # last 50 lines (default)
revka service logs -n 100 --follow # tail and stream

Daemon logs live in the Homebrew var/revka/logs/ directory or <config_dir>/logs/, and rotate at 20 MB with 5 retained copies (daemon.stdout.log.1 through daemon.stdout.log.5). Rotation runs on revka service start, not while the daemon is running. For a noisier picture during diagnosis, raise the log level:

RUST_LOG=debug revka daemon # or revka=trace for Revka-only tracing

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Gateway refuses to bind 0.0.0.0 / [::] | Public bind is guarded | Set [gateway] allow_public_bind = true (or REVKA_ALLOW_PUBLIC_BIND=true), or front it with a tunnel. See Expose your gateway with a tunnel. |
| A component shows error with a rising restart_count | Supervised component is crash-looping | Read its logs; the component supervisor backs off exponentially (reliability.channel_initial_backoff_secs up to channel_max_backoff_secs) before each restart. |
| revka doctor reports a channel/heartbeat/scheduler stale | The component stopped ticking | Check logs for that component; restart the daemon if the state file is stale (> 45 s). |
| Distroless container exits immediately or shell tools fail | The :latest image has no shell | Use ghcr.io/kumihoio/revka:debian. See Docker, Compose & one-click PaaS. |
| “Too many open files” with many MCP servers | File-descriptor limit | The service units set LimitNOFILE = 4096:8192; reinstall the service (revka service install) so the limit is applied. |
| doctor warns kumiho_memory is missing | Sidecar predates the kumiho_memory package | ~/.revka/kumiho/venv/bin/python -m pip install 'kumiho_memory>=0.5.0', then re-run revka doctor. See Kumiho setup. |
| Cargo / build failures during a source update | Missing toolchain or build deps | ./install.sh --install-rust and/or --install-system-deps; on low-RAM hosts use --prefer-prebuilt or CARGO_BUILD_JOBS=1. |
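To judge how long a crash-looping component will wait between restarts, it helps to see the shape of an exponential backoff bounded by an initial and a maximum delay. The sketch below assumes the delay doubles on each restart; the source says only that the backoff is exponential, so the factor of 2 and the example values are assumptions, not Revka defaults.

```python
from typing import List


def backoff_schedule(initial_secs: float, max_secs: float, restarts: int,
                     factor: float = 2.0) -> List[float]:
    """Delay before each of the first `restarts` restarts, capped at max_secs."""
    delays = []
    delay = initial_secs
    for _ in range(restarts):
        delays.append(min(delay, max_secs))
        delay *= factor          # ASSUMPTION: doubling; the real factor may differ
    return delays


# Hypothetical settings: initial backoff 2 s, maximum 60 s.
print(backoff_schedule(2, 60, 7))
```

With those hypothetical settings the delays grow as 2, 4, 8, 16, 32 seconds and then stay at the 60-second cap, so a restart_count that keeps rising slowly can still indicate a component that fails immediately on every start.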

When you do read the logs, grep for the keyword that matches your subsystem. Structured log fields follow a consistent naming pattern:

| Keyword | What it indicates |
| --- | --- |
| error / metric.errors_total | A component recorded a failure; pair with the component field. |
| tool.call | Tool execution start/finish; useful for tracing a stuck turn. |
| cache.hit / cache.miss | Response-cache behavior (hot in-memory vs warm SQLite). |
| agent.start / agent.end | Agent invocation boundaries. |
| heartbeat | Heartbeat ticks and the dead-man’s switch. |
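For a first pass over a large log, counting how often each keyword appears can show which subsystem is noisy before you read individual lines. This is a minimal sketch using plain substring matching; it makes no assumption about the log line format, and the `keyword_counts` helper is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Keywords from the table above. Bare "error" is left out because it would
# also match "metric.errors_total" and many unrelated messages.
KEYWORDS = (
    "metric.errors_total",
    "tool.call",
    "cache.hit",
    "cache.miss",
    "agent.start",
    "agent.end",
    "heartbeat",
)


def keyword_counts(lines: Iterable[str]) -> Counter:
    """Count the log lines that contain each triage keyword."""
    counts: Counter = Counter()
    for line in lines:
        for keyword in KEYWORDS:
            if keyword in line:
                counts[keyword] += 1
    return counts
```

Feeding it an open log file, for example `keyword_counts(open(path, encoding="utf-8", errors="replace"))`, streams the file line by line instead of loading it whole.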

For deeper, per-event inspection, enable rolling traces and query them. Note that trace payloads can include model output, so keep tracing off on shared hosts:

revka doctor traces --limit 50
revka doctor traces --event tool_call --contains "timeout"

See revka doctor, status & self-test for the full doctor reference and Observability & tracing for enabling traces.