Updating, runbook & troubleshooting
Self-update and other update paths, day-2 operations, safe config rollout, health signals, and common failure fixes.
This page is the day-2 operator’s guide to a running Revka instance. It covers how to update the binary or container, the operations runbook for changing configuration safely, the health and state signals you watch in production, and a troubleshooting reference for the failures you are most likely to hit. Reach for it after the initial install — for first-time setup see Installation, and for the diagnostic commands referenced throughout, see revka doctor, status & self-test.
Updating Revka
How you update depends on how you installed. Pick the path that matches your deployment.
For binaries installed from source or a prebuilt release, the built-in self-update command is the simplest path:
revka update                   # download and install the latest release
revka update --check           # only report current vs latest, do not install
revka update --force           # skip the confirmation prompt
revka update --version 0.6.0   # install a specific release
| Flag | Meaning |
|---|---|
| --check | Check only: prints the current and latest versions and exits. |
| --force | Install without the interactive confirmation prompt. |
| --version <X.Y.Z> | Target a specific release instead of the latest. |
If you installed with Homebrew (macOS or Linuxbrew):
brew upgrade revka
Homebrew keeps runtime data under its own var/revka/ directory and propagates REVKA_CONFIG_DIR into the launchd service, so config and workspace data survive the upgrade. See macOS update & uninstall.
For a source checkout managed by the bootstrap installer, pull and re-run:
git pull
./install.sh --prefer-prebuilt   # try a release binary, fall back to source
After upgrading, seed any new bundled workflow templates without clobbering your own:
revka workflows sync   # non-destructive; existing files untouched
See Installation for the full install.sh flag set.
For containers, update the image rather than the binary. Do not re-run install.sh to restart a container.
With Compose:
docker compose pull
docker compose up -d
With plain docker:
docker pull ghcr.io/kumihoio/revka:latest
docker stop revka && docker rm revka
docker run -d ... ghcr.io/kumihoio/revka:latest
Pin a CalVer tag (e.g. ghcr.io/kumihoio/revka:2026.4.21) instead of :latest when you need reproducible deployments. See Docker, Compose & one-click PaaS.
The revka update pipeline
revka update is not a naïve overwrite. It runs a six-phase pipeline that backs up the current binary and rolls back automatically if anything goes wrong, so a failed update never leaves you without a working binary.
- Preflight: resolve the target version and confirm an update is needed.
- Download: fetch the release binary for your platform from GitHub Releases.
- Backup: save the currently installed binary so it can be restored.
- Validate: verify the download’s SHA256 checksum and the optional cosign signature (the same verification chain install.sh uses).
- Swap: atomically replace the running binary with the new one.
- Smoke test: run the new binary to confirm it executes.
If the smoke test (or any earlier phase) fails, the pipeline rolls back to the backed-up binary automatically. --check runs the preflight comparison only and prints current versus latest without touching anything.
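To make the backup, validate, swap, smoke-test, and rollback phases concrete, here is a minimal shell sketch of the same pattern. It is an illustration, not Revka's implementation: the function name safe_swap_binary, the .bak and .new file names, and the use of --version as the smoke test are assumptions, and it relies on GNU coreutils sha256sum (on macOS, shasum -a 256 is the usual substitute).

```shell
# Minimal sketch of a backup -> validate -> swap -> smoke test -> rollback sequence.
# Illustrative only: not Revka's implementation. Requires GNU coreutils (sha256sum).
safe_swap_binary() {
  local target=$1        # currently installed binary
  local candidate=$2     # freshly downloaded release binary
  local expected_sha=$3  # published SHA256 for that release
  local backup="$target.bak" actual_sha

  # Backup: keep a restorable copy of the current binary (mode preserved).
  cp -p "$target" "$backup" || return 1

  # Validate: stop before touching the install if the checksum does not match.
  actual_sha=$(sha256sum "$candidate" | cut -d' ' -f1)
  if [ "$actual_sha" != "$expected_sha" ]; then
    echo "checksum mismatch, keeping current binary" >&2
    return 1
  fi

  # Swap: stage next to the target, then rename; mv within one directory is an atomic rename.
  if ! { cp "$candidate" "$target.new" && chmod +x "$target.new"; }; then
    rm -f "$target.new"
    return 1
  fi
  mv -f "$target.new" "$target" || return 1

  # Smoke test: the new binary must at least run; otherwise restore the backup.
  if ! "$target" --version >/dev/null 2>&1; then
    echo "smoke test failed, rolling back" >&2
    mv -f "$backup" "$target"
    return 1
  fi
  echo "updated"
}
```

The key design point is ordering: nothing in the install location changes until the checksum passes, and the backup exists before the swap, so every failure path ends with a runnable binary in place.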
Operations runbook
Revka runs in one of a few modes. Match your operational commands to how it is running.
| Mode | Start | Manage |
|---|---|---|
| Foreground autonomous runtime | revka daemon | Ctrl-C (SIGINT/SIGTERM) |
| Gateway only (dashboard + API) | revka gateway | revka gateway restart |
| OS background service | revka service install && revka service start | revka service status / logs / restart |
| Container | docker compose up -d | docker compose logs -f / restart |
The full command reference lives in revka gateway, daemon & service; running as an OS service is covered in Run as a background service; containers in Docker, Compose & one-click PaaS.
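If you script restarts across mixed deployments, the Manage column of the table above reduces to a small lookup. This is a hypothetical helper (revka_restart_cmd is not a Revka command); it only prints the command, so you can review it first or run it with cmd=$(revka_restart_cmd service) && $cmd.

```shell
# Hypothetical helper: print the restart command for a given run mode
# (mirrors the Manage column of the modes table; not a Revka command).
revka_restart_cmd() {
  case "$1" in
    gateway)   echo "revka gateway restart" ;;
    service)   echo "revka service restart" ;;
    container) echo "docker compose restart" ;;
    daemon)
      # The foreground runtime has no restart command: stop it with Ctrl-C and run it again.
      echo "foreground daemon: stop with Ctrl-C, then run 'revka daemon' again" >&2
      return 1 ;;
    *)
      echo "unknown mode: $1" >&2
      return 2 ;;
  esac
}
```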
Safe config rollout
Configuration lives in ~/.revka/config.toml. Treat every edit as a deployment and follow the validate → change → restart → verify → rollback loop.
- Validate first. Run revka doctor before you touch anything so you know the baseline is clean. The config category does real validation: it constructs the provider, resolves model and embedding routes, and checks the temperature range, so most mistakes surface here, not at runtime.
- Change one thing. Edit a single key (or one logical group) in config.toml. Smaller changes make rollback unambiguous.
- Restart to apply. Most config takes effect on restart: revka service restart (service), revka gateway restart (gateway), or docker compose restart (container).
- Verify. Run revka doctor again, then revka status to confirm the new value is in effect and no component went stale.
- Roll back if needed. Restore the previous config.toml and restart. Keep a backup before editing, or use revka onboard --reinit (which timestamps a backup of the whole config directory before starting fresh).
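The loop above can be scripted so the backup and rollback are never skipped. This is a minimal sketch under stated assumptions: rollout_config and the CONFIG, RESTART_CMD, and VERIFY_CMD variables are hypothetical names, and it relies on revka doctor exiting non-zero when it finds a problem, which you should confirm for your version before trusting the rollback path.

```shell
# Minimal sketch of the validate -> change -> restart -> verify -> rollback loop.
# CONFIG, RESTART_CMD and VERIFY_CMD are hypothetical variables, not Revka settings.
# Assumes VERIFY_CMD exits non-zero when something is wrong.
CONFIG=${CONFIG:-$HOME/.revka/config.toml}
RESTART_CMD=${RESTART_CMD:-"revka service restart"}
VERIFY_CMD=${VERIFY_CMD:-"revka doctor"}

rollout_config() {
  local new_config=$1 backup
  backup="$CONFIG.bak.$(date +%Y%m%d%H%M%S)"

  # Validate: confirm the baseline is clean before changing anything.
  $VERIFY_CMD || { echo "baseline is not clean; aborting" >&2; return 1; }

  # Change: keep a timestamped backup, then apply the one change.
  cp -p "$CONFIG" "$backup" || return 1
  cp "$new_config" "$CONFIG" || return 1

  # Restart to apply, then verify.
  if $RESTART_CMD && $VERIFY_CMD; then
    echo "rollout ok (backup kept at $backup)"
    return 0
  fi

  # Roll back: restore the previous file and restart again.
  echo "verification failed; restoring $backup" >&2
  cp -p "$backup" "$CONFIG"
  $RESTART_CMD
  return 1
}
```

For a gateway-only or container deployment, set RESTART_CMD to the matching command from the operations table before calling rollout_config.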
Health and state signals
Revka exposes several independent liveness signals. Use the cheapest one that answers your question.
Component health registry
A process-global registry tracks the health of each subsystem. Components start in starting and are stamped ok or error as they run, along with the last-OK time, last error, and a restart count. Known component names follow a convention: gateway, scheduler, heartbeat, and channel:<name> (e.g. channel:telegram).
Gateway health endpoint
GET /health returns the full health snapshot from that registry. It needs no authentication and is the primary liveness signal for Docker HEALTHCHECK, load balancers, and monitoring systems.
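Because the probe is cheap, a deploy or restart script can poll it until the instance reports healthy. A minimal sketch, assuming the revka status --format exit-code wrapper described below as the probe; wait_healthy, HEALTH_CMD, and the default of 30 attempts at 2-second intervals are illustrative choices, not Revka settings.

```shell
# Hypothetical helper: retry a health probe until it passes or attempts run out.
# HEALTH_CMD defaults to the CLI wrapper that maps GET /health to an exit code.
HEALTH_CMD=${HEALTH_CMD:-"revka status --format exit-code"}

wait_healthy() {
  local attempts=${1:-30} interval=${2:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    if $HEALTH_CMD >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}
```

For example, revka service restart && wait_healthy 15 2 gives the instance roughly 30 seconds to come back before the script reports failure.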
GET http://localhost:42617/health
{
  "pid": 12345,
  "updated_at": "2026-06-18T12:00:00Z",
  "uptime_seconds": 3600,
  "components": {
    "gateway": { "status": "ok", "updated_at": "…", "last_ok": "…", "restart_count": 0 },
    "scheduler": { "status": "ok", "updated_at": "…", "last_ok": "…", "restart_count": 0 },
    "channel:telegram": { "status": "error", "updated_at": "…", "last_error": "timeout", "restart_count": 2 }
  }
}
For a scripted up/down check that maps health to an exit code (ideal for Docker HEALTHCHECK), use the CLI wrapper, which does a GET /health under the hood:
revka status --format exit-code   # exit 0 if healthy, 1 otherwise
Daemon state file and freshness
The daemon writes daemon_state.json next to config.toml every few seconds with a health snapshot of all components and a written_at timestamp. External monitors can watch this file: if it is older than 45 seconds, the daemon has died. On Windows it also doubles as the running-process check, since Task Scheduler cannot be queried reliably.
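A minimal external monitor for that 45-second rule might look like the following. It is a sketch: it assumes the file's modification time tracks its written_at timestamp and that the config directory is ~/.revka (set STATE_FILE for Homebrew or custom directories), and daemon_state_age and daemon_alive are hypothetical names.

```shell
# Minimal external check for the 45-second freshness rule on daemon_state.json.
# Assumes the file's modification time tracks its written_at timestamp and that
# the config directory is ~/.revka (override STATE_FILE for Homebrew or custom dirs).
STATE_FILE=${STATE_FILE:-$HOME/.revka/daemon_state.json}

# Print the state file's age in seconds; fail if it does not exist.
daemon_state_age() {
  local mtime
  # GNU stat uses -c %Y; BSD/macOS stat uses -f %m.
  mtime=$(stat -c %Y "$STATE_FILE" 2>/dev/null || stat -f %m "$STATE_FILE" 2>/dev/null) || return 1
  echo $(( $(date +%s) - mtime ))
}

# Exit 0 if the daemon looks alive, 1 if the file is missing or stale.
daemon_alive() {
  local age
  if ! age=$(daemon_state_age); then
    echo "no state file: daemon assumed not running" >&2
    return 1
  fi
  if [ "$age" -gt 45 ]; then
    echo "state file is ${age}s old: daemon presumed dead" >&2
    return 1
  fi
}
```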
revka doctor reads this file and reports components as stale when their last update exceeds a fixed threshold:
| Component | Stale after |
|---|---|
| Heartbeat | 30 s |
| Scheduler | 120 s |
| Channel | 300 s |
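Expressed as code, the thresholds in the table become a simple lookup. This hypothetical helper (not part of revka doctor) treats any channel:<name> component as a channel and reports stale only when the age strictly exceeds the threshold, matching the "exceeds" wording above.

```shell
# Hypothetical helper mirroring the staleness table: exit 0 if a component of
# the given kind is stale after AGE seconds without an update, 1 if fresh.
component_is_stale() {
  local kind=$1 age=$2 limit
  case "$kind" in
    heartbeat)         limit=30 ;;
    scheduler)         limit=120 ;;
    channel|channel:*) limit=300 ;;
    *) echo "unknown component kind: $kind" >&2; return 2 ;;
  esac
  [ "$age" -gt "$limit" ]
}
```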
If the state file is absent, the daemon is assumed not running. For metrics-based monitoring (GET /metrics, OTLP, runtime traces) see Observability & tracing.
Troubleshooting
Start every investigation with revka doctor: it validates config, probes the workspace, and checks daemon, scheduler, and channel freshness in one pass, so most problems surface there before you read a single log line.
Triage flow
1. revka doctor: config validity, daemon/scheduler/channel freshness, sidecar and Kumiho backend status. This is the first step in any incident.
2. revka status: confirm the instance is configured the way you think it is (provider, model, runtime kind, service state, cost, OTP/e-stop).
3. GET /health: check which specific component is in error and its restart count.
4. Logs: read the service logs for the failing component (see Log keyword triage below).
5. revka doctor traces: when a specific turn or tool call misbehaves, inspect the runtime trace (requires tracing enabled).
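During an incident it helps to capture the first four steps in one place before anything changes. A minimal sketch, assuming the gateway listens on the localhost:42617 address used in the health example on this page and that Revka runs as an OS service (so revka service logs applies); collect_triage, GATEWAY_URL, and the file layout are hypothetical. Step 5 is left out because it needs tracing enabled.

```shell
# Hypothetical helper: snapshot triage steps 1-4 into one directory for an incident.
# GATEWAY_URL and the output layout are illustrative; revka service logs assumes
# Revka runs as an OS service.
GATEWAY_URL=${GATEWAY_URL:-http://localhost:42617}

collect_triage() {
  local out=${1:-revka-triage-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$out" || return 1
  # Commands may exit non-zero when they find problems; keep going and record output.
  revka doctor              > "$out/doctor.txt" 2>&1
  revka status              > "$out/status.txt" 2>&1
  curl -fsS "$GATEWAY_URL/health" > "$out/health.json" 2> "$out/health.err"
  revka service logs -n 100 > "$out/logs.txt" 2>&1
  echo "$out"
}
```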
Reading the logs
For an OS-managed service, tail the logs directly:
revka service logs                   # last 50 lines (default)
revka service logs -n 100 --follow   # tail and stream
Daemon logs live in Homebrew's var/revka/logs/ directory or in <config_dir>/logs/, and rotate at 20 MB with 5 retained copies (daemon.stdout.log.1 through daemon.stdout.log.5). Rotation runs on revka service start, not while the daemon is running. For a noisier picture during diagnosis, raise the log level:
RUST_LOG=debug revka daemon   # or RUST_LOG=revka=trace for Revka-only trace output
Common failures
| Symptom | Likely cause | Fix |
|---|---|---|
| Gateway refuses to bind 0.0.0.0 / [::] | Public bind is guarded | Set [gateway] allow_public_bind = true (or REVKA_ALLOW_PUBLIC_BIND=true), or front it with a tunnel. See Expose your gateway with a tunnel. |
| A component shows error with a rising restart_count | Supervised component is crash-looping | Read its logs; the component supervisor backs off exponentially (from reliability.channel_initial_backoff_secs up to channel_max_backoff_secs) before each restart. |
| revka doctor reports a channel, heartbeat, or scheduler as stale | The component stopped ticking | Check logs for that component; restart the daemon if the state file is stale (older than 45 s). |
| Distroless container exits immediately or shell tools fail | The :latest image has no shell | Use ghcr.io/kumihoio/revka:debian. See Docker, Compose & one-click PaaS. |
| “Too many open files” with many MCP servers | File-descriptor limit | The service units set LimitNOFILE = 4096:8192; reinstall the service (revka service install) so the limit is applied. |
| doctor warns kumiho_memory is missing | Sidecar predates the kumiho_memory package | Run ~/.revka/kumiho/venv/bin/python -m pip install 'kumiho_memory>=0.5.0', then re-run revka doctor. See Kumiho setup. |
| Cargo / build failures during a source update | Missing toolchain or build deps | Run ./install.sh --install-rust and/or --install-system-deps; on low-RAM hosts use --prefer-prebuilt or CARGO_BUILD_JOBS=1. |
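The crash-loop backoff in the second row can be pictured as a delay that grows after each restart until it reaches a ceiling. This sketch assumes the delay doubles each time (the exact growth factor is not stated here); its INITIAL and MAX arguments stand in for reliability.channel_initial_backoff_secs and channel_max_backoff_secs, and backoff_schedule is a hypothetical helper.

```shell
# Sketch of capped exponential backoff between supervisor restarts.
# Usage: backoff_schedule INITIAL MAX RESTARTS
# Assumes the delay doubles after each restart; INITIAL and MAX stand in for
# reliability.channel_initial_backoff_secs and channel_max_backoff_secs.
# Prints the delay (in seconds) before each of the first RESTARTS restarts.
backoff_schedule() {
  local delay=$1 max=$2 restarts=$3 i=0 out=""
  # Never start above the ceiling.
  if [ "$delay" -gt "$max" ]; then delay=$max; fi
  while [ "$i" -lt "$restarts" ]; do
    out="$out $delay"
    delay=$(( delay * 2 ))
    if [ "$delay" -gt "$max" ]; then delay=$max; fi
    i=$(( i + 1 ))
  done
  echo "${out# }"
}
```

For example, backoff_schedule 2 60 7 prints 2 4 8 16 32 60 60: once the ceiling is reached, a crash-looping channel keeps retrying at the maximum interval while its restart_count keeps rising.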
Log keyword triage
When you do read the logs, grep for the keyword that matches your subsystem. Structured log fields follow a consistent naming pattern:
| Keyword | What it indicates |
|---|---|
| error / metric.errors_total | A component recorded a failure; pair with the component field. |
| tool.call | Tool execution start/finish; useful for tracing a stuck turn. |
| cache.hit / cache.miss | Response-cache behavior (hot in-memory vs warm SQLite). |
| agent.start / agent.end | Agent invocation boundaries. |
| heartbeat | Heartbeat ticks and the dead-man’s switch. |
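For a quick first pass over a large log, counting which keywords appear can point you at the right subsystem. A minimal sketch: log_keyword_counts is a hypothetical helper, it matches fixed strings anywhere on a line rather than parsing structured fields (so the error count also includes metric.errors_total lines), and you would point it at the current daemon log, for example daemon.stdout.log in the logs directory described earlier.

```shell
# Hypothetical triage helper: count log lines containing each keyword from the table.
# Fixed-string matching on whole lines; it does not parse structured fields, so
# "error" also counts lines mentioning metric.errors_total.
log_keyword_counts() {
  local log=$1 kw
  for kw in error tool.call cache.hit cache.miss agent.start agent.end heartbeat; do
    printf '%-12s %s\n' "$kw" "$(grep -cF -- "$kw" "$log")"
  done
}
```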
For deeper, per-event inspection, enable rolling traces and query them. Trace payloads can include model output, so keep tracing off on shared hosts. Once tracing is enabled, list the most recent events or filter by event type and content:
revka doctor traces --limit 50
revka doctor traces --event tool_call --contains "timeout"
See revka doctor, status & self-test for the full doctor reference and Observability & tracing for enabling traces.