Workflows & SOP overview

The two orchestration primitives — declarative YAML workflows and event-driven SOPs — and when to use each.

Revka has two orchestration primitives for running multi-step work, and they sit at different layers of the stack. Workflows are declarative YAML pipelines — a DAG of typed steps (LLM agents, shell, Python, conditionals, parallel fan-out, human approval) authored in the dashboard or the Gateway API, stored in Kumiho, and executed by the Python Operator MCP server. SOPs (Standard Operating Procedures) are event-driven procedures defined as SOP.toml + SOP.md files on disk, run by the Rust SopEngine, and triggered by MQTT, webhooks, cron, peripheral signals, or manual invocation.

Read this page first to choose between them. Both share the same audit-first principles — human approval gates, checkpointed state, and Kumiho-backed audit records — but they differ in where definitions live, what executes them, and how they start. If you already know which one you need, jump to Your first workflow or the SOP reference.

| Dimension | YAML Workflow | SOP |
| --- | --- | --- |
| Definition format | YAML | SOP.toml (metadata + triggers) + SOP.md (steps) |
| Where stored | Kumiho (Revka/Workflows space) + on-disk YAML | On disk under <workspace>/sops/<name>/ |
| Executor | Operator MCP (Python) | SopEngine (Rust) |
| Authoring surface | Dashboard editor, Gateway API, MCP tools | Text editor + revka sop CLI |
| How a run starts | Manual run, cron trigger, entity-event trigger | MQTT, webhook, cron, peripheral, or manual |
| Step model | Typed DAG with branching, parallel, loops | Numbered sequential steps |
| Approval model | human_approval / human_input steps | Execution modes + per-step requires_confirmation |
| No-LLM mode | None (steps drive agents/tools) | deterministic = true (no LLM round-trips) |
| Validation | 6-pass validator on save/run | revka sop validate |
| Audit | Runs persisted to Kumiho (Revka/WorkflowRuns) | Memory backend under category sop |

A workflow is a YAML document with typed inputs, ordered steps, optional triggers, and named outputs. The smallest useful workflow runs a single agent step:

name: hello-world # unique slug (required)
version: "1.0" # semantic version (required)
description: Say hello to a topic.
inputs:
  - name: topic
    type: string # string | number | boolean | list
    required: true
    default: ""
steps:
  - id: greet
    type: agent
    agent:
      agent_type: claude # claude or codex
      role: researcher
      prompt: "Say hello about ${inputs.topic}."
outputs:
  - name: result
    source: "${greet.output}"

Steps reference each other through ${...} variable interpolation and depends_on edges, which together form the execution DAG. The full set of step types includes agent, shell, python, email, notify, resolve, conditional, parallel, goto, output, human_approval, human_input, a2a, and the higher-level orchestration steps map_reduce, supervisor, group_chat, and handoff. See the step types reference for each one and the YAML reference for the complete schema.
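To make the ${...} mechanism concrete, here is a minimal Python sketch of how such references could be resolved and how they imply DAG edges. It is an illustration only, not the operator's actual resolver: the function names, the `scopes` mapping, and the rule that any scope other than `inputs` or `trigger` names a step are assumptions made for this example.

```python
import re

# Matches ${scope.key}, for example ${inputs.topic} or ${greet.output}.
_REF = re.compile(r"\$\{([A-Za-z_][\w-]*)\.([A-Za-z_][\w-]*)\}")

# Scopes that are not step ids (assumption for this sketch).
_NON_STEP_SCOPES = {"inputs", "trigger"}


def interpolate(template: str, scopes: dict) -> str:
    """Replace every ${scope.key} reference with scopes[scope][key]."""

    def resolve(match: re.Match) -> str:
        scope, key = match.group(1), match.group(2)
        return str(scopes[scope][key])

    return _REF.sub(resolve, template)


def referenced_steps(template: str) -> set[str]:
    """Step ids a template reads from; each one implies an edge in the DAG."""
    return {
        m.group(1)
        for m in _REF.finditer(template)
        if m.group(1) not in _NON_STEP_SCOPES
    }
```

With the hello-world definition, `interpolate("Say hello about ${inputs.topic}.", {"inputs": {"topic": "Rust"}})` fills in the prompt, and `referenced_steps("${greet.output}")` reports that the output depends on the `greet` step.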

The operator discovers workflow YAML from up to four locations. When the same slug appears in more than one location, the higher-priority source wins:

| Priority | Path | Purpose |
| --- | --- | --- |
| 3 (highest) | .revka/workflows/ | Project-local overrides |
| 2 | ~/.revka/workflows/ | User-global workflows |
| 1 (lowest) | .revka/operator_mcp/workflow/builtins/ | Shipped builtin defaults |
| Fallback | Kumiho space Revka/Workflows | Cloud-managed definitions |
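The precedence among the three on-disk locations can be sketched in a few lines of Python. This is a hedged illustration rather than the operator's scanner: the `discover` function is hypothetical, it takes the slug from the file name (the real scanner may read the `name` field instead), it skips `{slug}.r{N}.yaml` revision files on the assumption that only the base file is loaded, and it leaves out the Kumiho fallback entirely.

```python
import re
from pathlib import Path

# Lowest priority first, so later directories overwrite earlier ones by slug.
SEARCH_DIRS = [
    Path(".revka/operator_mcp/workflow/builtins"),  # 1: shipped builtins
    Path.home() / ".revka" / "workflows",           # 2: user-global
    Path(".revka/workflows"),                       # 3: project-local
]

# File stems such as "hello-world.r2" are saved revisions, not base files.
_REVISION_SUFFIX = re.compile(r"\.r\d+$")


def discover(dirs=SEARCH_DIRS) -> dict[str, Path]:
    """Map each workflow slug to the highest-priority file that defines it."""
    found: dict[str, Path] = {}
    for directory in dirs:
        if not directory.is_dir():
            continue  # a missing location is simply skipped
        for path in sorted(directory.glob("*.yaml")):
            if _REVISION_SUFFIX.search(path.stem):
                continue
            found[path.stem] = path  # a later directory wins
    return found
```

Iterating from lowest to highest priority keeps the override rule to a single dictionary assignment.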

Saving a workflow from the dashboard persists YAML to disk at ~/.revka/workflows/{slug}.r{N}.yaml and registers a Kumiho artifact; the base {slug}.yaml is what the directory scanner loads. Definitions are stored as Kumiho items of kind workflow, with the YAML kept as a workflow.yaml artifact file rather than inline metadata to avoid size limits.

A workflow can auto-launch on a schedule or in response to an upstream entity event:

triggers:
  - cron: "0 9 * * 1" # every Monday 9am
    timezone: "America/Los_Angeles" # optional IANA timezone
  - on_kind: "qs-arc-plan" # Kumiho entity kind (required)
    on_tag: "ready" # revision tag (default: "ready")
    input_map:
      arc_kref: "${trigger.entity_kref}"

When a workflow with a cron trigger is saved to Kumiho, the gateway auto-registers a cron job in the Revka cron store; deprecating or deleting the workflow removes it. Entity-event triggers watch for revision.tagged events emitted by output steps, letting one workflow chain into the next. See Variables, expressions & triggers.
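As a rough illustration of entity-event matching, the Python sketch below checks an event against one trigger entry and builds the workflow inputs from input_map. The event shape (`kind`, `tag`, `entity_kref` keys), the function name, and the sample kref string are all assumptions for this example; only `on_kind`, `on_tag` with its "ready" default, `input_map`, and `${trigger.entity_kref}` come from the trigger syntax above.

```python
from typing import Optional


def match_entity_trigger(trigger: dict, event: dict) -> Optional[dict]:
    """Return workflow inputs if the event satisfies the trigger, else None."""
    if "on_kind" not in trigger:
        return None  # for example a cron trigger, which no event can match
    if event.get("kind") != trigger["on_kind"]:
        return None
    # on_tag defaults to "ready" when the trigger does not set it.
    if event.get("tag") != trigger.get("on_tag", "ready"):
        return None
    inputs = {}
    for name, template in trigger.get("input_map", {}).items():
        # Only ${trigger.entity_kref} is handled in this sketch.
        inputs[name] = template.replace(
            "${trigger.entity_kref}", event["entity_kref"]
        )
    return inputs


trigger = {
    "on_kind": "qs-arc-plan",
    "input_map": {"arc_kref": "${trigger.entity_kref}"},
}
# Placeholder event; the kref value is made up for illustration.
event = {"kind": "qs-arc-plan", "tag": "ready", "entity_kref": "example-kref"}
inputs = match_entity_trigger(trigger, event)
```

Here `inputs` would carry the upstream entity reference into the downstream workflow as `arc_kref`, which is how one workflow's output step can start the next.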

Before a definition is stored or run, the operator runs a 6-pass validator (duplicate step IDs, dependency references, cycle detection, per-step config, variable references, trigger validation). The dashboard validates on save and shows errors inline; the API rejects invalid definitions with HTTP 400. Workflow state is checkpointed to ~/.revka/workflow_checkpoints/{run_id}.json after each step, so failed runs can be retried from the first failed step. Run statuses are pending, running, paused, completed, failed, cancelled, and stale. See Runs, approvals & checkpoints.
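The first three validator passes are structural and easy to picture. The Python sketch below is a simplified stand-in, not the operator's validator: `validate_structure` is a hypothetical name, it assumes each step is a dict with an `id` and an optional `depends_on` list, and it covers only duplicate IDs, dependency references, and cycle detection. The return shape mirrors the {valid, errors, warnings} result of validate_workflow.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter


def validate_structure(steps: list[dict]) -> dict:
    """Run three structural checks over a list of step dicts."""
    errors: list[str] = []
    ids = [step["id"] for step in steps]

    # Pass 1: duplicate step IDs.
    for step_id, count in Counter(ids).items():
        if count > 1:
            errors.append(f"duplicate step id: {step_id}")

    # Pass 2: every depends_on entry must name an existing step.
    known = set(ids)
    graph: dict[str, set[str]] = {}
    for step in steps:
        deps = step.get("depends_on", [])
        for dep in deps:
            if dep not in known:
                errors.append(f"step {step['id']} depends on unknown step: {dep}")
        graph[step["id"]] = {dep for dep in deps if dep in known}

    # Pass 3: the dependency graph must be acyclic.
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        errors.append("dependency cycle: " + " -> ".join(exc.args[1]))

    return {"valid": not errors, "errors": errors, "warnings": []}
```

Collecting every error instead of stopping at the first one matches the inline error list the dashboard shows on save.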

A SOP is a directory containing a TOML manifest and an optional Markdown step list. Enable the subsystem in config.toml:

[sop]
enabled = true
sops_dir = "sops" # defaults to <workspace>/sops
default_execution_mode = "supervised"

Each SOP then lives in its own directory under sops_dir:

~/.revka/workspace/sops/
  deploy-prod/
    SOP.toml # metadata + triggers (required)
    SOP.md # procedure steps (optional)

SOP.toml declares identity, execution behavior, and triggers; SOP.md lists numbered steps with suggested tools and per-step flags:

[sop]
name = "deploy-prod"
description = "Deploy service to production"
version = "1.0.0"
priority = "high" # low | normal | high | critical
execution_mode = "supervised" # auto | supervised | step_by_step | priority_based | deterministic
cooldown_secs = 300
max_concurrent = 1

[[triggers]]
type = "webhook"
path = "/sop/deploy"

[[triggers]]
type = "manual"

## Steps
1. **Check readings** — Read sensor data and confirm.
   - tools: gpio_read, kumiho_memory_store
2. **Close valve** — Set GPIO pin 5 LOW.
   - tools: gpio_write
   - requires_confirmation: true

SOPs fan in from five event sources — mqtt, webhook, cron, peripheral, and manual — and the execution mode controls how much autonomy the agent has per step:

| Mode | Behavior |
| --- | --- |
| auto | Execute all steps without approval |
| supervised | Approval before the first step only |
| step_by_step | Approval before every step |
| priority_based | Critical/High → auto; Normal/Low → supervised |
| deterministic | No LLM round-trips; each step’s output is piped to the next; checkpoint steps pause |

Setting deterministic = true forces deterministic mode regardless of execution_mode, and the engine tracks an llm_calls_saved metric for those runs. A per-step requires_confirmation: true overrides the mode and forces approval for that step. Run statuses are pending, running, waiting_approval, paused_checkpoint, completed, failed, and cancelled. See the SOP reference.
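The interaction between execution modes, priority, and per-step flags can be written as a small decision function. The Python sketch below is a reading of the rules above, not the SopEngine's code: both function names are hypothetical, steps are numbered from 1, and treating deterministic mode as needing no approval unless requires_confirmation is set is an assumption (checkpoint pauses are not modeled).

```python
def effective_mode(execution_mode: str, deterministic: bool, priority: str) -> str:
    """Collapse the configured settings into the mode that actually applies."""
    if deterministic:
        return "deterministic"  # deterministic = true wins over execution_mode
    if execution_mode == "priority_based":
        return "auto" if priority in ("critical", "high") else "supervised"
    return execution_mode


def needs_approval(mode: str, step_number: int, requires_confirmation: bool = False) -> bool:
    """Decide whether a step must wait for approval before it runs."""
    if requires_confirmation:
        return True  # the per-step flag overrides the mode
    if mode == "step_by_step":
        return True
    if mode == "supervised":
        return step_number == 1  # approval before the first step only
    return False  # auto, and deterministic by assumption
```

For the deploy-prod example, `effective_mode("supervised", False, "high")` stays `"supervised"`, so step 1 waits for approval, and step 2 also waits because it sets requires_confirmation.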

The revka sop subcommand manages definitions only — there is no revka sop run. Runs start from an event source or the in-agent sop_execute tool.

revka sop list # list all loaded SOPs with triggers and mode
revka sop validate # validate all SOPs
revka sop validate <name> # validate a specific SOP
revka sop show <name> # detailed view of a single SOP

In-agent tools advance a run: sop_execute <name> to start one, sop_status <run_id> to inspect it, sop_approve <run_id> to clear an approval gate, and sop_advance <run_id> <result> to report a step result. SOP run starts are recorded to the configured Memory backend under category sop.

YAML workflows do not run inside the Rust gateway — they run on the Operator MCP server, a Python 3.11+ MCP sidecar spawned per agent chat session by the daemon. The operator validates, executes, and persists workflows, and it also exposes them as MCP tools the agent can call directly:

| Tool | Purpose |
| --- | --- |
| run_workflow(workflow, inputs, cwd, run_id?, max_cost_usd?) | Execute a workflow by name or inline definition |
| validate_workflow(...) | Run the 6-pass validator; returns {valid, errors, warnings} |
| dry_run_workflow(...) | Preview execution order, step count, and cost estimate |
| get_workflow_status(run_id, include_outputs?) | Per-step results for a run |
| resume_workflow(run_id, approved?, cwd?) | Resume after a human_approval step |
| retry_workflow(run_id, cwd?) | Retry from the first failed step |
| cancel_workflow(run_id) | Cancel a running or paused workflow |

The Gateway API is a thin front end for the operator: dispatching POST /api/workflows/run/{name} writes a durable workflow-run-request item in Kumiho and then immediately calls the operator’s run_workflow MCP tool for fast startup, falling back to the Kumiho item if the direct call fails. Setting max_cost_usd on a run aborts it once the cap is exceeded. For the full tool surface and install steps, see Operator MCP.
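The dispatch order described here (durable record first, then the fast direct call, with the record as the fallback) can be sketched in Python. Everything named below is hypothetical: `dispatch_run`, the two injected callables, and the returned dict are stand-ins for illustration, and how a queued request is later picked up is not covered.

```python
from typing import Any, Callable


def dispatch_run(
    name: str,
    inputs: dict,
    write_request: Callable[[str, dict], str],
    call_operator: Callable[[str, dict], Any],
) -> dict:
    """Persist a run request, then try to start the run directly."""
    # Step 1: write the durable request so it survives a failed direct call.
    request_id = write_request(name, inputs)
    try:
        # Step 2: fast path, ask the operator to start the run right away.
        run_id = call_operator(name, inputs)
    except Exception:
        # Step 3: the direct call failed; the durable item is the fallback.
        return {"started": False, "request_id": request_id}
    return {"started": True, "request_id": request_id, "run_id": run_id}
```

Writing the request before attempting the direct call means a crash between the two steps still leaves a record of what was asked for.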

The two orchestrators occupy different layers but share storage and audit:

| Concern | YAML Workflow path | SOP path |
| --- | --- | --- |
| Author | Dashboard / Gateway API / MCP tools | Editor + revka sop CLI |
| Define | YAML in Kumiho + disk | SOP.toml + SOP.md on disk |
| Trigger | Manual run, cron, entity event | MQTT, webhook, cron, peripheral, manual |
| Execute | Operator MCP (Python) over the DAG | SopEngine (Rust), sequential |
| Pause | human_approval / human_input steps | Approval gates / deterministic checkpoints |
| Persist | Checkpoints on disk; runs in Kumiho | Run state + audit in the Memory backend |

The gateway overlays local checkpoint and lock files on top of Kumiho metadata to report live run status. SOP webhook ingress has two paths: POST /sop/{*rest} matches SOPs only (no LLM fallback), while POST /webhook tries a SOP first and falls back to the LLM agent if none matches. For the end-to-end runtime picture, see How Revka works and the Gateway Workflows & Architect API.

When Revka instances coordinate across machines, they talk over the node transport: standard HTTPS on port 443, authenticated with HMAC-SHA256. The choice is deliberate: plain HTTPS passes through corporate proxies and firewalls and satisfies IT audit expectations without exotic protocols. Every outgoing request to a peer’s /api/node-control/{endpoint} carries three headers:

| Header | Value |
| --- | --- |
| X-Revka-Timestamp | Unix epoch seconds |
| X-Revka-Nonce | Random UUID v4 |
| X-Revka-Signature | HMAC-SHA256 hex digest of the request |

The signature is an HMAC-SHA256 computed over the little-endian timestamp bytes, the nonce bytes, and the payload bytes, keyed by a shared secret. The receiver recomputes the digest and compares it in constant time to resist timing attacks, and rejects any request whose timestamp falls outside a 300-second replay window. Because the secret is never transmitted, a tampered payload or a wrong key produces a digest mismatch, and a stale timestamp is rejected by the replay window.