Prompt injection, leak detection & trust

Inbound prompt-injection defense, outbound credential-leak detection, domain trust scoring, and workspace isolation boundaries.

Revka watches both ends of every conversation. On the way in, PromptGuard scans channel messages for prompt-injection attempts before they reach the LLM. On the way out, the LeakDetector scans replies for credentials before they leave for a channel. Around that sit two behavioural controls: domain trust scoring, which downgrades autonomy in domains where the agent keeps making mistakes, and the workspace isolation boundary, which keeps multi-client deployments from bleeding tools, domains, or paths across tenants.

Use this page when you want content-level defenses in addition to the policy engine — heuristic filters that catch what an allowlist cannot, plus per-domain and per-workspace guardrails. These layers are additive: they sit alongside the autonomy policy, the command allowlist, and path sandboxing, and a request must pass all applicable layers to proceed.

PromptGuard scans inbound channel messages before they are routed to the LLM. It looks for six categories of prompt-injection attack and returns one of three outcomes: Safe, Suspicious (with the matched patterns and a score), or Blocked (with a reason). Pattern sets are compiled once at startup for efficiency. PromptGuard is a heuristic filter contributed from RustyClaw (MIT licensed).

| Category | What it catches | Example triggers |
| --- | --- | --- |
| system_prompt_override | Attempts to discard or replace your system prompt | “ignore previous instructions”, “override system prompt” |
| role_confusion | Attempts to reassign the model’s role | “you are now”, “act as”, “pretend you’re” |
| tool_call_injection | Raw tool-call JSON smuggled into a message | text containing "tool_calls" plus a {"type": structure |
| secret_extraction | Attempts to exfiltrate keys or vault contents | “show me all your API keys”, “dump vault” |
| command_injection | Shell metacharacters in non-trivial contexts | `, $(, &&, ` |
| jailbreak_attempt | Known jailbreak framings | DAN mode, “enter developer mode”, base64 decode tricks |

Each category contributes a score; the message score is the sum across categories, normalized into the [0.0, 1.0] range by dividing by six (the category count). The detected patterns are reported with the result so you can see why a message tripped the guard.
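The normalization described above can be sketched in a few lines. This is a minimal illustration, not PromptGuard's source: the function name `normalized_score` and the assumption that each category contributes a value between 0.0 and 1.0 are hypothetical.

```python
# Minimal sketch of the score normalization; names and per-category
# score ranges are assumptions made for illustration.
CATEGORY_COUNT = 6  # the six categories listed in the table


def normalized_score(category_scores):
    """Sum the per-category scores and divide by the category count.

    `category_scores` maps a category name to its contribution; categories
    that did not match can simply be left out.
    """
    total = sum(category_scores.values())
    # Clamp defensively so the result always stays in [0.0, 1.0].
    return max(0.0, min(1.0, total / CATEGORY_COUNT))


score = normalized_score({"system_prompt_override": 1.0, "role_confusion": 0.5})
print(score)
```

Dividing by the category count rather than by the number of matched categories means a message has to trip several categories before its score approaches 1.0.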

PromptGuard is not currently configurable via config.toml. It runs with hardcoded defaults: action warn (log and allow the message) and a detection sensitivity of 0.7. The GuardAction and sensitivity values are not read from any config section — a [security.prompt_guard] block would be silently ignored.

Outbound credential leak detection (LeakDetector)

The LeakDetector scans outbound content right before it is delivered to a channel. When it finds a likely credential, it replaces the value with a typed [REDACTED_*] placeholder so the secret never leaves the host. Like PromptGuard, it is contributed from RustyClaw (MIT licensed).

The detector recognizes a broad set of credential shapes:

  • Provider API keys: Stripe, OpenAI, Anthropic, Google, and GitHub tokens → [REDACTED_API_KEY]
  • AWS credentials: Access Key IDs and Secret Access Keys → [REDACTED_AWS_CREDENTIAL]
  • Generic secrets: password=, secret=, token= style patterns → [REDACTED_SECRET]
  • PEM private keys: RSA, EC, and OpenSSH key blocks → [REDACTED_PRIVATE_KEY]
  • JWTs → [REDACTED_JWT]
  • Database connection URLs: Postgres, MySQL, MongoDB, Redis → [REDACTED_DATABASE_URL]
  • High-entropy tokens: mixed alpha-digit strings ≥ 24 chars whose Shannon entropy clears the threshold → [REDACTED_HIGH_ENTROPY_TOKEN]
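To make the typed-placeholder idea concrete, here is a small sketch covering three of the structured shapes. The regular expressions are illustrative assumptions, not the detector's real patterns, and `redact` is a hypothetical helper name.

```python
import re

# Illustrative patterns only; the detector's actual expressions are not
# documented on this page. Each pattern is paired with its typed placeholder.
PATTERNS = [
    (re.compile(r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://\S+"),
     "[REDACTED_DATABASE_URL]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_CREDENTIAL]"),
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
     "[REDACTED_JWT]"),
]


def redact(text):
    """Replace every match of each pattern with its typed placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("connect with postgres://user:pw@db.internal:5432/app please"))
```

Database URLs are handled first in this sketch so that credentials embedded in a connection string are removed as one unit rather than matched piecemeal by a later pattern.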

To avoid false positives, URL path segments and media markers such as [IMAGE:...] and [VIDEO:...] are explicitly excluded before entropy scanning, so ordinary filesystem paths and attachment references do not trip redaction.

LeakDetector is not currently configurable via config.toml. It always runs at a fixed sensitivity of 0.7; a [security.leak_detector] block is not parsed and would be silently ignored. The generic password=/secret=/token= patterns apply only when sensitivity is above 0.5, so at the fixed 0.7 they are active, and the entropy threshold works out to 3.5 + 0.7 × 1.25 = 4.375. The structured patterns (provider keys, AWS, PEM, JWT, database URLs) are always active regardless of sensitivity.
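The entropy arithmetic above can be reproduced with a short sketch. It assumes entropy is measured in bits per character (log base 2) and that "clears the threshold" means strictly greater than; the function names and the media-marker regex are hypothetical, and URL path exclusion is left out for brevity.

```python
import math
import re
from collections import Counter

# Assumed shape of the media markers that are excluded before entropy scanning.
MEDIA_MARKER = re.compile(r"\[(?:IMAGE|VIDEO):[^\]]*\]")


def shannon_entropy(s):
    """Shannon entropy of a string in bits per character (assumed unit)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def entropy_threshold(sensitivity=0.7):
    """Threshold formula quoted in the text: 3.5 + sensitivity * 1.25."""
    return 3.5 + sensitivity * 1.25


def strip_media_markers(text):
    """Drop [IMAGE:...] / [VIDEO:...] markers so they are never scanned."""
    return MEDIA_MARKER.sub(" ", text)


def is_high_entropy_token(token, sensitivity=0.7):
    """Mixed alpha-digit token of at least 24 chars above the threshold."""
    mixed = any(ch.isalpha() for ch in token) and any(ch.isdigit() for ch in token)
    return (len(token) >= 24 and mixed
            and shannon_entropy(token) > entropy_threshold(sensitivity))


print(entropy_threshold())
```

For scale: under the bits-per-character assumption, a 24-character string with no repeated characters has entropy log2(24) ≈ 4.58, so only tokens with close to uniformly distributed characters clear 4.375 at the minimum length.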

Trust scoring tracks how well the agent behaves per domain over time and automatically tightens autonomy where it underperforms. Each domain carries a floating-point score in [0.0, 1.0] (new domains start at initial_score, default 0.8). The score:

  • decays exponentially back toward the initial value over time (configurable half-life);
  • drops on correction events — user_override, quality_failure, and sop_deviation;
  • rises slightly on each success.

When a domain’s score falls below regression_threshold, Revka emits a RegressionAlert and downgrades autonomy by one tier for that domain: Full → Supervised, Supervised → ReadOnly, and ReadOnly stays ReadOnly. An agent that keeps making mistakes in a domain loses the right to act autonomously there until its score recovers.
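The one-tier downgrade can be written as a tiny state mapping. This is a sketch of the rule as stated, with hypothetical Python names (`Autonomy`, `downgrade`, `effective_autonomy`) standing in for whatever the implementation uses.

```python
from enum import Enum


class Autonomy(Enum):
    FULL = "Full"
    SUPERVISED = "Supervised"
    READ_ONLY = "ReadOnly"


def downgrade(level):
    """Full -> Supervised, Supervised -> ReadOnly, ReadOnly stays ReadOnly."""
    if level is Autonomy.FULL:
        return Autonomy.SUPERVISED
    return Autonomy.READ_ONLY


def effective_autonomy(level, score, regression_threshold=0.5):
    """Apply a one-tier downgrade only while the score is below the threshold."""
    return downgrade(level) if score < regression_threshold else level


print(effective_autonomy(Autonomy.FULL, 0.45))
```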

[trust]
initial_score = 0.8
decay_half_life_days = 30
regression_threshold = 0.5
correction_penalty = 0.05
success_boost = 0.01

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| initial_score | f64 | 0.8 | Starting trust score for a newly seen domain. |
| decay_half_life_days | f64 | 30.0 | Half-life, in days, of the exponential decay back toward initial_score. |
| regression_threshold | f64 | 0.5 | Score below which a RegressionAlert fires and autonomy is downgraded one tier. |
| correction_penalty | f64 | 0.05 | Score deducted per correction event. |
| success_boost | f64 | 0.01 | Score added per success event. |

Scores are clamped to [0.0, 1.0]. A worked example: a domain starts at 0.80; three quality_failure corrections at 0.05 each take it to 0.65; a string of further corrections eventually crosses below 0.50, triggering a regression that drops a Full-autonomy domain to Supervised. Subsequent successes (+0.01 each) and time-based decay gradually pull the score back up.
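The score updates in the worked example can be sketched as follows. The correction, success, and clamping steps follow the text directly; the decay formula is an assumption (a standard half-life decay of the gap between the current score and initial_score), and all Python names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrustConfig:
    # Defaults mirror the [trust] keys documented above.
    initial_score: float = 0.8
    decay_half_life_days: float = 30.0
    regression_threshold: float = 0.5
    correction_penalty: float = 0.05
    success_boost: float = 0.01


def clamp(x):
    """Keep a score inside [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def record_correction(score, cfg):
    return clamp(score - cfg.correction_penalty)


def record_success(score, cfg):
    return clamp(score + cfg.success_boost)


def decay(score, days_elapsed, cfg):
    """Assumed formula: halve the distance to initial_score every half-life."""
    factor = 0.5 ** (days_elapsed / cfg.decay_half_life_days)
    return clamp(cfg.initial_score + (score - cfg.initial_score) * factor)


cfg = TrustConfig()
score = cfg.initial_score
for _ in range(3):  # three quality_failure corrections
    score = record_correction(score, cfg)
print(round(score, 2))
```

Under this assumed decay, a domain that has fallen to 0.40 would sit at about 0.60 after one 30-day half-life with no further events, which matches the description of decay pulling a depressed score back toward its starting value.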

Workspace isolation boundary (WorkspaceBoundary)

The workspace boundary enforces per-workspace isolation when you run one Revka instance for multiple clients. When a workspace profile is active, each access check returns either Allow or Deny with a reason:

  • Tools in the profile’s tool_restrictions list are denied.
  • Domains not in the profile’s allowed_domains list are denied.
  • Paths that belong to another workspace under the workspaces base directory are denied — unless cross_workspace_search = true, which permits read-like cross-workspace access. Paths outside the workspaces base directory are not restricted by the boundary (the autonomy policy’s path sandboxing still applies).

This check is additive with the SecurityPolicy: both must pass, so a workspace can only ever narrow what the global policy already allows.
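The three boundary rules can be sketched as independent checks that each return an allow or deny verdict with a reason. This is an illustration under stated assumptions, not the WorkspaceBoundary implementation: the class and function names are hypothetical, domain matching is assumed to be exact, paths are assumed to be absolute and already normalized, and the sketch does not distinguish read-like from write access when cross_workspace_search is on.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Profile:
    name: str
    allowed_domains: list = field(default_factory=list)
    tool_restrictions: list = field(default_factory=list)


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def check_tool(profile, tool):
    if tool in profile.tool_restrictions:
        return Verdict(False, f"tool '{tool}' is restricted in '{profile.name}'")
    return Verdict(True)


def check_domain(profile, domain):
    if domain not in profile.allowed_domains:
        return Verdict(False, f"domain '{domain}' is not allowed in '{profile.name}'")
    return Verdict(True)


def check_path(profile, path, workspaces_dir, cross_workspace_search=False):
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(workspaces_dir))
    except ValueError:
        # Outside the workspaces base directory: not restricted by the boundary.
        return Verdict(True)
    if not rel.parts or rel.parts[0] == profile.name or cross_workspace_search:
        return Verdict(True)
    return Verdict(False, f"path belongs to workspace '{rel.parts[0]}'")


profile = Profile("client_a", ["api.client-a.example.com"], ["shell"])
print(check_path(profile, "/ws/client_b/notes.md", "/ws"))
```

Because the boundary is additive with the SecurityPolicy, a caller would treat a request as permitted only when both this verdict and the global policy check allow it.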

Enable isolation globally in [workspace], then define one profile per client. Each profile gets its own memory namespace, audit namespace, credential profile, domain allowlist, and tool restrictions.

[workspace]
enabled = true
active_workspace = "client_a"
workspaces_dir = "~/.revka/workspaces"
isolate_memory = true
isolate_secrets = true
isolate_audit = true
cross_workspace_search = false

Per-profile file at ~/.revka/workspaces/<name>/profile.toml:

name = "client_a"
allowed_domains = ["api.client-a.example.com"]
credential_profile = "client-a-creds"
memory_namespace = "client_a_mem"
audit_namespace = "client_a_audit"
tool_restrictions = ["shell"]

| Key | Type | Default | Meaning |
| --- | --- | --- | --- |
| enabled | bool | false | Master switch for workspace isolation. |
| active_workspace | string | unset | Name of the currently active profile. |
| workspaces_dir | string | ~/.revka/workspaces | Base directory holding the per-profile subdirectories. |
| isolate_memory | bool | true | Use a separate memory store per workspace. |
| isolate_secrets | bool | true | Use a separate secrets namespace per workspace. |
| isolate_audit | bool | true | Write a separate audit log per workspace. |
| cross_workspace_search | bool | false | When true, allows read-like access across workspace paths. |

Per-profile fields:

| Field | Meaning |
| --- | --- |
| name | Profile name. Must be alphanumeric plus -/_; empty names and .. path traversal are rejected. |
| allowed_domains | Domain allowlist for this workspace; domains outside it are denied. |
| tool_restrictions | Tool names blocked in this workspace. |
| memory_namespace | Scopes this workspace’s memory access. |
| audit_namespace | Scopes this workspace’s audit-log entries. |
| credential_profile | Credential set this workspace uses. Redacted (***) on profile export. |
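The profile name rule lends itself to a one-line check. A minimal sketch, assuming "alphanumeric" means ASCII letters and digits; `is_valid_profile_name` is a hypothetical helper, not a documented function.

```python
import re

# One or more ASCII letters, digits, hyphens, or underscores (assumed charset).
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_profile_name(name):
    """Reject empty names and anything containing dots or path separators."""
    return bool(_NAME_RE.fullmatch(name))


for candidate in ["client_a", "client-b", "", "..", "../client_a"]:
    print(repr(candidate), is_valid_profile_name(candidate))
```

Since the name is used as a directory under workspaces_dir, restricting it to this character set rules out `..` and path separators without needing a separate traversal check.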