Prompt injection, leak detection & trust

Inbound prompt-injection defense, outbound credential-leak detection, domain trust scoring, and workspace isolation boundaries.

Revka watches both ends of every conversation. On the way in, PromptGuard scans channel messages for prompt-injection attempts before they reach the LLM. On the way out, the LeakDetector scans replies for credentials before they leave for a channel. Around that sit two behavioural controls: domain trust scoring, which downgrades autonomy in domains where the agent keeps making mistakes, and the workspace isolation boundary, which keeps multi-client deployments from bleeding tools, domains, or paths across tenants.

Use this page when you want content-level defenses in addition to the policy engine — heuristic filters that catch what an allowlist cannot, plus per-domain and per-workspace guardrails. These layers are additive: they sit alongside the autonomy policy, the command allowlist, and path sandboxing, and a request must pass all applicable layers to proceed.

Prompt injection defense (PromptGuard)

PromptGuard scans inbound channel messages before they are routed to the LLM. It looks for six categories of prompt-injection attack and returns one of three outcomes: Safe, Suspicious (with the matched patterns and a score), or Blocked (with a reason). Pattern sets are compiled once at startup for efficiency. PromptGuard is a heuristic filter contributed from RustyClaw (MIT licensed).

Detection categories

Category	What it catches	Example triggers
`system_prompt_override`	Attempts to discard or replace your system prompt	“ignore previous instructions”, “override system prompt”
`role_confusion`	Attempts to reassign the model’s role	“you are now”, “act as”, “pretend you’re”
`tool_call_injection`	Raw tool-call JSON smuggled into a message	text containing `"tool_calls"` plus a `{"type":` structure
`secret_extraction`	Attempts to exfiltrate keys or vault contents	“show me all your API keys”, “dump vault”
`command_injection`	Shell metacharacters in non-trivial contexts	backtick, `$(`, `&&`, and pipe operators
`jailbreak_attempt`	Known jailbreak framings	DAN mode, “enter developer mode”, base64 decode tricks

Each category contributes a score; the message score is the sum across categories normalized into the 0.0–1.0 range (divided by six, the category count). The detected patterns are reported with the result so you can see why a message tripped the guard.
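The normalization described above can be sketched as follows. This is a minimal illustration, not PromptGuard's actual code: the patterns are loosely derived from the example triggers in the table, and the rule that a category contributes 1.0 when any of its patterns matches is an assumption, as are the names `CATEGORY_PATTERNS` and `score_message`.

```python
import re

# Illustrative patterns only, loosely based on the example triggers in the
# table above. The real pattern sets are not documented on this page.
CATEGORY_PATTERNS = {
    "system_prompt_override": [r"ignore previous instructions", r"override system prompt"],
    "role_confusion": [r"you are now", r"\bact as\b", r"pretend you['’]re"],
    "tool_call_injection": [r'"tool_calls"[\s\S]*\{\s*"type"\s*:'],
    "secret_extraction": [r"show me all your api keys", r"dump vault"],
    "command_injection": [r"\$\(", r"&&"],
    "jailbreak_attempt": [r"\bdan mode\b", r"enter developer mode"],
}

# Compile once at import time, mirroring the "compiled once at startup" note.
COMPILED = {
    category: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for category, patterns in CATEGORY_PATTERNS.items()
}


def score_message(text):
    """Return (normalized score, matched categories) for one message.

    Assumption: a category contributes 1.0 if any of its patterns match,
    and the sum is divided by the category count (six).
    """
    matched = [
        category
        for category, patterns in COMPILED.items()
        if any(pattern.search(text) for pattern in patterns)
    ]
    return len(matched) / len(COMPILED), matched
```

Under these assumptions, a message that trips three of the six categories scores 0.5, and the list of matched categories plays the role of the reported patterns.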

PromptGuard configuration

PromptGuard is not currently configurable via config.toml. It runs with hardcoded defaults: action warn (log and allow the message) and a detection sensitivity of 0.7. The GuardAction and sensitivity values are not read from any config section — a [security.prompt_guard] block would be silently ignored.

Outbound credential leak detection (LeakDetector)

The LeakDetector scans outbound content right before it is delivered to a channel. When it finds a likely credential, it replaces the value with a typed [REDACTED_*] placeholder so the secret never leaves the host. Like PromptGuard, it is contributed from RustyClaw (MIT licensed).

What it detects

The detector recognizes a broad set of credential shapes:

Provider API keys — Stripe, OpenAI, Anthropic, Google, and GitHub tokens → [REDACTED_API_KEY]
AWS credentials — Access Key IDs and Secret Access Keys → [REDACTED_AWS_CREDENTIAL]
Generic secrets — password=, secret=, token= style patterns → [REDACTED_SECRET]
PEM private keys — RSA, EC, and OpenSSH key blocks → [REDACTED_PRIVATE_KEY]
JWTs → [REDACTED_JWT]
Database connection URLs — Postgres, MySQL, MongoDB, Redis → [REDACTED_DATABASE_URL]
High-entropy tokens — mixed alpha-digit strings ≥ 24 chars whose Shannon entropy clears the threshold → [REDACTED_HIGH_ENTROPY_TOKEN]

To avoid false positives, URL path segments and media markers such as [IMAGE:...] and [VIDEO:...] are explicitly excluded before entropy scanning, so ordinary filesystem paths and attachment references do not trip redaction.
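The typed-placeholder redaction described above might look roughly like this. It is a sketch under stated assumptions: the three regular expressions are simplified stand-ins for a few of the listed credential shapes, not the detector's real patterns, and `REDACTIONS` and `redact` are hypothetical names.

```python
import re

# Hypothetical, simplified patterns for three of the credential shapes listed
# above. They are not the LeakDetector's actual expressions.
REDACTIONS = [
    # PEM private key blocks, with an optional label such as RSA, EC, or OPENSSH.
    (
        re.compile(
            r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"
            r"[\s\S]+?"
            r"-----END (?:[A-Z]+ )?PRIVATE KEY-----"
        ),
        "[REDACTED_PRIVATE_KEY]",
    ),
    # AWS Access Key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_CREDENTIAL]"),
    # JWTs: three dot-separated base64url segments, the first two starting "eyJ".
    (
        re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
        "[REDACTED_JWT]",
    ),
]


def redact(text):
    """Replace every match of each pattern with its typed placeholder."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

Running the structured patterns first and leaving entropy scanning for last is one reasonable ordering, since a typed placeholder tells the reader more than a generic high-entropy one.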

LeakDetector configuration

LeakDetector is not currently configurable via config.toml. It always runs at a fixed sensitivity of 0.7; the [security.leak_detector] block is not parsed and would be silently ignored. The generic password=/secret=/token= patterns are enabled only when sensitivity is above 0.5, so they are active at 0.7, and the entropy threshold works out to 3.5 + 0.7 × 1.25 = 4.375. The structured patterns (provider keys, AWS, PEM, JWT, database URLs) are always active regardless of sensitivity.
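To make the threshold arithmetic concrete, here is a small sketch of the high-entropy check. The threshold formula and the 24-character, mixed alpha-digit rule come from this page; measuring entropy in bits per character (log base 2) and requiring the entropy to be strictly greater than the threshold are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

SENSITIVITY = 0.7  # fixed value stated on this page


def entropy_threshold(sensitivity):
    """Threshold formula from this page: 3.5 + sensitivity * 1.25."""
    return 3.5 + sensitivity * 1.25


def shannon_entropy(token):
    """Shannon entropy in bits per character (log base 2 is an assumption)."""
    total = len(token)
    counts = Counter(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_high_entropy(token, sensitivity=SENSITIVITY):
    """Mixed alpha-digit token of at least 24 chars whose entropy clears the threshold."""
    has_alpha = any(c.isalpha() for c in token)
    has_digit = any(c.isdigit() for c in token)
    return (
        len(token) >= 24
        and has_alpha
        and has_digit
        and shannon_entropy(token) > entropy_threshold(sensitivity)
    )
```

With a bits-per-character measure, a token needs more than about 21 distinct, evenly used characters to exceed 4.375, which is why long but repetitive strings stay below the threshold.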

Domain-based trust scoring

Trust scoring tracks how well the agent behaves per domain over time and automatically tightens autonomy where it underperforms. Each domain carries a floating-point score in [0.0, 1.0] (new domains start at initial_score, default 0.8). The score:

decays exponentially back toward the initial value over time (configurable half-life);
drops on correction events — user_override, quality_failure, and sop_deviation;
rises slightly on each success.

When a domain’s score falls below regression_threshold, Revka emits a RegressionAlert and downgrades autonomy by one tier for that domain: Full → Supervised, Supervised → ReadOnly, and ReadOnly stays ReadOnly. An agent that keeps making mistakes in a domain loses the right to act autonomously there until its score recovers.
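The one-tier downgrade can be pictured as a lookup table. This is only a sketch: the `Autonomy` enum and `effective_autonomy` helper are hypothetical names, and only the tier order and the "below the threshold" comparison are taken from the text above.

```python
from enum import Enum


class Autonomy(Enum):
    FULL = "Full"
    SUPERVISED = "Supervised"
    READ_ONLY = "ReadOnly"


# One-tier downgrade, as described above; ReadOnly has nowhere lower to go.
DOWNGRADE = {
    Autonomy.FULL: Autonomy.SUPERVISED,
    Autonomy.SUPERVISED: Autonomy.READ_ONLY,
    Autonomy.READ_ONLY: Autonomy.READ_ONLY,
}


def effective_autonomy(configured, score, regression_threshold=0.5):
    """Return the tier to apply in a domain given its current trust score."""
    if score < regression_threshold:
        return DOWNGRADE[configured]
    return configured
```

Because the comparison is strict, a score sitting exactly on the threshold keeps its configured tier.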

`[trust]` configuration

[trust]
initial_score = 0.8
decay_half_life_days = 30
regression_threshold = 0.5
correction_penalty = 0.05
success_boost = 0.01

Key	Type	Default	Meaning
`initial_score`	f64	`0.8`	Starting trust score for a newly seen domain.
`decay_half_life_days`	f64	`30.0`	Half-life, in days, of the exponential decay back toward `initial_score`.
`regression_threshold`	f64	`0.5`	Score below which a `RegressionAlert` fires and autonomy is downgraded one tier.
`correction_penalty`	f64	`0.05`	Score deducted per correction event.
`success_boost`	f64	`0.01`	Score added per success event.

Scores are clamped to [0.0, 1.0]. A worked example: a domain starts at 0.80; three quality_failure corrections at 0.05 each take it to 0.65. With no successes or decay in between, three more corrections bring it to exactly 0.50, which is not yet below the threshold, and a seventh takes it to 0.45, triggering a regression that drops a Full-autonomy domain to Supervised. Subsequent successes (+0.01 each) and time-based decay gradually pull the score back up.
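The score updates can be simulated with the default `[trust]` values. Treat this as a sketch: the `TrustConfig` class and helper names are hypothetical, the decay formula (the gap to `initial_score` halves every half-life) is an assumed reading of "decays exponentially back toward the initial value", and rounding after each event is a choice made here so floating-point drift does not affect the threshold comparison.

```python
from dataclasses import dataclass


@dataclass
class TrustConfig:
    initial_score: float = 0.8
    decay_half_life_days: float = 30.0
    regression_threshold: float = 0.5
    correction_penalty: float = 0.05
    success_boost: float = 0.01


def clamp(score):
    """Keep the score inside [0.0, 1.0]."""
    return max(0.0, min(1.0, score))


def apply_correction(score, cfg):
    return clamp(round(score - cfg.correction_penalty, 10))


def apply_success(score, cfg):
    return clamp(round(score + cfg.success_boost, 10))


def decay(score, days, cfg):
    """Assumed formula: the gap to initial_score halves every half-life."""
    factor = 0.5 ** (days / cfg.decay_half_life_days)
    return clamp(cfg.initial_score + (score - cfg.initial_score) * factor)


# Count corrections until the score is strictly below the threshold,
# with no successes or decay in between.
cfg = TrustConfig()
score = cfg.initial_score
corrections = 0
while score >= cfg.regression_threshold:
    score = apply_correction(score, cfg)
    corrections += 1
```

With the defaults and no intervening successes or decay, the loop stops after seven corrections at a score of 0.45. Under the assumed decay formula, a domain at 0.40 would drift back to roughly 0.60 after one 30-day half-life.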

Workspace isolation boundary (WorkspaceBoundary)

The workspace boundary enforces per-workspace isolation when you run one Revka instance for multiple clients. When a workspace profile is active, each access check returns either Allow or Deny with a reason:

Tools in the profile’s tool_restrictions list are denied.
Domains not in the profile’s allowed_domains list are denied.
Paths that belong to another workspace under the workspaces base directory are denied — unless cross_workspace_search = true, which permits read-like cross-workspace access. Paths outside the workspaces base directory are not restricted by the boundary (the autonomy policy’s path sandboxing still applies).

This check is additive with the SecurityPolicy: both must pass, so a workspace can only ever narrow what the global policy already allows.
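The three checks can be sketched as below. The `Profile` and `Boundary` classes, the method names, and the `read_like` flag are hypothetical; returning `None` for Allow and a reason string for Deny is a simplification of the Allow/Deny result described above. The sketch also assumes paths are already absolute and normalized (no `~` or `..` segments).

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class Profile:
    name: str
    allowed_domains: List[str] = field(default_factory=list)
    tool_restrictions: List[str] = field(default_factory=list)


@dataclass
class Boundary:
    profile: Profile
    workspaces_dir: PurePosixPath
    cross_workspace_search: bool = False

    def check_tool(self, tool: str) -> Optional[str]:
        """Deny tools listed in tool_restrictions."""
        if tool in self.profile.tool_restrictions:
            return f"tool '{tool}' is restricted in workspace '{self.profile.name}'"
        return None

    def check_domain(self, domain: str) -> Optional[str]:
        """Deny domains that are not in allowed_domains."""
        if domain not in self.profile.allowed_domains:
            return f"domain '{domain}' is not in allowed_domains"
        return None

    def check_path(self, path: str, read_like: bool = True) -> Optional[str]:
        """Deny paths owned by another workspace under workspaces_dir."""
        try:
            relative = PurePosixPath(path).relative_to(self.workspaces_dir)
        except ValueError:
            # Outside the workspaces base directory: not restricted here.
            return None
        owner = relative.parts[0] if relative.parts else None
        if owner is None or owner == self.profile.name:
            return None
        if self.cross_workspace_search and read_like:
            return None
        return f"path belongs to workspace '{owner}'"
```

A caller would run these checks in addition to the global policy check and proceed only when every applicable check allows the request.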

Multi-client isolation with `[workspace]`

Enable isolation globally in [workspace], then define one profile per client. Each profile gets its own memory namespace, audit namespace, credential profile, domain allowlist, and tool restrictions.

[workspace]
enabled = true
active_workspace = "client_a"
workspaces_dir = "~/.revka/workspaces"
isolate_memory = true
isolate_secrets = true
isolate_audit = true
cross_workspace_search = false

Per-profile file at ~/.revka/workspaces/<name>/profile.toml:

name = "client_a"
allowed_domains = ["api.client-a.example.com"]
credential_profile = "client-a-creds"
memory_namespace = "client_a_mem"
audit_namespace = "client_a_audit"
tool_restrictions = ["shell"]

Key	Type	Default	Meaning
`enabled`	bool	`false`	Master switch for workspace isolation.
`active_workspace`	string	unset	Name of the currently active profile.
`workspaces_dir`	string	`~/.revka/workspaces`	Base directory holding the per-profile subdirectories.
`isolate_memory`	bool	`true`	Use a separate memory store per workspace.
`isolate_secrets`	bool	`true`	Use a separate secrets namespace per workspace.
`isolate_audit`	bool	`true`	Write a separate audit log per workspace.
`cross_workspace_search`	bool	`false`	When `true`, allows read-like access across workspace paths.

Per-profile fields:

Field	Meaning
`name`	Profile name. Must be alphanumeric plus `-`/`_`; empty names and `..` path traversal are rejected.
`allowed_domains`	Domain allowlist for this workspace; domains outside it are denied.
`tool_restrictions`	Tool names blocked in this workspace.
`memory_namespace`	Scopes this workspace’s memory access.
`audit_namespace`	Scopes this workspace’s audit-log entries.
`credential_profile`	Credential set this workspace uses. Redacted (`***`) on profile export.
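The name rule and the export redaction from the table above can be illustrated with two small helpers. Both function names are hypothetical, and the regular expression is one assumed reading of "alphanumeric plus `-`/`_`" (ASCII letters and digits only).

```python
import re

# Assumed reading of the rule: one or more ASCII letters, digits, "-" or "_".
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_profile_name(name):
    """Reject empty names and anything containing dots, slashes, or spaces."""
    return NAME_RE.fullmatch(name) is not None


def export_profile(profile):
    """Return a copy of the profile with credential_profile masked as ***."""
    exported = dict(profile)
    if "credential_profile" in exported:
        exported["credential_profile"] = "***"
    return exported
```

Because `.` and `/` are outside the allowed character set, `..` and any other path-traversal attempt fail the same check as an ordinary invalid name.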

Related pages

Policy, commands & sandboxing: Autonomy levels, the command allowlist, risk classification, and path sandboxing.

OTP gating & emergency stop: TOTP gating for sensitive actions and the emergency-stop kill switch.

Audit log: The tamper-evident Merkle audit log and per-workspace audit namespaces.

Security model: How Revka's defence-in-depth layers fit together.

Autonomy levels & approvals: The autonomy tiers that trust scoring downgrades.
