Response cache, hardware RAG & isolation

The two-tier response cache, the local hardware datasheet RAG index, namespace isolation, and GDPR export.

Beyond the Kumiho graph, Revka keeps a few local, per-workspace memory facilities that have nothing to do with the cloud control plane: an opt-in LLM response cache that avoids re-billing for identical prompts, a hardware-datasheet RAG index for board pin lookups, and the policy and isolation machinery (namespaces and workspaces) that keeps one agent’s or one client’s memory from bleeding into another’s. This page covers each of them, plus the Memory::export path used for GDPR data portability.

Read Memory overview first for the [memory] section as a whole and why durable memory lives in Kumiho. Everything here lives under [memory] (and [memory.policy]) in ~/.revka/config.toml, except where noted.

The response cache is a two-tier cache that deduplicates identical LLM prompt-plus-model combinations so you aren’t billed twice for the same query. It is off by default — opt in.

  • Hot tier — an in-memory LRU of the most recently used entries.
  • Warm tier — a WAL-mode SQLite database at <workspace_dir>/memory/response_cache.db.

Each entry is keyed by the SHA-256 of (model || system_prompt || user_prompt). A lookup checks the hot tier first; on a hot-tier miss, Revka looks in SQLite and promotes any entry it finds into the in-memory tier. On a hit, it increments the entry’s hit_count and updates accessed_at.
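The lookup path can be illustrated with a minimal, std-only sketch. Everything named here (`TwoTierCache`, `WarmEntry`, `cache_key`) is hypothetical and not Revka's actual code: a `HashMap` stands in for the SQLite warm tier, a logical counter stands in for timestamps, and a length-prefixed string stands in for the SHA-256 digest.

```rust
use std::collections::{HashMap, VecDeque};

/// Warm-tier row with the bookkeeping fields the text mentions.
struct WarmEntry {
    response: String,
    hit_count: u64,
    accessed_at: u64, // logical clock tick, standing in for a timestamp
}

/// Hypothetical two-tier cache: `hot` is a small LRU, `warm` stands in for SQLite.
struct TwoTierCache {
    hot: HashMap<String, String>,
    hot_order: VecDeque<String>, // front = least recently used
    hot_capacity: usize,
    warm: HashMap<String, WarmEntry>,
    clock: u64,
}

/// Stand-in for SHA-256(model || system_prompt || user_prompt).
/// Length prefixes keep ("ab", "c") and ("a", "bc") from producing the same key.
fn cache_key(model: &str, system: &str, user: &str) -> String {
    format!("{}:{}|{}:{}|{}:{}", model.len(), model, system.len(), system, user.len(), user)
}

impl TwoTierCache {
    fn new(hot_capacity: usize) -> Self {
        Self {
            hot: HashMap::new(),
            hot_order: VecDeque::new(),
            hot_capacity,
            warm: HashMap::new(),
            clock: 0,
        }
    }

    /// Stores a fresh response in both tiers.
    fn put(&mut self, key: &str, response: &str) {
        self.clock += 1;
        let entry = WarmEntry { response: response.to_string(), hit_count: 0, accessed_at: self.clock };
        self.warm.insert(key.to_string(), entry);
        self.promote(key, response);
    }

    /// Hot tier first, then warm tier; a warm hit is promoted into the hot tier.
    fn get(&mut self, key: &str) -> Option<String> {
        self.clock += 1;
        let response = match self.hot.get(key) {
            Some(r) => r.clone(),
            None => self.warm.get(key)?.response.clone(), // full miss returns None
        };
        // Record the hit on the warm-tier row.
        if let Some(entry) = self.warm.get_mut(key) {
            entry.hit_count += 1;
            entry.accessed_at = self.clock;
        }
        self.promote(key, &response);
        Some(response)
    }

    /// Marks `key` as most recently used and evicts the oldest hot entries over capacity.
    fn promote(&mut self, key: &str, response: &str) {
        self.hot_order.retain(|k| k.as_str() != key);
        self.hot_order.push_back(key.to_string());
        self.hot.insert(key.to_string(), response.to_string());
        while self.hot.len() > self.hot_capacity {
            match self.hot_order.pop_front() {
                Some(oldest) => { self.hot.remove(&oldest); }
                None => break,
            }
        }
    }
}
```

With a hot capacity of 1, storing two responses leaves only the second in memory; reading the first again finds it in the warm tier, bumps its hit_count, and swaps it back into the hot tier. TTL expiry and warm-tier eviction are left out of the sketch.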

[memory]
response_cache_enabled = true # default: false (opt-in)
response_cache_ttl_minutes = 60 # entry expiry
response_cache_max_entries = 5000 # SQLite warm-tier cap before LRU eviction
response_cache_hot_entries = 256 # in-memory hot-tier size

| Key | Type | Default | Meaning |
|-----|------|---------|---------|
| response_cache_enabled | bool | false | Master switch. Must be true for any caching. |
| response_cache_ttl_minutes | int | 60 | Time-to-live per entry, in minutes. |
| response_cache_max_entries | int | 5000 | Warm-tier (SQLite) cap. Beyond this, LRU eviction applies. |
| response_cache_hot_entries | int | 256 | In-memory hot-tier size. |

The cache tracks statistics internally — total entries, total hits, and tokens saved — though those figures are not yet surfaced through the CLI.

Because the database path is under <workspace_dir>, the cache is per-workspace: with workspace isolation enabled, each client gets its own response_cache.db and cannot read another’s cached responses.

HardwareRag is a local retrieval index over your hardware datasheets. When the agent answers a hardware question (a pin lookup, a board capability), Revka retrieves matching datasheet content and pin aliases and injects them into the agent’s context. It is entirely local — no datasheet text leaves the machine.

It loads .md and .txt files (and .pdf with the rag-pdf feature) from a configured directory, scores them by keyword overlap, and boosts chunks tagged for the board you’re asking about.

Point a peripheral at a datasheet directory relative to your workspace:

[[peripherals]]
datasheet_dir = "datasheets" # relative to the workspace dir

Lay out one file per board. The filename (minus extension) becomes the board tag:

workspace/
  datasheets/
    nucleo-f401re.md    # board tag = "nucleo-f401re"
    rpi-gpio.txt        # board tag = "rpi-gpio"
    generic.md          # no board tag; matches all queries

A query scoped to one or more boards (boards: &[String]) gives matching chunks a +2.0 score boost. Files named generic* or placed in a _generic/ subdirectory carry no board tag and match every query, so use them for cross-board context.
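The filename-to-tag rule can be sketched in a few lines. The function name `board_tag` and the choice to take a path relative to the datasheet directory are assumptions made for illustration; only the rule itself (stem becomes the tag, `generic*` files and `_generic/` contents are untagged) comes from the text above.

```rust
use std::path::Path;

/// Returns the board tag for a datasheet path given relative to the
/// datasheet directory, or None for generic (untagged) files.
fn board_tag(relative: &Path) -> Option<String> {
    // Anything under a `_generic/` directory carries no board tag.
    let in_generic_dir = relative
        .parent()
        .map(|dir| dir.components().any(|c| c.as_os_str() == "_generic"))
        .unwrap_or(false);
    if in_generic_dir {
        return None;
    }
    // Otherwise the filename minus its extension is the tag,
    // unless the name starts with "generic".
    let stem = relative.file_stem()?.to_str()?;
    if stem.starts_with("generic") {
        None
    } else {
        Some(stem.to_string())
    }
}
```

For the layout shown earlier, `nucleo-f401re.md` maps to the tag `nucleo-f401re`, while `generic.md` and anything under `_generic/` map to no tag.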

If datasheet_dir is absent or the directory doesn’t exist, RAG returns empty results silently — it never errors.
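Retrieval as described in this section might look roughly like the sketch below. The `Chunk` type, the tokenizer, and the `score` and `retrieve` functions are hypothetical. Two details are assumptions rather than documented behavior: each distinct shared keyword counts as one point, and a chunk with no keyword overlap scores zero even when its board matches.

```rust
use std::collections::HashSet;

/// A datasheet chunk; `board` is None for generic files.
struct Chunk {
    board: Option<String>,
    text: String,
}

const BOARD_BOOST: f32 = 2.0;

/// Lowercased word set; underscores are kept so aliases like `red_led` stay whole.
fn tokens(s: &str) -> HashSet<String> {
    s.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Keyword-overlap score plus BOARD_BOOST when the chunk's tag is among `boards`.
fn score(chunk: &Chunk, query: &str, boards: &[String]) -> f32 {
    let overlap = tokens(query).intersection(&tokens(&chunk.text)).count() as f32;
    if overlap == 0.0 {
        return 0.0; // assumption: the boost never rescues an irrelevant chunk
    }
    let on_requested_board = match &chunk.board {
        Some(tag) => boards.iter().any(|b| b == tag),
        None => false,
    };
    if on_requested_board { overlap + BOARD_BOOST } else { overlap }
}

/// Top `limit` chunks by descending score. An empty index yields an empty
/// result rather than an error, mirroring the missing-directory case.
fn retrieve<'a>(chunks: &'a [Chunk], query: &str, boards: &[String], limit: usize) -> Vec<&'a Chunk> {
    let mut scored: Vec<(f32, &'a Chunk)> = chunks
        .iter()
        .map(|c| (score(c, query, boards), c))
        .filter(|(s, _)| *s > 0.0)
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(limit).map(|(_, c)| c).collect()
}
```

In this sketch generic chunks compete on keyword overlap alone, so a chunk from the requested board outranks an equally relevant generic one by exactly the boost.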

A datasheet can declare named pin aliases so the agent can resolve red_led to a pin number without guessing. Pin-alias context is built separately from chunk retrieval, so aliases are always available for matching boards. Two formats are accepted.

As key-value lines under a Pin Aliases heading:

## Pin Aliases
red_led: 13
builtin_led: 13

Or as a Markdown table:

## Pin Aliases
| alias | pin |
|----------|-----|
| red_led | 13 |
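A parser that accepts both formats could be sketched as follows. The function name `parse_pin_aliases` is hypothetical, and the sketch assumes pins are plain non-negative integers, as in the examples; named pins would need a different value type.

```rust
use std::collections::BTreeMap;

/// Parses the `Pin Aliases` section of a datasheet in either accepted format.
fn parse_pin_aliases(markdown: &str) -> BTreeMap<String, u32> {
    let mut aliases = BTreeMap::new();
    let mut in_section = false;
    for raw in markdown.lines() {
        let line = raw.trim();
        if line.starts_with('#') {
            // Every heading ends the previous section; only "Pin Aliases" opens ours.
            let title = line.trim_start_matches('#').trim();
            in_section = title.eq_ignore_ascii_case("pin aliases");
            continue;
        }
        if !in_section || line.is_empty() {
            continue;
        }
        let (name, pin) = if line.starts_with('|') {
            // Table row: | alias | pin |
            let cells: Vec<&str> = line.trim_matches('|').split('|').map(str::trim).collect();
            if cells.len() < 2 {
                continue;
            }
            (cells[0], cells[1])
        } else if let Some((key, value)) = line.split_once(':') {
            // Key-value line: alias: pin
            (key.trim(), value.trim())
        } else {
            continue;
        };
        // Table header and separator rows fail to parse as a number and are skipped.
        if let Ok(pin) = pin.parse::<u32>() {
            aliases.insert(name.to_string(), pin);
        }
    }
    aliases
}
```

Both example snippets above yield `red_led` mapped to 13; lines outside the Pin Aliases section are ignored.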

PDF ingestion is gated behind the rag-pdf Cargo feature, which pulls in the pdf_extract crate:

cargo build --features hardware,rag-pdf

Without rag-pdf, .pdf datasheets are skipped by the index, and the datasheet tool’s read action returns the file path for manual reference instead of extracted text. The companion config flag [hardware].workspace_datasheets = true indexes workspace PDFs for these RAG-based pin lookups.

If you run a small local model, set compact_context in [agent]. Among other context-shrinking effects, it caps the RAG chunk limit at 2 so datasheet retrieval doesn’t crowd out a tight context window:

[agent]
compact_context = true # recommended for models ≤13B

See Custom providers & local LLMs for the broader small-model tuning picture, and Aardvark I2C/SPI/GPIO & datasheets for the datasheet download workflow that feeds this index.

Every memory entry carries a namespace field that isolates entries between agents and contexts. Entries that don’t specify one fall into default_namespace. The memory policy can then cap how many entries a namespace or category may hold, mark namespaces read-only, and override retention per category.

[memory]
default_namespace = "default"
[memory.policy]
max_entries_per_namespace = 1000
max_entries_per_category = 0 # 0 = unlimited
read_only_namespaces = ["system_facts"]
retention_days_by_category = { core = 365, daily = 30, conversation = 7 }

| Key | Type | Default | Meaning |
|-----|------|---------|---------|
| default_namespace | string | "default" | Namespace assigned to entries that don’t specify one. |
| max_entries_per_namespace | int | 0 | Cap per namespace. 0 = unlimited. |
| max_entries_per_category | int | 0 | Cap per category. 0 = unlimited. |
| read_only_namespaces | array | [] | Writes to these namespaces are rejected. |
| retention_days_by_category | table | unset | Per-category retention override (keyed by category name). |
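To make the policy semantics concrete, here is a minimal sketch of a pre-write check. The `MemoryPolicy` struct mirrors the config keys, but `check_write`, `WriteDenied`, and `retention_days` are hypothetical names. The sketch also assumes a write is rejected once a cap is reached; the source only says the policy caps entry counts, so the real behavior at the limit may differ.

```rust
use std::collections::HashMap;

/// Mirrors the [memory.policy] keys; 0 means unlimited for both caps.
struct MemoryPolicy {
    max_entries_per_namespace: usize,
    max_entries_per_category: usize,
    read_only_namespaces: Vec<String>,
    retention_days_by_category: HashMap<String, u32>,
}

#[derive(Debug, PartialEq)]
enum WriteDenied {
    ReadOnlyNamespace,
    NamespaceFull,
    CategoryFull,
}

impl MemoryPolicy {
    /// Decides whether one more entry may be written, given current counts.
    fn check_write(&self, namespace: &str, in_namespace: usize, in_category: usize) -> Result<(), WriteDenied> {
        if self.read_only_namespaces.iter().any(|n| n.as_str() == namespace) {
            return Err(WriteDenied::ReadOnlyNamespace);
        }
        if self.max_entries_per_namespace != 0 && in_namespace >= self.max_entries_per_namespace {
            return Err(WriteDenied::NamespaceFull);
        }
        if self.max_entries_per_category != 0 && in_category >= self.max_entries_per_category {
            return Err(WriteDenied::CategoryFull);
        }
        Ok(())
    }

    /// Per-category retention override, or None when the category has no override.
    fn retention_days(&self, category: &str) -> Option<u32> {
        self.retention_days_by_category.get(category).copied()
    }
}
```

With the example config above, a write to `system_facts` is always refused, the `default` namespace accepts entries until it holds 1000, and the category cap of 0 never triggers.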

When you run one Revka instance for multiple clients, workspace isolation gives each client engagement a separate memory database under <workspaces_dir>/<client>/memory/. A request scoped to one workspace cannot reach another’s entries — this is the hard boundary that namespaces alone don’t provide.

[workspace]
enabled = true
isolate_memory = true # default: true when workspaces are enabled
cross_workspace_search = false # security default — no cross-tenant reads

| Key | Type | Default | Meaning |
|-----|------|---------|---------|
| isolate_memory | bool | true | Separate memory database per workspace. |
| cross_workspace_search | bool | false | Allow reads across workspaces. Leave false to prevent memory bleed between clients. |

Because isolation works at the database-file level, it also separates the response cache: each workspace gets its own memory/response_cache.db. See Config: gateway, memory, security & platform for the full [workspace] schema and per-profile settings.
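The per-workspace file layout amounts to a path join, sketched below. The function `response_cache_path` is hypothetical, and the client-name validation is an assumption added for the sketch (it is a sensible guard against a name like `../other` escaping the workspace, but the source does not describe how Revka validates names).

```rust
use std::path::{Path, PathBuf};

/// Builds <workspaces_dir>/<client>/memory/response_cache.db,
/// refusing client names that could point outside their own directory.
fn response_cache_path(workspaces_dir: &Path, client: &str) -> Option<PathBuf> {
    let has_separator = client.contains(|c: char| c == '/' || c == '\\');
    if client.is_empty() || client == "." || client == ".." || has_separator {
        return None;
    }
    Some(workspaces_dir.join(client).join("memory").join("response_cache.db"))
}
```

Because each client resolves to a distinct directory, two workspaces can never share a cache file.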

The Memory trait’s export method supports a bulk, filtered export of memory entries for GDPR Article 20 data portability. It returns entries in ascending order of creation time, with embeddings excluded, and can filter by namespace, session, category, and time range. Every filter field is optional; a field left as None places no restriction on the export.

The filter shape (ExportFilter):

| Field | Type | Meaning |
|-------|------|---------|
| namespace | Option<String> | Restrict to one namespace. |
| session_id | Option<String> | Restrict to one session. |
| category | Option<MemoryCategory> | Restrict to core, daily, conversation, or a custom label. |
| since | Option<String> | RFC 3339 lower bound (inclusive on timestamp). |
| until | Option<String> | RFC 3339 upper bound (inclusive on timestamp). |

let filter = ExportFilter {
    namespace: Some("default".to_string()),
    session_id: None,
    category: Some(MemoryCategory::Core),
    since: Some("2026-01-01T00:00:00Z".to_string()),
    until: Some("2026-12-31T23:59:59Z".to_string()),
};
let entries = memory.export(&filter).await?;

The default trait implementation delegates to list() plus client-side filtering; backends with native query support override it for efficiency.
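The list-then-filter fallback can be sketched synchronously with local stand-in types. The types below reuse the names from this page but are simplified, hypothetical definitions, not Revka's own; in particular the `MemoryEntry` fields are assumed. The sketch also assumes timestamps are normalized to UTC with a trailing `Z` and fixed precision, so that plain string comparison matches chronological order; mixed offsets would need real date parsing.

```rust
#[derive(Clone, Debug, PartialEq)]
enum MemoryCategory {
    Core,
    Daily,
    Conversation,
    Custom(String),
}

/// Simplified entry; embeddings are simply not part of the exported shape.
#[derive(Clone, Debug)]
struct MemoryEntry {
    namespace: String,
    session_id: Option<String>,
    category: MemoryCategory,
    timestamp: String, // RFC 3339, assumed normalized to UTC
    content: String,
}

#[derive(Default)]
struct ExportFilter {
    namespace: Option<String>,
    session_id: Option<String>,
    category: Option<MemoryCategory>,
    since: Option<String>,
    until: Option<String>,
}

/// Client-side filtering over an already-listed set of entries.
/// A None field places no restriction; both time bounds are inclusive.
fn export(all: &[MemoryEntry], f: &ExportFilter) -> Vec<MemoryEntry> {
    let mut out: Vec<MemoryEntry> = all
        .iter()
        .filter(|e| {
            f.namespace.as_ref().map_or(true, |n| &e.namespace == n)
                && f.session_id.as_ref().map_or(true, |s| e.session_id.as_ref() == Some(s))
                && f.category.as_ref().map_or(true, |c| &e.category == c)
                && f.since.as_ref().map_or(true, |t| e.timestamp.as_str() >= t.as_str())
                && f.until.as_ref().map_or(true, |t| e.timestamp.as_str() <= t.as_str())
        })
        .cloned()
        .collect();
    // Ascending by creation time.
    out.sort_by(|a, b| a.timestamp.cmp(&b.timestamp));
    out
}
```

This touches every entry on each call, which is the cost a backend with native query support avoids by pushing the filter into its own query.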