Response cache, hardware RAG & isolation

The two-tier response cache, the local hardware datasheet RAG index, namespace isolation, and GDPR export.

Beyond the Kumiho graph, Revka keeps a few local, per-workspace memory facilities that have nothing to do with the cloud control plane: an opt-in LLM response cache that avoids re-billing for identical prompts, a hardware-datasheet RAG index for board pin lookups, and the policy/isolation machinery that keeps one agent’s (or one client’s) memory from bleeding into another’s. This page covers all three, plus the Memory::export path used for GDPR data portability.

Read Memory overview first: it covers the [memory] section as a whole and explains why durable memory lives in Kumiho. Everything here lives under [memory] (and [memory.policy]) in ~/.revka/config.toml, except where noted.

LLM response cache (`ResponseCache`)

The response cache is a two-tier cache that deduplicates identical LLM prompt-plus-model combinations so you aren’t billed twice for the same query. It is off by default — opt in.

Hot tier — an in-memory LRU of the most recently used entries.
Warm tier — a WAL-mode SQLite database at <workspace_dir>/memory/response_cache.db.

Each entry is keyed by the SHA-256 of (model || system_prompt || user_prompt). On a hot-tier miss, Revka looks in SQLite and, if the entry is there, promotes it into the in-memory tier; on a hit, it increments the entry’s hit_count and updates accessed_at.
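
The lookup path described above can be sketched roughly as follows. This is a minimal illustration, not Revka's implementation: `TwoTierCache`, `put`, `get`, `is_hot`, and `hit_count` are hypothetical names, a `HashMap` stands in for the SQLite warm tier, and the key is passed in as an already-computed string because SHA-256 is not in the Rust standard library. It also assumes every successful lookup counts as a hit, whichever tier served it.

```rust
use std::collections::{HashMap, VecDeque};

/// One warm-tier row with the bookkeeping fields named in the text.
#[derive(Clone, Debug, PartialEq)]
pub struct CacheEntry {
    pub response: String,
    pub hit_count: u64,
    pub accessed_at: u64, // Unix seconds, supplied by the caller in this sketch
}

/// Hypothetical two-tier cache: a small in-memory LRU in front of a larger store.
pub struct TwoTierCache {
    hot: HashMap<String, String>,      // key -> response
    hot_order: VecDeque<String>,       // front = least recently used
    hot_capacity: usize,
    warm: HashMap<String, CacheEntry>, // stand-in for the SQLite database
}

impl TwoTierCache {
    pub fn new(hot_capacity: usize) -> Self {
        Self {
            hot: HashMap::new(),
            hot_order: VecDeque::new(),
            hot_capacity,
            warm: HashMap::new(),
        }
    }

    /// Store a response in both tiers.
    pub fn put(&mut self, key: &str, response: &str, now: u64) {
        self.warm.insert(
            key.to_string(),
            CacheEntry { response: response.to_string(), hit_count: 0, accessed_at: now },
        );
        self.touch_hot(key, response);
    }

    /// Look in the hot tier first, then the warm tier; promote on a warm hit.
    pub fn get(&mut self, key: &str, now: u64) -> Option<String> {
        let response = match self.hot.get(key) {
            Some(r) => r.clone(),
            None => self.warm.get(key)?.response.clone(),
        };
        // Record the hit on the durable entry.
        if let Some(entry) = self.warm.get_mut(key) {
            entry.hit_count += 1;
            entry.accessed_at = now;
        }
        self.touch_hot(key, &response);
        Some(response)
    }

    pub fn is_hot(&self, key: &str) -> bool {
        self.hot.contains_key(key)
    }

    pub fn hit_count(&self, key: &str) -> Option<u64> {
        self.warm.get(key).map(|e| e.hit_count)
    }

    /// Mark `key` as most recently used and evict the oldest hot entries.
    fn touch_hot(&mut self, key: &str, response: &str) {
        self.hot_order.retain(|k| k != key);
        self.hot_order.push_back(key.to_string());
        self.hot.insert(key.to_string(), response.to_string());
        while self.hot.len() > self.hot_capacity {
            match self.hot_order.pop_front() {
                Some(oldest) => { self.hot.remove(&oldest); }
                None => break,
            }
        }
    }
}
```

TTL expiry and the warm-tier cap are left out to keep the sketch short; in a real cache both would be checked before a stored response is returned.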

Enable and tune it

[memory]
response_cache_enabled = true        # default: false (opt-in)
response_cache_ttl_minutes = 60      # entry expiry
response_cache_max_entries = 5000    # SQLite warm-tier cap before LRU eviction
response_cache_hot_entries = 256     # in-memory hot-tier size

Key	Type	Default	Meaning
`response_cache_enabled`	bool	`false`	Master switch. Must be `true` for any caching.
`response_cache_ttl_minutes`	int	`60`	Time-to-live per entry, in minutes.
`response_cache_max_entries`	int	`5000`	Warm-tier (SQLite) cap. Beyond this, LRU eviction applies.
`response_cache_hot_entries`	int	`256`	In-memory hot-tier size.

The cache tracks statistics internally — total entries, total hits, and tokens saved — though those figures are not yet surfaced through the CLI.

Because the database path is under <workspace_dir>, the cache is per-workspace: with workspace isolation enabled, each client gets its own response_cache.db and cannot read another’s cached responses.

Hardware RAG (`HardwareRag`)

HardwareRag is a local retrieval index over your hardware datasheets. When the agent answers a hardware question (a pin lookup, a board capability), Revka retrieves matching datasheet content and pin aliases and injects them into the agent’s context. It is entirely local — no datasheet text leaves the machine.

It loads .md and .txt files (and .pdf with the rag-pdf feature) from a configured directory, scores them by keyword overlap, and boosts chunks tagged for the board you’re asking about.

Set up the datasheet directory

Point a peripheral at a datasheet directory relative to your workspace:

[[peripherals]]
datasheet_dir = "datasheets"   # relative to the workspace dir

Lay out one file per board. The filename (minus extension) becomes the board tag:

workspace/
  datasheets/
    nucleo-f401re.md    # board tag = "nucleo-f401re"
    rpi-gpio.txt        # board tag = "rpi-gpio"
    generic.md          # no board tag — matches all queries

A query scoped to one or more boards (boards: &[String]) gives matching chunks a +2.0 score boost. Files named generic* or placed in a _generic/ subdirectory carry no board tag and match every query, so use them for cross-board context.
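
To make the scoring rule concrete, here is a minimal sketch of keyword overlap plus the board boost. Only the +2.0 boost and the "no tag matches every query" rule come from the text above; the `Chunk` type, the `score` and `tokenize` functions, and the exact overlap formula (a count of distinct shared words) are assumptions for illustration, and the sketch applies the boost even when the overlap is zero.

```rust
use std::collections::HashSet;

/// A retrievable piece of a datasheet. `board` is `None` for generic files.
pub struct Chunk {
    pub board: Option<String>,
    pub text: String,
}

/// Boost for chunks whose board tag is in the query's board list.
const BOARD_BOOST: f32 = 2.0;

/// Lowercase words, splitting on anything that is not alphanumeric or '_'.
fn tokenize(s: &str) -> HashSet<String> {
    s.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Assumed overlap measure: number of distinct query words found in the chunk.
fn keyword_overlap(query: &str, text: &str) -> f32 {
    let text_words = tokenize(text);
    tokenize(query)
        .iter()
        .filter(|w| text_words.contains(*w))
        .count() as f32
}

/// Keyword overlap, plus the boost when the chunk's tag matches a requested board.
pub fn score(query: &str, boards: &[String], chunk: &Chunk) -> f32 {
    let mut total = keyword_overlap(query, &chunk.text);
    if let Some(tag) = &chunk.board {
        if boards.iter().any(|b| b == tag) {
            total += BOARD_BOOST;
        }
    }
    total
}
```

With this shape, a generic chunk competes on keyword overlap alone, while a chunk tagged for the requested board outranks an equally relevant chunk from elsewhere.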

If datasheet_dir is absent or the directory doesn’t exist, RAG returns empty results silently — it never errors.

Pin-alias tables

A datasheet can declare named pin aliases so the agent can resolve red_led to a pin number without guessing. Pin-alias context is built separately from chunk retrieval, so aliases are always available for matching boards. Two formats are accepted.

As key-value lines under a Pin Aliases heading:

## Pin Aliases
red_led: 13
builtin_led: 13

Or as a Markdown table:

## Pin Aliases
| alias    | pin |
|----------|-----|
| red_led  | 13  |
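
A parser that accepts both formats could look roughly like this. It is a sketch under stated assumptions: `parse_pin_aliases` is a hypothetical function, pins are treated as plain unsigned integers, and the section is assumed to end at the next heading. Revka's actual parser may differ on all three points.

```rust
use std::collections::HashMap;

/// Collect `alias -> pin` pairs from the "Pin Aliases" section of a datasheet.
pub fn parse_pin_aliases(doc: &str) -> HashMap<String, u32> {
    let mut aliases = HashMap::new();
    let mut in_section = false;

    for raw in doc.lines() {
        let line = raw.trim();

        // Any heading either opens the section or closes it.
        if line.starts_with('#') {
            let title = line.trim_start_matches('#').trim();
            in_section = title.eq_ignore_ascii_case("pin aliases");
            continue;
        }
        if !in_section || line.is_empty() {
            continue;
        }

        // Table row: `| alias | pin |`. Key-value line: `alias: pin`.
        let pair = if line.starts_with('|') {
            let cells: Vec<&str> = line.trim_matches('|').split('|').map(str::trim).collect();
            if cells.len() == 2 { Some((cells[0], cells[1])) } else { None }
        } else {
            line.split_once(':').map(|(a, p)| (a.trim(), p.trim()))
        };

        // Header and separator rows fail the integer parse and are skipped.
        if let Some((alias, pin)) = pair {
            if let Ok(pin) = pin.parse::<u32>() {
                aliases.insert(alias.to_string(), pin);
            }
        }
    }
    aliases
}
```

Skipping rows whose pin cell is not a number handles the table header and the `|---|` separator without special cases.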

The `rag-pdf` feature

PDF ingestion is gated behind the rag-pdf Cargo feature, which pulls in the pdf_extract crate:

cargo build --features hardware,rag-pdf

Without rag-pdf, .pdf datasheets are skipped by the index, and the datasheet tool’s read action returns the file path for manual reference instead of extracted text. The companion config flag [hardware].workspace_datasheets = true indexes workspace PDFs for these RAG-based pin lookups.

`compact_context` for small models

If you run a small local model, set compact_context in [agent]. Among other context-shrinking effects, it caps the RAG chunk limit at 2 so datasheet retrieval doesn’t crowd out a tight context window:

[agent]
compact_context = true   # recommended for models ≤13B

See Custom providers & local LLMs for the broader small-model tuning picture, and Aardvark I2C/SPI/GPIO & datasheets for the datasheet download workflow that feeds this index.

Namespace isolation

Every memory entry carries a namespace field that isolates entries between agents and contexts. Entries that don’t specify one fall into default_namespace. The memory policy can then cap how many entries a namespace or category may hold, mark namespaces read-only, and override retention per category.

[memory]
default_namespace = "default"

[memory.policy]
max_entries_per_namespace = 1000
max_entries_per_category = 0                 # 0 = unlimited
read_only_namespaces = ["system_facts"]
retention_days_by_category = { core = 365, daily = 30, conversation = 7 }

Key	Type	Default	Meaning
`default_namespace`	string	`"default"`	Namespace assigned to entries that don’t specify one.
`max_entries_per_namespace`	int	`0`	Cap per namespace. `0` = unlimited.
`max_entries_per_category`	int	`0`	Cap per category. `0` = unlimited.
`read_only_namespaces`	array	`[]`	Writes to these namespaces are rejected.
`retention_days_by_category`	table	unset	Per-category retention override (keyed by category name).
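
As an illustration of how these keys might combine on a write, the sketch below resolves the namespace and then applies the read-only and per-namespace cap checks. `MemoryPolicy`, `PolicyError`, `resolve_namespace`, and `check_write` are hypothetical names, and the sketch assumes a write over the cap is rejected; the source does not say whether Revka rejects the write or evicts older entries instead.

```rust
/// Subset of the `[memory.policy]` keys used by this sketch.
pub struct MemoryPolicy {
    pub max_entries_per_namespace: usize, // 0 = unlimited
    pub read_only_namespaces: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum PolicyError {
    ReadOnlyNamespace(String),
    NamespaceFull { namespace: String, limit: usize },
}

/// Entries that do not name a namespace fall into the default one.
pub fn resolve_namespace<'a>(requested: Option<&'a str>, default_namespace: &'a str) -> &'a str {
    requested.unwrap_or(default_namespace)
}

/// Decide whether one more entry may be written to `namespace`.
pub fn check_write(
    policy: &MemoryPolicy,
    namespace: &str,
    current_entries: usize,
) -> Result<(), PolicyError> {
    if policy.read_only_namespaces.iter().any(|n| n == namespace) {
        return Err(PolicyError::ReadOnlyNamespace(namespace.to_string()));
    }
    let limit = policy.max_entries_per_namespace;
    // A limit of 0 means "no cap" (assumed here to reject once the cap is reached).
    if limit != 0 && current_entries >= limit {
        return Err(PolicyError::NamespaceFull { namespace: namespace.to_string(), limit });
    }
    Ok(())
}
```

The per-category cap would follow the same pattern with a category count in place of the namespace count.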

Multi-client workspace memory isolation

When you run one Revka instance for multiple clients, workspace isolation gives each client engagement a separate memory database under <workspaces_dir>/<client>/memory/. A request scoped to one workspace cannot reach another’s entries — this is the hard boundary that namespaces alone don’t provide.

[workspace]
enabled = true
isolate_memory = true          # default: true when workspaces are enabled
cross_workspace_search = false # security default — no cross-tenant reads

Key	Type	Default	Meaning
`isolate_memory`	bool	`true`	Separate memory database per workspace.
`cross_workspace_search`	bool	`false`	Allow reads across workspaces. Leave `false` to prevent memory bleed between clients.

Because isolation works at the database-file level, it also separates the response cache: each workspace gets its own memory/response_cache.db. See Config: gateway, memory, security & platform for the full [workspace] schema and per-profile settings.

GDPR data portability export

The Memory trait’s export method supports a bulk, filtered export of memory entries for GDPR Article 20 data portability. You can filter by namespace, session, category, and time range; the result is ordered by creation time (ascending) and excludes embeddings.

The filter shape (ExportFilter):

Field	Type	Meaning
`namespace`	`Option<String>`	Restrict to one namespace.
`session_id`	`Option<String>`	Restrict to one session.
`category`	`Option<MemoryCategory>`	Restrict to `core`, `daily`, `conversation`, or a custom label.
`since`	`Option<String>`	RFC 3339 lower bound (inclusive on `timestamp`).
`until`	`Option<String>`	RFC 3339 upper bound (inclusive on `timestamp`).

let filter = ExportFilter {
    namespace: Some("default".to_string()),
    session_id: None,
    category: Some(MemoryCategory::Core),
    since: Some("2026-01-01T00:00:00Z".to_string()),
    until: Some("2026-12-31T23:59:59Z".to_string()),
};
let entries = memory.export(&filter).await?;

The default trait implementation delegates to list() plus client-side filtering; backends with native query support override it for efficiency.
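
The "list, then filter client-side" fallback might look something like the sketch below. The types are simplified stand-ins, not Revka's real definitions, and `filter_for_export` and `entry_matches` are hypothetical names. The sketch also assumes that all timestamps are RFC 3339 strings in UTC with the same precision, so comparing them as plain strings gives chronological order, and that `timestamp` doubles as the creation time used for sorting.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum MemoryCategory {
    Core,
    Daily,
    Conversation,
    Custom(String),
}

/// Simplified stand-in for a memory entry (no embedding field).
#[derive(Clone, Debug, PartialEq)]
pub struct MemoryEntry {
    pub namespace: String,
    pub session_id: Option<String>,
    pub category: MemoryCategory,
    pub timestamp: String, // RFC 3339, assumed UTC
    pub content: String,
}

#[derive(Default)]
pub struct ExportFilter {
    pub namespace: Option<String>,
    pub session_id: Option<String>,
    pub category: Option<MemoryCategory>,
    pub since: Option<String>,
    pub until: Option<String>,
}

/// A `None` field places no restriction; both time bounds are inclusive.
fn entry_matches(e: &MemoryEntry, f: &ExportFilter) -> bool {
    f.namespace.as_ref().map_or(true, |ns| &e.namespace == ns)
        && f.session_id.as_ref().map_or(true, |s| e.session_id.as_ref() == Some(s))
        && f.category.as_ref().map_or(true, |c| &e.category == c)
        && f.since.as_ref().map_or(true, |t| e.timestamp.as_str() >= t.as_str())
        && f.until.as_ref().map_or(true, |t| e.timestamp.as_str() <= t.as_str())
}

/// Filter an already-listed set of entries and sort it oldest first.
pub fn filter_for_export(entries: &[MemoryEntry], filter: &ExportFilter) -> Vec<MemoryEntry> {
    let mut out: Vec<MemoryEntry> = entries
        .iter()
        .filter(|e| entry_matches(e, filter))
        .cloned()
        .collect();
    out.sort_by(|a, b| a.timestamp.cmp(&b.timestamp));
    out
}
```

A backend with native query support would push the same predicates into its query rather than loading every entry first, which is the efficiency gain the override is for.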

Related pages

Memory overview — the full [memory] section, NoneMemory binding, categories, and decay.
Kumiho setup — install the sidecar and choose cloud vs Community Edition.
Graph model: spaces, items & provenance — the Kumiho data model behind durable memory.
Aardvark I2C/SPI/GPIO & datasheets — the datasheet tool that feeds the hardware RAG index.
Config: gateway, memory, security & platform — full [memory], [memory.policy], and [workspace] schema.
revka memory & estop — inspect and manage memory from the CLI.
