ADR-0054: Intent Classification Cache

Date: 2026-05-12 | Status: Accepted | Relates to: ADR-0031 (Semantic Query Cache), ADR-0030 (LLM Entity Extraction)

Context

Per-stage telemetry showed intent classification dominating chat-pipeline latency at p50 ≈ 2,300 ms per query. The cost comes from a single OpenAI gpt-4.1 call that runs on every turn before retrieval. The Semantic Query Cache already eliminates the full pipeline cost for exact-string repeats of an answered query, but on a Tier-1 miss every step (including intent classification) is rerun fresh.

Production traffic analysis (2026-05-11) showed a meaningful fraction of queries are repeat phrasings of common questions. The intent classifier returns near-deterministic output at temperature=0.1: identical input almost always produces the same classification. There is no reason to pay 2,300 ms for the same answer twice.

Decision

Add a separate cache layer, keyed on (tenant_id, normalized_query, language), that stores the full IntentClassificationResult Pydantic model. The cache is deliberately decoupled from the semantic query cache because a wrong hit has different consequences in each:

Cache	What it stores	Wrong cache hit looks like
Semantic query cache	Full LLM answer	Wrong content in the response
Intent classification cache	Routing decision (intent class + strategy)	Wrong route — retrieval gets bad top-K, prompt uses wrong template
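The key tuple above can be sketched as follows. This is a minimal illustration, not the shipped code: the ADR does not specify the normalization rules, so `normalize_query` (NFKC, casefold, whitespace collapse) and the `IntentCacheKey` and `make_key` names are assumptions.

```python
import re
import unicodedata
from typing import NamedTuple


class IntentCacheKey(NamedTuple):
    """Hypothetical key type mirroring (tenant_id, normalized_query, language)."""
    tenant_id: str
    normalized_query: str
    language: str


def normalize_query(query: str) -> str:
    # Assumed normalization: Unicode NFKC, case-insensitive, single spaces.
    text = unicodedata.normalize("NFKC", query).casefold()
    return re.sub(r"\s+", " ", text).strip()


def make_key(tenant_id: str, query: str, language: str) -> IntentCacheKey:
    return IntentCacheKey(tenant_id, normalize_query(query), language.lower())


# Repeat phrasings that differ only in case or spacing share one key.
a = make_key("tenant-a", "  What are your   Opening Hours? ", "EN")
b = make_key("tenant-a", "what are your opening hours?", "en")
# The same question from another tenant never shares an entry.
c = make_key("tenant-b", "what are your opening hours?", "en")
```

Including tenant_id in the key keeps one tenant's routing decisions from leaking into another's, even when the query text is identical.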

Two backends share one async interface (IntentCacheBackend Protocol):

Memory backend (default)

Per-worker OrderedDict bounded LRU with TTL. Process restart clears the cache — a safety property that bounds poisoning to one container lifecycle. Used on the single-worker pilot.
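A bounded LRU with TTL on top of OrderedDict could look roughly like this. It is a synchronous sketch for brevity (the real backend sits behind an async interface), and the class name, method names, and injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class LruTtlCache:
    """Illustrative bounded LRU with per-entry expiry; not the shipped class."""

    def __init__(self, max_size: int = 1000, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # expired: drop and report a miss
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used


now = [0.0]                              # fake clock so the demo is deterministic
cache = LruTtlCache(max_size=2, ttl_seconds=10, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")                           # touch "a"; "b" is now the LRU entry
cache.set("c", 3)                        # over capacity: evicts "b"
evicted = cache.get("b")
kept = cache.get("a")
now[0] = 11.0                            # advance past the TTL
expired = cache.get("a")
```

Because the dictionary lives in process memory, a restart discards every entry, which is the self-healing property described above.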

Redis backend (opt-in via `INTENT_CACHE_BACKEND=redis`)

Shared across worker replicas via the existing app.db.redis connection pool. Values are JSON round-tripped through Pydantic. Keys are prefixed intent_cache: so a SCAN-based clear targets only this cache without disturbing rate-limiter or token-blacklist keys.
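The prefix-scoped clear can be sketched as below. `scan_iter` and `delete` are the redis-py asyncio client methods; `clear_intent_cache`, `FakeRedis`, and the sample `rate_limit:` and `token_blacklist:` key names are hypothetical stand-ins so the example runs without a Redis server.

```python
import asyncio
import fnmatch

KEY_PREFIX = "intent_cache:"


async def clear_intent_cache(client) -> int:
    """Delete only keys under the intent-cache prefix; return how many were removed."""
    deleted = 0
    async for key in client.scan_iter(match=f"{KEY_PREFIX}*"):
        deleted += await client.delete(key)
    return deleted


class FakeRedis:
    """In-memory stand-in exposing the two client methods used above."""

    def __init__(self, data):
        self.data = dict(data)

    async def scan_iter(self, match="*"):
        for key in list(self.data):          # copy: keys are deleted while iterating
            if fnmatch.fnmatchcase(key, match):
                yield key

    async def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


fake = FakeRedis({
    "intent_cache:tenant-a|nl|wat zijn de openingstijden": "{}",
    "rate_limit:tenant-a": "7",              # hypothetical rate-limiter key
    "token_blacklist:abc": "1",              # hypothetical blacklist key
})
deleted = asyncio.run(clear_intent_cache(fake))
remaining = sorted(fake.data)
```

SCAN iterates incrementally rather than blocking the server the way KEYS does, which matters when the same Redis instance also serves the rate limiter.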

The Redis backend persists across container restarts — meaning the "restart fixes poisoning" remedy that works for the memory backend no longer applies. The compensating control bundled in the same PR (f9a335c4) is the operator "Clear Cache" button on PlatformSettingsPage, which wipes both the intent cache AND the semantic query cache in a single click (POST /api/v1/settings/cache/clear).
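For orientation, the shared interface that both backends implement might be shaped like this. Only the names IntentCacheBackend and stats() come from this ADR; the get, set, and clear signatures, the counter field names, and the toy `DictBackend` are assumptions.

```python
import asyncio
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class IntentCacheBackend(Protocol):
    """Assumed method set; the real Protocol may differ."""

    async def get(self, key: str) -> Optional[Any]: ...
    async def set(self, key: str, value: Any) -> None: ...
    async def clear(self) -> int: ...
    def stats(self) -> Dict[str, int]: ...


class DictBackend:
    """Toy backend: satisfies the Protocol structurally, without inheriting from it."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._hits = 0
        self._misses = 0

    async def get(self, key: str) -> Optional[Any]:
        if key in self._data:
            self._hits += 1
            return self._data[key]
        self._misses += 1
        return None

    async def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    async def clear(self) -> int:
        removed = len(self._data)
        self._data.clear()
        return removed

    def stats(self) -> Dict[str, int]:
        return {"hits": self._hits, "misses": self._misses, "size": len(self._data)}


async def demo(backend: IntentCacheBackend):
    # The caller depends only on the Protocol, never on a concrete backend.
    await backend.set("k", "v")
    hit = await backend.get("k")
    miss = await backend.get("absent")
    return hit, miss, backend.stats()


result = asyncio.run(demo(DictBackend()))
```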

Poisoning Guard

IntentClassificationService writes to the cache only when both conditions hold:

result.confidence >= INTENT_CACHE_CONFIDENCE_THRESHOLD (default 0.85)
result.intent != UserIntent.UNKNOWN

These guards live in the caller, so they apply regardless of backend choice.
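The two guards reduce to a single predicate. In this sketch, `IntentResult` is a stand-in for IntentClassificationResult, `UserIntent.FAQ` is a made-up member, and `should_cache` is a hypothetical helper name.

```python
from dataclasses import dataclass
from enum import Enum


class UserIntent(str, Enum):
    UNKNOWN = "unknown"
    FAQ = "faq"                      # hypothetical member for the demo


@dataclass(frozen=True)
class IntentResult:
    """Stand-in for the IntentClassificationResult Pydantic model."""
    intent: UserIntent
    confidence: float


INTENT_CACHE_CONFIDENCE_THRESHOLD = 0.85


def should_cache(result: IntentResult,
                 threshold: float = INTENT_CACHE_CONFIDENCE_THRESHOLD) -> bool:
    # Both conditions must hold; a confident UNKNOWN is still not cached.
    return result.confidence >= threshold and result.intent != UserIntent.UNKNOWN


at_threshold = should_cache(IntentResult(UserIntent.FAQ, 0.85))
below_threshold = should_cache(IntentResult(UserIntent.FAQ, 0.84))
confident_unknown = should_cache(IntentResult(UserIntent.UNKNOWN, 0.99))
```

Keeping the check in the caller means a new backend cannot accidentally skip it.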

Resilience

Every Redis operation is wrapped in try/except and falls back to a cache miss on failure. A Redis outage degrades the system to "every query pays the 2,300 ms LLM cost" — not "the system crashes." If the Redis connection pool is uninitialised when get_intent_cache() is first called with INTENT_CACHE_BACKEND=redis, the singleton falls back to the memory backend with a startup-time warning.
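The fail-open behaviour can be illustrated with a thin wrapper. `SafeCache`, `BrokenClient`, and `classify` are hypothetical names, and the string "faq" stands in for the real LLM classification; only the `get` and `set(..., ex=...)` call shapes match the redis-py client.

```python
import asyncio
import logging
from typing import Optional, Tuple

logger = logging.getLogger("intent_cache")


class SafeCache:
    """Wraps a Redis-like client so every failure degrades to a cache miss."""

    def __init__(self, client) -> None:
        self._client = client

    async def get(self, key: str) -> Optional[str]:
        try:
            return await self._client.get(key)
        except Exception:
            logger.warning("intent cache GET failed; treating as a miss")
            return None

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        try:
            await self._client.set(key, value, ex=ttl_seconds)
        except Exception:
            logger.warning("intent cache SET failed; result not cached")


class BrokenClient:
    """Simulates a Redis outage."""

    async def get(self, key):
        raise ConnectionError("redis down")

    async def set(self, key, value, ex=None):
        raise ConnectionError("redis down")


async def classify(query: str, cache: SafeCache) -> Tuple[str, str]:
    cached = await cache.get(query)
    if cached is not None:
        return cached, "cache"
    result = "faq"                   # placeholder for the LLM call
    await cache.set(query, result, 3600)
    return result, "llm"


outcome = asyncio.run(classify("hello", SafeCache(BrokenClient())))
```

During the simulated outage the request still completes through the slow path; the only visible effect is the added latency and two warnings in the log.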

Configuration

Env variable	Default	Range	Purpose
`INTENT_CACHE_ENABLED`	`true`	bool	Master switch
`INTENT_CACHE_BACKEND`	`memory`	`memory` / `redis`	Pick backend
`INTENT_CACHE_MAX_SIZE`	`1000`	10–100000	Memory backend LRU bound
`INTENT_CACHE_TTL_SECONDS`	`3600`	60–604800	Both backends
`INTENT_CACHE_CONFIDENCE_THRESHOLD`	`0.85`	0.0–1.0	Poisoning guard
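One way to load and range-check these variables is sketched below. The variable names, defaults, and ranges come from the table; `IntentCacheSettings`, `load_settings`, the accepted truthy strings, and the choice to raise on out-of-range values (rather than clamp) are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class IntentCacheSettings:
    enabled: bool
    backend: str
    max_size: int
    ttl_seconds: int
    confidence_threshold: float


def _bounded(name: str, value, low, high):
    """Reject values outside the documented range instead of clamping silently."""
    if not (low <= value <= high):
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def load_settings(env: Mapping[str, str]) -> IntentCacheSettings:
    backend = env.get("INTENT_CACHE_BACKEND", "memory").strip().lower()
    if backend not in ("memory", "redis"):
        raise ValueError(f"INTENT_CACHE_BACKEND={backend!r} must be 'memory' or 'redis'")
    enabled_raw = env.get("INTENT_CACHE_ENABLED", "true").strip().lower()
    return IntentCacheSettings(
        enabled=enabled_raw in ("1", "true", "yes", "on"),
        backend=backend,
        max_size=_bounded("INTENT_CACHE_MAX_SIZE",
                          int(env.get("INTENT_CACHE_MAX_SIZE", "1000")), 10, 100_000),
        ttl_seconds=_bounded("INTENT_CACHE_TTL_SECONDS",
                             int(env.get("INTENT_CACHE_TTL_SECONDS", "3600")), 60, 604_800),
        confidence_threshold=_bounded(
            "INTENT_CACHE_CONFIDENCE_THRESHOLD",
            float(env.get("INTENT_CACHE_CONFIDENCE_THRESHOLD", "0.85")), 0.0, 1.0),
    )


defaults = load_settings({})
production = load_settings({"INTENT_CACHE_BACKEND": "redis",
                            "INTENT_CACHE_TTL_SECONDS": "86400"})
```

In a running service the mapping passed in would be `os.environ`; taking it as a parameter keeps the loader testable.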

Consequences

Positive

Cache hit removes ~2,300 ms from the per-turn latency budget. Stacks with the semantic_query_cache — if both hit, the full pipeline collapses to ~50 ms.
Backend selection is a runtime knob, not a code change. Single-worker pilot uses memory; multi-worker production flips to Redis without rebuilding the image.
Poisoning has a one-click remedy via the existing UI button — operators don't need shell access to recover from a cache-poisoning incident.

Negative

Both caches are now exposed to poisoning, and in Redis mode a container restart no longer self-heals it. This is the entire reason the UI kill switch was bundled in the same PR as the Redis backend, rather than deferred to a follow-up.
Marginal Redis pool pressure under heavy traffic — one additional GET per request that isn't already cached at the semantic layer. Pool defaults are sized for rate-limiting + token-blacklist + ingestion; redis_max_connections may need adjustment under future load profiles.
Hit-rate observability is per-process today. stats() returns per-worker counters, so an aggregate hit rate requires combining those counters across workers. Future work would emit metrics to the existing pipeline_telemetry stream.
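When combining per-worker counters, the aggregate hit rate should pool the raw counts rather than average the per-worker rates, since workers rarely see equal traffic. A small sketch, assuming stats() exposes `hits` and `misses` fields (the field names are not confirmed by this ADR) and using made-up counter values:

```python
from typing import Dict, List


def aggregate_hit_rate(worker_stats: List[Dict[str, int]]) -> float:
    """Pool hits and misses across workers before dividing."""
    hits = sum(s["hits"] for s in worker_stats)
    misses = sum(s["misses"] for s in worker_stats)
    total = hits + misses
    return hits / total if total else 0.0


# Illustrative numbers only: one busy worker, one nearly idle worker.
workers = [
    {"hits": 90, "misses": 10},
    {"hits": 1, "misses": 9},
]
pooled = aggregate_hit_rate(workers)                 # 91 hits out of 110 lookups
naive = sum(s["hits"] / (s["hits"] + s["misses"]) for s in workers) / len(workers)
```

Here the pooled rate is 91/110 while the naive mean of the two rates is 0.5, because the idle worker is weighted as heavily as the busy one.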

Alternatives Considered

Alternative	Rejected because
Extend the semantic_query_cache to also cache intent results	Conflates two different failure modes and value types. The semantic cache key is a 1536-dim embedding of the reformulated query; the intent cache key is the user's original input text after normalization. The semantic cache stores full answers; the intent cache stores classification objects. Reusing the table would have forced shared schema, eviction policy, and TTL semantics.
Cache intent results in PostgreSQL	Adds a database round-trip on every classification — opposite of the goal. Redis hits in 1–2 ms locally; PG hits in 5–10 ms.
Embed the cache inside `IntentClassificationService`	Would couple the cache to the service and prevent the clean Protocol-based backend swap. The current factory + Protocol design keeps the service agnostic to backend choice.

Verification

Live verification on pilot (zol-rag-app:f9a335c4, 2026-05-12):

# 1. Trigger a fresh intent classification
curl -X POST .../api/v1/query -d '{"query":"...","channel":"web"}'

# 2. Confirm Redis key written
redis-cli --scan --pattern "intent_cache:*"
# Returns: intent_cache:|nl|<normalized query>

Test coverage:

12 unit tests on MemoryIntentCache (backend/tests/unit/services/test_intent_cache.py)
13 integration tests on RedisIntentCache against a Redis 7 testcontainer (backend/tests/integration/test_redis_intent_cache.py)
4 integration tests on the kill-switch endpoint (backend/tests/integration/api/test_settings_cache_clear.py)

Related

ADR-0031 (Semantic Query Cache): the other cache layer; both share the same UI kill switch
backend/app/services/intent_cache.py: implementation
backend/app/api/settings.py: POST /api/v1/settings/cache/clear endpoint
System Overview: where this cache fits in the layered architecture
