ADR-0055: FAQ-Corpus Drift Prevention — Purge Hand-Curated FAQs, Trust the Corpus
Date: 2026-05-12 | Status: Accepted (Phase 1 executed in commit a9820c3f) | Related: ADR-0056 Chat Answer-Shape Typology, ADR-0050 Twilio + LiveKit SIP
Context
The system has carried two independent answer-sources since voice was unified in Sprint E / Wave A:
- Hand-curated FAQ entries in backend/app/services/voice/tenant_overlays/_yaml/zol.yaml, matched by a regex pre-filter in voice_faq_tool.match_faq and dispatched from rag_service.py:3773 for both chat and voice.
- Ingested corpus under app.documents + app.document_chunks, nightly crawled, embedded, and retrievable via RAG.
Despite the directory path services/voice/, the FAQ pre-filter is channel-agnostic: both chat and voice hit it. The two-source design was originally a latency optimization for common questions on voice.
The drift incident
On 2026-05-12 a user observed that voice answered "is er een apotheek?" ("is there a pharmacy?") correctly with "Apotheek Synaps Park…" while chat answered with the hand-curated FAQ text "ZOL heeft geen eigen apotheek voor publiek." ("ZOL has no pharmacy of its own for the public."). The voice answer was correct only by accident: the verbose STT transcript hit the compound-query skip rule at rag_service.py:3769 and fell through to RAG.
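The divergence between the two channels can be illustrated with a minimal Python sketch. It is not the code in rag_service.py: the registry contents, the word-count heuristic standing in for the compound-query skip rule, and the names `FAQ_ENTRIES`, `MAX_FAQ_WORDS`, `is_compound` and `answer` are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional

# Hypothetical registry: trigger regex -> canned answer (stand-in for zol.yaml).
FAQ_ENTRIES = [
    (re.compile(r"\bapotheek\b", re.IGNORECASE),
     "ZOL heeft geen eigen apotheek voor publiek."),
]

# Assumed stand-in for the compound-query skip rule: long inputs bypass the FAQ.
MAX_FAQ_WORDS = 8


def is_compound(query: str) -> bool:
    """Treat any query longer than MAX_FAQ_WORDS words as compound."""
    return len(query.split()) > MAX_FAQ_WORDS


def match_faq(query: str) -> Optional[str]:
    """Return the canned answer of the first matching entry, if any."""
    for pattern, canned in FAQ_ENTRIES:
        if pattern.search(query):
            return canned
    return None


def answer(query: str, rag: Callable[[str], str]) -> str:
    """Channel-agnostic dispatch: FAQ pre-filter first, RAG as fallthrough."""
    if not is_compound(query):
        canned = match_faq(query)
        if canned is not None:
            return canned
    return rag(query)
```

Under these assumptions, a short typed chat query such as "is er een apotheek?" is intercepted by the stale FAQ text, while a long spoken transcript containing the same keyword skips the pre-filter and reaches RAG. The channels diverge on input length, not on any channel-specific logic.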
A formal audit of all 10 ZOL-specific FAQ entries against the live pilot corpus found:
| Verdict | Count | Entries |
|---|---|---|
| Directly contradicted by corpus | 3 | faq_pharmacy, faq_lost_and_found, out_of_scope_billing |
| Incomplete / under-specified | 2 | faq_parking_general, faq_cafeteria |
| Unverified (no corpus evidence) | 3 | faq_visiting_hours_general, faq_opening_hours_helpdesk, faq_wifi |
| Aligned with corpus | 2 | faq_phone_number_general, faq_address_general |
Drift typology: every entry making list claims, service routings, or counter-corpus assertions had drifted within ~3 months. The two aligned entries were single immutable facts.
Decision
Phase 1 — Purge drift-prone entries (executed 2026-05-12, commit a9820c3f). Remove all 10 ZOL-specific FAQ entries from zol.yaml. Preserve only:
- crisis_suicide_ideation: safety/policy, Belgian crisis lines 1813 / 106.
- emergency_chest_pain_and_breathing: safety/policy, 112 + ZOL spoed (emergency department).
- emergency_acute_neurological_deficit: safety/policy, stroke dispatch.
- emergency_solo_keywords: safety/policy, generic emergency dispatch.
- overnacht_ambiguity (under clarifications:): disambiguation utility, not a factual claim.
_defaults.yaml is untouched: query-rewrite expansions (parkeren_broaden, apotheek_openingstijden, language variants) are search-enhancement rules that don't carry factual claims.
Phase 2 — Nightly drift detector (planned). backend/scripts/audit_faq_corpus.py: for each surviving entry, retrieve top-k chunks via existing rag_service retrieval, call GPT-4.1-mini once with FAQ canned answer + top-k chunks → return verdict {ALIGNED | CONTRADICTED | INCOMPLETE | SILENT} + 1-line rationale + cited chunk ids. Append result to faq-audit-<date>.md. Threshold: any CONTRADICTED → exit 1 → cron alert. Cost ~$0.015/run.
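A minimal sketch of how the planned audit loop could be structured, assuming the retriever and the LLM judge are injected as plain callables. The real audit_faq_corpus.py does not exist yet, so every name here (`FaqEntry`, `Chunk`, `Verdict`, `audit`, `render_report`, `exit_code`, `run`) is hypothetical, and the judge callable stands in for the single GPT-4.1-mini call per entry.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Callable, Iterable, List, Sequence

VERDICTS = {"ALIGNED", "CONTRADICTED", "INCOMPLETE", "SILENT"}


@dataclass
class FaqEntry:
    entry_id: str
    question: str
    canned_answer: str


@dataclass
class Chunk:
    chunk_id: str
    text: str


@dataclass
class Verdict:
    entry_id: str
    label: str
    rationale: str
    cited_chunk_ids: List[str]


# Assumed interfaces: retrieve(question, top_k) and judge(entry, chunks).
Retriever = Callable[[str, int], Sequence[Chunk]]
Judge = Callable[[FaqEntry, Sequence[Chunk]], Verdict]


def audit(entries: Iterable[FaqEntry], retrieve: Retriever, judge: Judge,
          top_k: int = 5) -> List[Verdict]:
    """Judge each surviving entry against its top-k retrieved chunks."""
    results = []
    for entry in entries:
        chunks = retrieve(entry.question, top_k)
        verdict = judge(entry, chunks)
        if verdict.label not in VERDICTS:
            raise ValueError(f"unexpected verdict {verdict.label!r}")
        results.append(verdict)
    return results


def render_report(verdicts: Sequence[Verdict], run_date: date) -> str:
    """Format one audit run as a markdown section."""
    lines = [f"## FAQ audit {run_date.isoformat()}", ""]
    for v in verdicts:
        cited = ", ".join(v.cited_chunk_ids) or "none"
        lines.append(f"- {v.entry_id}: {v.label}. {v.rationale} (chunks: {cited})")
    return "\n".join(lines) + "\n"


def exit_code(verdicts: Sequence[Verdict]) -> int:
    """Any CONTRADICTED verdict yields exit code 1 so cron can alert."""
    return 1 if any(v.label == "CONTRADICTED" for v in verdicts) else 0


def run(entries: Iterable[FaqEntry], retrieve: Retriever, judge: Judge,
        out_dir: Path, run_date: date) -> int:
    """Append the report to faq-audit-<date>.md and return the exit code."""
    verdicts = audit(entries, retrieve, judge)
    report = out_dir / f"faq-audit-{run_date.isoformat()}.md"
    with report.open("a", encoding="utf-8") as fh:
        fh.write(render_report(verdicts, run_date))
    return exit_code(verdicts)
```

Injecting the retriever and judge keeps the loop testable without network access. A script entry point would pass the return value of `run` to `sys.exit`.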
Phase 3 — Demand-driven promotion (planned, post-demo). Observe conversation_messages → cluster similar questions → score against RAG quality (rephrase rate, thumb-down, handoff) → low quality × high demand = candidate → auto-draft FAQ entry whose answer is the best recent successful RAG response on that cluster → admin approves. When RAG outperforms an FAQ entry on its cluster, flag the entry for deletion. Demand-driven FAQs are fresh by construction.
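The "low quality × high demand" selection can be sketched as follows. The equal weighting of the three quality signals, the thresholds `min_demand` and `max_quality`, and the names `ClusterStats`, `quality` and `promotion_candidates` are assumptions for illustration; the ADR does not fix any of them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClusterStats:
    cluster_id: str
    question_count: int      # demand: questions observed in the cluster
    rephrase_rate: float     # share of answers followed by a rephrase
    thumbs_down_rate: float  # share of answers rated thumbs-down
    handoff_rate: float      # share of conversations handed off


def quality(stats: ClusterStats) -> float:
    """RAG quality in [0, 1]; assumed equal weighting of the failure signals."""
    failure = (stats.rephrase_rate + stats.thumbs_down_rate + stats.handoff_rate) / 3
    return 1.0 - failure


def promotion_candidates(clusters: Iterable[ClusterStats],
                         min_demand: int = 50,
                         max_quality: float = 0.6) -> List[ClusterStats]:
    """Keep high-demand, low-quality clusters, worst demand-weighted gap first."""
    picked = [c for c in clusters
              if c.question_count >= min_demand and quality(c) <= max_quality]
    return sorted(picked,
                  key=lambda c: c.question_count * (1.0 - quality(c)),
                  reverse=True)
```

Each returned cluster would then get an auto-drafted entry for admin approval. Re-running the same scoring on clusters that already have an FAQ entry gives the deletion signal once RAG quality rises above the threshold.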
Surviving categories — when an FAQ entry is justified
Three explicit categories survive the cull:
| Category | Justification | Examples |
|---|---|---|
| Safety / policy | Cannot tolerate retrieval failure or LLM drift on safety-critical answers | crisis hotlines, 112 emergency dispatch |
| Out-of-corpus routing | The destination service is not represented in retrievable chunks | (none currently) |
| Promoted from telemetry | High demand + measured RAG quality gap | (none yet; Phase 3 wiring required) |
Anything outside these three categories goes through RAG.
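One way to enforce this rule mechanically is a lint over the overlay that rejects entries without an allowed category. The sketch below assumes each entry carries a `category` field, which the current zol.yaml schema may not have; `FaqCategory` and `uncategorised_entries` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List


class FaqCategory(Enum):
    SAFETY_POLICY = "safety_policy"
    OUT_OF_CORPUS_ROUTING = "out_of_corpus_routing"
    PROMOTED_FROM_TELEMETRY = "promoted_from_telemetry"


def uncategorised_entries(entries: Dict[str, dict]) -> List[str]:
    """Return ids of entries whose category is missing or not allowed."""
    allowed = {c.value for c in FaqCategory}
    return sorted(entry_id for entry_id, meta in entries.items()
                  if meta.get("category") not in allowed)
```

Run in CI against the parsed overlay, a non-empty result would block a merge that reintroduces a hand-curated factual entry.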
Consequences
Positive
- Single source of truth. Patient-facing answers come from the nightly-refreshed corpus, not from a hand-authored layer with month-scale drift.
- Voice / chat consistency. Both channels now answer identically for topics previously FAQ-intercepted. The pharmacy incident cannot recur through the FAQ path, because no factual FAQ entry remains to intercept the query.
- Reduced maintenance. 287 lines of YAML deleted; no on-call burden to keep hand-curated entries in sync with operational changes.
- Better signal for content team. Once Phase 3 lands, the "what patients ask that RAG fails on" dashboard becomes a content-priority feed.
Negative / trade-offs
- Latency. Topics that used to short-circuit (parking, cafeteria) now go through full RAG. Worst case: roughly 600-1,500 ms extra per query. Mitigation: streaming UX keeps first-token latency responsive; the voice channel has the 300-token cap.
- Cache invalidation. app.semantic_query_cache may have entries carrying old FAQ answer text for affected topics. Operators should flush cache entries matching the deleted FAQ trigger patterns after deploy (or accept TTL-driven expiration).
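The cache flush can be sketched against an in-memory stand-in. The schema of app.semantic_query_cache is not described here, so the cache is modelled as a plain dict from cached query text to answer, and the trigger patterns are placeholders, not the ones deleted from zol.yaml.

```python
import re
from typing import Dict, Iterable, List, Pattern


def stale_cache_keys(cache: Dict[str, str],
                     triggers: Iterable[Pattern[str]]) -> List[str]:
    """List cached queries that match any deleted FAQ trigger pattern."""
    trigger_list = list(triggers)
    return [query for query in cache
            if any(p.search(query) for p in trigger_list)]


def flush(cache: Dict[str, str], triggers: Iterable[Pattern[str]]) -> int:
    """Delete matching entries in place and return how many were removed."""
    keys = stale_cache_keys(cache, triggers)
    for key in keys:
        del cache[key]
    return len(keys)
```

Against the real table the same filter would have to run as a query over whichever column stores the original question text.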
Mitigations
- Phase 2 nightly audit catches future drift on the surviving entries (the four safety rules) within 24 h of corpus changes.
- ADR-0050's pilot deploy convention (build on pilot, SSL overlay, no voice_agent rebuild during chat changes) keeps the rollout surface tight.
Rejected alternatives
- Fix the contradicted FAQ entries manually + keep the rest. Rejected — drift was not random; every list / routing / counter-corpus entry drifted. Manual fixes pay maintenance cost forever without addressing the architectural cause.
- Move the FAQ registry into the corpus. Rejected — FAQ entries with real value (emergency dispatch, crisis hotlines) are precisely the ones that should NOT depend on retrieval-success. Safety entries need guaranteed delivery.
- Add staleness flags to each FAQ entry. Rejected — a "stale FAQ flag" is still a hand-curated layer that drifts.
References
- Commit a9820c3f, Phase 1 deploy (2026-05-12)
- rag_service.py:3724, channel-agnostic FAQ dispatch docstring
- rag_service.py:3769, compound-query skip rule (the accidental freshness guard)
- Master ADR: docs/ADR/0055-faq-corpus-drift-prevention.md