ADR-0055: FAQ-Corpus Drift Prevention — Purge Hand-Curated FAQs, Trust the Corpus

Date: 2026-05-12 | Status: Accepted (Phase 1 executed in commit a9820c3f) | Related: ADR-0056 Chat Answer-Shape Typology, ADR-0050 Twilio + LiveKit SIP

Context

The system has carried two independent answer-sources since voice was unified in Sprint E / Wave A:

  1. Hand-curated FAQ entries in backend/app/services/voice/tenant_overlays/_yaml/zol.yaml, matched by regex pre-filter in voice_faq_tool.match_faq and dispatched from rag_service.py:3773 for both chat and voice.
  2. Ingested corpus under app.documents + app.document_chunks, nightly crawled, embedded, and retrievable via RAG.

Despite the directory path services/voice/, the FAQ pre-filter is channel-agnostic: both chat and voice hit it. The two-source design was originally a latency optimization for common questions on voice.
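
The channel-agnostic dispatch described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in voice_faq_tool or rag_service.py: only the name match_faq comes from this ADR, while its signature, the FaqEntry shape, the answer function, the rag_fallback callable, and the sample pattern are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FaqEntry:
    # Hypothetical shape; the real overlay schema in zol.yaml may differ.
    entry_id: str
    pattern: re.Pattern
    canned_answer: str


def match_faq(query: str, entries: list[FaqEntry]) -> Optional[FaqEntry]:
    """Return the first entry whose regex matches the query, else None."""
    for entry in entries:
        if entry.pattern.search(query):
            return entry
    return None


def answer(query: str, channel: str, entries: list[FaqEntry],
           rag_fallback: Callable[[str], str]) -> str:
    # `channel` is accepted but deliberately unused: the pre-filter runs
    # identically for "chat" and "voice".
    hit = match_faq(query, entries)
    return hit.canned_answer if hit else rag_fallback(query)


# Hypothetical sample entry and a stand-in for the RAG pipeline.
entries = [FaqEntry("faq_pharmacy", re.compile(r"\bapotheek\b", re.I),
                    "canned FAQ text")]
rag = lambda q: "answer from RAG"
```

In this sketch a matching query returns the canned text on both channels, which is why a stale entry could override a correct corpus answer regardless of where the question was asked.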

The drift incident

On 2026-05-12 a user observed that voice answered "is er een apotheek?" ("is there a pharmacy?") correctly with "Apotheek Synaps Park…", while chat answered with the hand-curated FAQ text "ZOL heeft geen eigen apotheek voor publiek." ("ZOL has no pharmacy of its own for the public."). The voice answer was correct only incidentally: the verbose STT transcript hit the compound-query skip rule at rag_service.py:3769 and fell through to RAG.

A formal audit of all 10 ZOL-specific FAQ entries against the live pilot corpus found:

| Verdict | Count | Entries |
|---|---|---|
| Directly contradicted by corpus | 3 | faq_pharmacy, faq_lost_and_found, out_of_scope_billing |
| Incomplete / under-specified | 2 | faq_parking_general, faq_cafeteria |
| Unverified (no corpus evidence) | 3 | faq_visiting_hours_general, faq_opening_hours_helpdesk, faq_wifi |
| Aligned with corpus | 2 | faq_phone_number_general, faq_address_general |

Drift typology: every entry making list claims, service routings, or counter-corpus assertions had drifted within ~3 months. The two aligned entries were single immutable facts.

Decision

Phase 1 — Purge drift-prone entries (executed 2026-05-12, commit a9820c3f). Remove all 10 ZOL-specific FAQ entries from zol.yaml. Preserve only:

  • crisis_suicide_ideation — safety/policy, Belgian crisis lines 1813 / 106.
  • emergency_chest_pain_and_breathing — safety/policy, 112 + ZOL spoed.
  • emergency_acute_neurological_deficit — safety/policy, stroke dispatch.
  • emergency_solo_keywords — safety/policy, generic emergency dispatch.
  • overnacht_ambiguity (under clarifications:) — disambiguation utility, not a factual claim.

_defaults.yaml is untouched: query-rewrite expansions (parkeren_broaden, apotheek_openingstijden, language variants) are search-enhancement rules that don't carry factual claims.

Phase 2 — Nightly drift detector (planned). backend/scripts/audit_faq_corpus.py: for each surviving entry, retrieve top-k chunks via existing rag_service retrieval, call GPT-4.1-mini once with FAQ canned answer + top-k chunks → return verdict {ALIGNED | CONTRADICTED | INCOMPLETE | SILENT} + 1-line rationale + cited chunk ids. Append result to faq-audit-<date>.md. Threshold: any CONTRADICTED → exit 1 → cron alert. Cost ~$0.015/run.
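
The verdict handling and alert threshold for the planned detector could look roughly like the sketch below. Since audit_faq_corpus.py is not yet written, everything here is an assumption: the Verdict enum mirrors the four verdicts named above, while AuditResult, exit_code, and report_line are hypothetical names. Retrieval and the model call are intentionally left out.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    ALIGNED = "ALIGNED"
    CONTRADICTED = "CONTRADICTED"
    INCOMPLETE = "INCOMPLETE"
    SILENT = "SILENT"


@dataclass(frozen=True)
class AuditResult:
    # Hypothetical record for one audited FAQ entry.
    entry_id: str
    verdict: Verdict
    rationale: str                 # one-line rationale from the model
    chunk_ids: tuple[str, ...]     # ids of the cited chunks


def exit_code(results: list[AuditResult]) -> int:
    """Return 1 if any entry is CONTRADICTED (triggers the cron alert), else 0."""
    return 1 if any(r.verdict is Verdict.CONTRADICTED for r in results) else 0


def report_line(r: AuditResult) -> str:
    """Format one line for the appended audit report."""
    cited = ", ".join(r.chunk_ids) or "none"
    return f"- {r.entry_id}: {r.verdict.value} ({r.rationale}; chunks: {cited})"
```

Note that under this threshold INCOMPLETE and SILENT verdicts are recorded in the report but do not fail the run.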

Phase 3 — Demand-driven promotion (planned, post-demo). Observe conversation_messages → cluster similar questions → score against RAG quality (rephrase rate, thumb-down, handoff) → low quality × high demand = candidate → auto-draft FAQ entry whose answer is the best recent successful RAG response on that cluster → admin approves. When RAG outperforms an FAQ entry on its cluster, flag the entry for deletion. Demand-driven FAQs are fresh by construction.
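
The "low quality × high demand" selection can be illustrated with a small scoring sketch. The formula is an assumption made for illustration: the ADR names the three quality signals but not how they are combined, so the unweighted mean, the ClusterStats fields, and the min_priority threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStats:
    # Hypothetical per-cluster aggregates from conversation_messages.
    cluster_id: str
    question_count: int      # demand: messages in the cluster
    rephrase_rate: float     # each rate is a fraction in [0, 1]
    thumb_down_rate: float
    handoff_rate: float


def failure_score(c: ClusterStats) -> float:
    """Unweighted mean of the three failure signals (an assumed formula)."""
    return (c.rephrase_rate + c.thumb_down_rate + c.handoff_rate) / 3


def priority(c: ClusterStats) -> float:
    """Low quality x high demand: failure score scaled by question volume."""
    return failure_score(c) * c.question_count


def candidates(clusters: list[ClusterStats],
               min_priority: float) -> list[ClusterStats]:
    """Clusters at or above the threshold, highest priority first."""
    picked = [c for c in clusters if priority(c) >= min_priority]
    return sorted(picked, key=priority, reverse=True)
```

A cluster with many questions and mediocre RAG quality can outrank a rare cluster with poor quality, which matches the demand-driven intent; the admin approval step stays outside this sketch.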

Surviving categories — when an FAQ entry is justified

Three explicit categories survive the cull:

| Category | Justification | Examples |
|---|---|---|
| Safety / policy | Cannot tolerate retrieval failure or LLM drift on safety-critical answers | crisis hotlines, 112 emergency dispatch |
| Out-of-corpus routing | The destination service is not represented in retrievable chunks | (none currently) |
| Promoted from telemetry | High demand + measured RAG quality gap | (none yet; Phase 3 wiring required) |

Anything outside these three categories goes through RAG.

Consequences

Positive

  • Single source of truth. Patient-facing answers come from the nightly-refreshed corpus, not from a hand-authored layer with month-scale drift.
  • Voice / chat consistency. Both channels now answer identically for topics previously FAQ-intercepted. The pharmacy incident cannot recur.
  • Reduced maintenance. 287 lines of YAML deleted; no on-call burden to keep hand-curated entries in sync with operational changes.
  • Better signal for content team. Once Phase 3 lands, the "what patients ask that RAG fails on" dashboard becomes a content-priority feed.

Negative / trade-offs

  • Latency. Topics that used to short-circuit (parking, cafeteria) now go through full RAG. Worst case: ~600 to 1,500 ms extra per query. Mitigation: streaming UX makes first-token latency feel responsive; the voice channel has the 300-token cap.
  • Cache invalidation. app.semantic_query_cache may have entries carrying old FAQ answer text for affected topics. Operators should flush cache entries matching the deleted FAQ trigger patterns after deploy (or accept TTL-driven expiration).
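
The cache flush mentioned in the last bullet could be approximated by matching cached query text against the deleted trigger patterns. This is only a sketch: the layout of app.semantic_query_cache is not described here, so the key-to-query mapping, the keys_to_flush helper, and the sample patterns are hypothetical. A semantic cache may also hold paraphrases that no regex matches, in which case TTL-driven expiration remains the fallback.

```python
import re


def keys_to_flush(cached_queries: dict[str, str],
                  deleted_patterns: list[str]) -> list[str]:
    """Return cache keys whose stored query matches any deleted FAQ trigger."""
    compiled = [re.compile(p, re.IGNORECASE) for p in deleted_patterns]
    return sorted(
        key for key, query in cached_queries.items()
        if any(rx.search(query) for rx in compiled)
    )


# Hypothetical sample data, for illustration only.
cached_queries = {
    "k1": "Is er een apotheek?",
    "k2": "Wat is het adres van het ziekenhuis?",
    "k3": "Waar kan ik parkeren?",
}
deleted_patterns = [r"\bapotheek\b", r"\bparke(?:ren|ing)\b"]
stale_keys = keys_to_flush(cached_queries, deleted_patterns)
```

The returned keys would then be passed to whatever delete operation the cache actually exposes.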

Mitigations

  • Phase 2 nightly audit catches future drift on the surviving entries (the four safety rules) within 24 h of corpus changes.
  • ADR-0050's pilot deploy convention (build on pilot, SSL overlay, no voice_agent rebuild during chat changes) keeps the rollout surface tight.

Rejected alternatives

  1. Fix the contradicted FAQ entries manually + keep the rest. Rejected — drift was not random; every list / routing / counter-corpus entry drifted. Manual fixes pay maintenance cost forever without addressing the architectural cause.
  2. Move the FAQ registry into the corpus. Rejected — FAQ entries with real value (emergency dispatch, crisis hotlines) are precisely the ones that should NOT depend on retrieval-success. Safety entries need guaranteed delivery.
  3. Add staleness flags to each FAQ entry. Rejected — a "stale FAQ flag" is still a hand-curated layer that drifts.

References

  • Commit a9820c3f — Phase 1 deploy (2026-05-12)
  • rag_service.py:3724 — channel-agnostic FAQ dispatch docstring
  • rag_service.py:3769 — compound-query skip rule (the accidental freshness guard)
  • Master ADR: docs/ADR/0055-faq-corpus-drift-prevention.md