Release Notes: April 11 – May 4, 2026
Voice Pipeline Simplification, Tenant Overlay System & Twilio Phase A
~280 commits | 8 sessions | Voice path reduced from 8 stages to 3 | New multi-tenant FAQ overlay architecture | Belgian PSTN telephony infrastructure live
This release covers nearly four weeks of voice-channel work. The headline themes:
- The legacy 8-stage voice pipeline was deleted entirely (~7,000 LOC) and replaced with a thin three-stage orchestrator (regex pre-filter → FAQ → RAG). Live testing showed the complex pipeline was slow and brittle with no offsetting benefit; the much smaller orchestrator outperforms it on every axis measured.
- Six iterative transcript-driven fix batches (A through E + cardiology hotfix) addressed real issues found in voice smoke tests: STT phonetic mishears, repeat-offer loops, dialect farewells, decompose-induced latency, "list all campuses" UX gap.
- A new multi-tenant overlay system for tenant-scoped FAQ + STT, with DB-driven answer renderers so tenant data (campus addresses, doctor names) is never hardcoded in YAML or source.
- Twilio Phase A — self-hosted LiveKit SIP gateway and inbound trunk now operational on the local stack, ready for pilot DNS + TLS.
ADR-0049: Thin Voice Pipeline (Architectural Cut)
The legacy VoiceOrchestrator ran 8 stages per turn (preprocessor LLM → safety gate → speculative-STT cache → dialogue manager → tool dispatcher → state ladder → orchestrator integration → answer shaper). Live smoke tests revealed it was both slow (~6-12s per turn) and brittle — every layer added a new failure mode without proportional benefit.
The new ThinVoiceOrchestrator runs three stages:
- Regex pre-filter (voice_thin_pre_filter.py): classifies greetings, farewells, explicit handoff requests, and medical-advice safety refusals. ~1ms.
- FAQ regex short-circuit (voice_faq_tool.match_faq): matches high-traffic generic queries and returns a curated answer. ~1ms.
- RAG fallthrough: a single LLM call for everything else, through the full retrieval + answer pipeline.
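The three stages form a first-hit-wins chain. The sketch below illustrates only that control flow. The class name ThinOrchestratorSketch, the TurnResult type, and the stage callables and their signatures are assumptions made for illustration; they are not the real ThinVoiceOrchestrator API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures: the two cheap stages return None to pass the turn on.
PreFilter = Callable[[str], Optional[str]]
FaqMatcher = Callable[[str, str], Optional[str]]
RagAnswer = Callable[[str, str], str]


@dataclass
class TurnResult:
    stage: str  # "pre_filter", "faq" or "rag"
    text: str


class ThinOrchestratorSketch:
    """First hit wins: regex pre-filter, then FAQ short-circuit, then RAG."""

    def __init__(self, pre_filter: PreFilter, match_faq: FaqMatcher, rag: RagAnswer) -> None:
        self.pre_filter = pre_filter
        self.match_faq = match_faq
        self.rag = rag

    def handle_turn(self, utterance: str, language: str) -> TurnResult:
        # Stage 1 (~1ms): greetings, farewells, handoff requests, safety refusals.
        reply = self.pre_filter(utterance)
        if reply is not None:
            return TurnResult("pre_filter", reply)
        # Stage 2 (~1ms): curated answer for high-traffic generic questions.
        reply = self.match_faq(utterance, language)
        if reply is not None:
            return TurnResult("faq", reply)
        # Stage 3: one retrieval + LLM call for everything else.
        return TurnResult("rag", self.rag(utterance, language))


demo = ThinOrchestratorSketch(
    pre_filter=lambda u: "Tot ziens!" if re.fullmatch(r"\s*dag\W*", u, re.IGNORECASE) else None,
    match_faq=lambda u, lang: "<curated parking answer>" if "parkeren" in u.lower() else None,
    rag=lambda u, lang: f"<RAG answer to {u!r}>",
)
print(demo.handle_turn("Dag!", "nl").stage)                  # pre_filter
print(demo.handle_turn("Waar kan ik parkeren?", "nl").stage)  # faq
print(demo.handle_turn("Wie is de cardioloog?", "nl").stage)  # rag
```

Because each stage exits on its first hit, the expensive RAG call only runs when both ~1ms stages return nothing.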
What was deleted (commit 158d793):
- voice_orchestrator.py (1,399 LOC)
- voice_dialogue_manager.py (built and unused)
- voice_dialogue_dispatcher.py
- voice_dialogue_schemas.py
- voice_dialogue_state.py
- voice_preprocessor.py
- voice_safety_gate.py
- voice_speculation_cache.py
- conversational_intent_resolver.py
- conversational_intent_rules.py
- stt_ambiguity_guardrail.py
- 17 legacy test files
Total: ~7,000 lines removed in one commit.
The thin pipeline is now the production behavior on every channel. The feature flag (VOICE_THIN_PIPELINE_ENABLED) and legacy selector are gone.
Voice Transcript-Driven Fix Batches A → E
The user ran six rounds of live voice smoke tests. Each round produced a transcript that surfaced specific bugs, and each batch landed as one cohesive commit.
Batch A (7533264 — fix compound subtopic, title casing, closer variation):
- Compound subtopic detection: parking + chargers + accessibility no longer collapses into a single FAQ-only answer.
- Doctor title normalization: Dr./Prof./Mevr./Dhr. now have consistent capitalization in voice output.
- Closer-question variation directive: Heeft u nog een vraag? ("Do you have another question?") no longer repeats every turn.
Batch B (1cf7643 — STT phonetic fallback + warmer out-of-hours greeting):
- STT normalization dictionary for high-frequency phonetic mishears (decoren → doctoren, pseoriasis → psoriasis, parkeertharieven → parkeertarieven).
- Out-of-hours greeting reworded: it now emphasizes that the AI agent can help right away, instead of leading with "the helpdesk is closed."
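The sketch below shows one way to apply such a dictionary. It assumes whole-word, case-insensitive replacement. The names STT_FIXES, compile_fixes and normalize_transcript are hypothetical and are not taken from the codebase.

```python
import re

# Illustrative shared dictionary built from the mishears listed above.
STT_FIXES: dict[str, str] = {
    "decoren": "doctoren",
    "pseoriasis": "psoriasis",
    "parkeertharieven": "parkeertarieven",
}


def compile_fixes(fixes: dict[str, str]) -> list[tuple[re.Pattern[str], str]]:
    # \b on both sides: a mishear is only replaced as a whole word, never inside a longer word.
    return [
        (re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE), right)
        for wrong, right in fixes.items()
    ]


def normalize_transcript(text: str, compiled: list[tuple[re.Pattern[str], str]]) -> str:
    # Replacements are emitted in lower case; original capitalization is not preserved.
    for pattern, replacement in compiled:
        text = pattern.sub(replacement, text)
    return text


COMPILED = compile_fixes(STT_FIXES)
print(normalize_transcript("Wat zijn de parkeertharieven?", COMPILED))
# Wat zijn de parkeertarieven?
```

The patterns are compiled once, up front. Normalizing a transcript then costs only a handful of regex substitutions per turn.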
Decompose-skip optimization (9c45737):
- For vector_only single-topic queries, the ~2s decompose LLM call and the ~3s multi-hop rerank are skipped, saving ~5s on simple navigation queries. Compound queries (T9 protection: parkeertarieven en laadpalen, "parking rates and charging points") still decompose via the conjunction-regex gate.
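A minimal sketch of the gate as described. It assumes that a conjunction word (nl/en/fr/it) marks a compound query and that the retrieval route is a plain string. CONJUNCTION_RE and should_decompose are hypothetical names, and the real conjunction regex may well be stricter.

```python
import re

# Illustrative gate: a standalone conjunction marks a compound query.
CONJUNCTION_RE = re.compile(r"\b(?:en|and|et|e)\b", re.IGNORECASE)


def should_decompose(query: str, route: str) -> bool:
    """Decide whether to pay for the ~2s decompose call and ~3s multi-hop rerank."""
    if CONJUNCTION_RE.search(query):
        return True  # compound, e.g. "parkeertarieven en laadpalen"
    # Single-topic vector_only queries go straight to retrieval.
    return route != "vector_only"


print(should_decompose("parkeertarieven en laadpalen", "vector_only"))  # True
print(should_decompose("Waar is de spoed?", "vector_only"))             # False
```

Batch C (below) extends the same early return to single-topic hybrid queries.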
Cardiology hotfix (ce0647b):
A live regression: voice agent fired the handoff confirmation prompt for ordinary cardiology doctor lookups. Root cause: _qs_filter_by_conversation_department rebuilt citations using dict-style .get() on Pydantic Citation objects → AttributeError → backend sent generic error event → voice_agent's rag_bridge synthesized conversational_intent="escalate" → handoff confirm fired. Fix: match on (document_id, chunk_index) (both sides have these fields) using a set-based lookup.
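A minimal sketch of the fix pattern, with plain dataclasses standing in for the Pydantic Citation model and the retrieved chunks. Like a Pydantic model, a dataclass has no dict-style .get(). The function keep_citations_for_chunks is hypothetical. It approximates what the fixed _qs_filter_by_conversation_department does with citations, not its actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    # Stand-in for the Pydantic Citation model: attribute access only, no dict-style .get().
    document_id: str
    chunk_index: int


@dataclass(frozen=True)
class Chunk:
    # Stand-in for a retrieved chunk that survived the department filter.
    document_id: str
    chunk_index: int
    department: str


def keep_citations_for_chunks(citations: list[Citation], kept: list[Chunk]) -> list[Citation]:
    # Build the key set once; each membership check is then O(1).
    kept_keys = {(chunk.document_id, chunk.chunk_index) for chunk in kept}
    # Calling citation.get("document_id") here would raise AttributeError on a model object.
    return [c for c in citations if (c.document_id, c.chunk_index) in kept_keys]


citations = [Citation("doc-1", 0), Citation("doc-1", 3), Citation("doc-2", 1)]
kept = [Chunk("doc-1", 3, "cardiologie"), Chunk("doc-2", 1, "cardiologie")]
print(keep_citations_for_chunks(citations, kept))
# [Citation(document_id='doc-1', chunk_index=3), Citation(document_id='doc-2', chunk_index=1)]
```

Matching on the shared (document_id, chunk_index) key works whatever the concrete type on each side, because both types expose those two attributes.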
Batch C (e3e6d05 — affirmation loop, dialect farewell, decompose skip extension):
- Affirmation loop fix: Ja zeker ("Yes, sure") after a yes/no offer no longer re-runs the previous query. New classifier rule: a bare affirmation after an agent yes/no question must commit to the offer's content, not echo the previous turn.
- Dialect farewell regex: haak op / trap het af / hang op (Flemish/Limburgish) are now classified as FAREWELL.
- Decompose-skip extended to single-topic hybrid queries: the Wat zijn de bezoekuren in cardiologie? ("What are the visiting hours in cardiology?") lookup, which previously fired three filler phrases, now answers in one. Single-department queries skip decompose too.
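A sketch of the kind of regex a dialect farewell rule could use. These patterns are illustrative assumptions, because these notes do not show the production pre-filter rules. FAREWELL_RE and classify_farewell are hypothetical names.

```python
import re
from typing import Optional

# Illustrative patterns only; the production pre-filter's rules are not shown in these notes.
FAREWELL_RE = re.compile(
    r"\b(?:haak\s+op|trap\s+het\s+af|hang\s+op|tot\s+ziens)\b",
    re.IGNORECASE,
)


def classify_farewell(utterance: str) -> Optional[str]:
    # Search anywhere in the utterance: "Ok, ik haak op" should still count.
    return "FAREWELL" if FAREWELL_RE.search(utterance) else None


for text in ["Ok, ik haak op", "Ik trap het af", "Wat zijn de bezoekuren?"]:
    print(text, "->", classify_farewell(text))
```

Using \s+ between the words tolerates the irregular spacing that STT output sometimes produces.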
Batch D (ee46a1f — generic-only fixes from 2026-05-04 transcript):
- dienst → afdeling deterministic rewrite in voice_answer_shaper. Catches the LLM mirroring corpus tokens ("dienst" for "afdeling", both meaning department); compound nouns (Ombudsdienst, Spoeddienst) are word-boundary-protected.
- Topic-elliptic carry rule in the classifier prompt: En de rest? / Nog meer? / Die andere? ("And the rest?" / "Anything else?" / "The other one?") now infer the topic from the agent's previous answer instead of asking for clarification.
- House numbers and postal codes stay as digits across all four voice languages (nl/en/fr/it): Bessemerstraat 478 reads correctly; Bessemerstraat vierhonderdachtenzeventig (478 spelled out) is forbidden.
- Closer-variation HARD RULE added to en/fr/it (it was nl-only after Batch C). Voice now varies closing questions across all four languages.
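The word-boundary protection for the dienst → afdeling rewrite can be sketched as follows. The regex and the name rewrite_dienst are assumptions, not the actual voice_answer_shaper code.

```python
import re

# Both \b anchors matter: in "Spoeddienst" or "Ombudsdienst" the "dienst" part is preceded
# by a letter, so there is no word boundary and the compound is left alone.
DIENST_RE = re.compile(r"\b([Dd])ienst\b")


def rewrite_dienst(text: str) -> str:
    # Keep the case of the first letter. Plurals ("diensten") fail the trailing \b
    # and are left untouched in this sketch.
    return DIENST_RE.sub(lambda m: "Afdeling" if m.group(1) == "D" else "afdeling", text)


print(rewrite_dienst("De dienst cardiologie is naast de Spoeddienst."))
# De afdeling cardiologie is naast de Spoeddienst.
```

A deterministic post-processing step like this catches the mirrored token regardless of how the LLM phrased the rest of the sentence.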
Batch E: Tenant Overlay System (20ec058)
Batch D's first draft introduced tenant-specific data (campus addresses, doctor names) into shared YAML and STT dicts. The user pushed back: the project is multi-tenant SaaS, and that data already lives in the database. Tenant data must NEVER be duplicated outside its canonical home.
The fix is an architectural cut:
shared (generic, language-level) ← stays in source code
+ tenant overlay (STT mishears) ← tenant_overlays/_yaml/<slug>.yaml
+ DB-driven renderers (campus listing) ← reads via get_taxonomy(slug)
─────────────────────────────────────
effective registry at request time
New package app/services/voice/tenant_overlays/:
- schema.py: Pydantic models for tenant YAML.
- loader.py: YAML parse → schema validate → regex compile-on-load. Failures crash at boot, not at request time.
- registry.py: module-level eager-loaded registry + reload_overlays() for dev hot-reload.
- _yaml/zol.yaml: only tenant-specific STT mishears (zon → zol, geerte → geert, jurissen → jeurissen). NO campus addresses, NO doctor names.
New module app/services/voice/voice_faq_renderers.py:
- RENDERERS registry + @register_renderer("name") decorator.
- render_campus_listing(language, tenant_slug) reads get_taxonomy(slug).hospital_campuses and formats it per language. House numbers and postal codes stay as digits.
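A minimal sketch of a decorator-based renderer registry. Several parts are assumptions: the Campus fields, the per-language intro strings, the fictional campus data, and the stub get_campuses_stub, which replaces get_taxonomy(slug).hospital_campuses. The registration name "list_campuses" comes from the FAQ entry described below.

```python
from dataclasses import dataclass
from typing import Callable

Renderer = Callable[[str, str], str]  # (language, tenant_slug) -> spoken answer
RENDERERS: dict[str, Renderer] = {}


def register_renderer(name: str) -> Callable[[Renderer], Renderer]:
    """Record the decorated function under `name` and return it unchanged."""

    def decorator(fn: Renderer) -> Renderer:
        if name in RENDERERS:
            raise ValueError(f"renderer {name!r} registered twice")
        RENDERERS[name] = fn
        return fn

    return decorator


@dataclass(frozen=True)
class Campus:
    name: str
    street: str
    house_number: str
    postal_code: str
    city: str


def get_campuses_stub(tenant_slug: str) -> list[Campus]:
    # Hypothetical stand-in for get_taxonomy(slug).hospital_campuses; fictional data.
    return [
        Campus("Campus Noord", "Voorbeeldstraat", "12", "1234", "Voorbeeldstad"),
        Campus("Campus Zuid", "Teststraat", "478", "1234", "Voorbeeldstad"),
    ]


INTROS = {
    "nl": "We hebben {n} campussen:",
    "en": "We have {n} campuses:",
    "fr": "Nous avons {n} campus :",
    "it": "Abbiamo {n} campus:",
}


@register_renderer("list_campuses")
def render_campus_listing(language: str, tenant_slug: str) -> str:
    campuses = get_campuses_stub(tenant_slug)
    intro = INTROS.get(language, INTROS["en"]).format(n=len(campuses))
    # House numbers and postal codes are emitted as digits, never spelled out.
    entries = [f"{c.name}, {c.street} {c.house_number}, {c.postal_code} {c.city}" for c in campuses]
    return f"{intro} " + "; ".join(entries) + "."


print(RENDERERS["list_campuses"]("nl", "zol"))
# We hebben 2 campussen: Campus Noord, Voorbeeldstraat 12, 1234 Voorbeeldstad; Campus Zuid, Teststraat 478, 1234 Voorbeeldstad.
```

The decorator returns the function unchanged, so render_campus_listing can still be called and tested directly. The duplicate-name check makes a copy-paste registration mistake fail at import time.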
FAQ entry signature extended:
import re
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FAQEntry:
    key: str
    patterns: dict[str, list[re.Pattern[str]]]
    answers: dict[str, str] = field(default_factory=dict)
    answer_renderer: str | None = None  # NEW: dispatch to RENDERERS
The address_all_campuses entry uses answer_renderer="list_campuses". Patterns are generic (alle\s+campussen, all\s+addresses, etc.); the answer composition reads from the tenant's DB-cached hospital_campuses. Add a campus to the DB → next request reflects it. Zero sync gap, zero hardcoded ZOL data.
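The sketch below shows how an entry could choose between a static answer and a renderer. match_faq_sketch is hypothetical, since the real voice_faq_tool.match_faq signature is not shown here. The sketch also assumes that patterns and answers are keyed by language code and that a missing answer falls through to RAG.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class FAQEntry:
    key: str
    patterns: dict[str, list[re.Pattern[str]]]
    answers: dict[str, str] = field(default_factory=dict)
    answer_renderer: Optional[str] = None


def match_faq_sketch(
    utterance: str,
    language: str,
    tenant_slug: str,
    entries: list[FAQEntry],
    renderers: dict[str, Callable[[str, str], str]],
) -> Optional[str]:
    for entry in entries:
        if not any(p.search(utterance) for p in entry.patterns.get(language, [])):
            continue
        if entry.answer_renderer is not None:
            # Composed from tenant data at request time, so a new campus shows up on the next call.
            return renderers[entry.answer_renderer](language, tenant_slug)
        # Static curated answer; None (no text for this language) falls through to RAG.
        return entry.answers.get(language)
    return None


ENTRIES = [
    FAQEntry(
        key="address_all_campuses",
        patterns={
            "nl": [re.compile(r"alle\s+campussen", re.IGNORECASE)],
            "en": [re.compile(r"all\s+addresses", re.IGNORECASE)],
        },
        answer_renderer="list_campuses",
    )
]
RENDERERS = {"list_campuses": lambda lang, slug: f"<campus list for {slug} in {lang}>"}

print(match_faq_sketch("Wat zijn de adressen van alle campussen?", "nl", "zol", ENTRIES, RENDERERS))
# <campus list for zol in nl>
```

The entry itself carries no tenant data. Only the renderer, given a tenant slug, reaches into the DB-backed taxonomy.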
List-all top-K bump (tenant-agnostic):
- _is_list_all_query detects alle X / all X / tous les X / tutti X patterns.
- When it matches a request that uses the default top-K, max_results is bumped from 5 → 15 so the retrieval long tail (a 4th campus, a less-frequented department) survives.
- Operators who explicitly set a higher max_results are honored.
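A minimal sketch of the bump. It assumes that None stands for "caller used the default top-K", which is one way to tell a default apart from an explicit 5. In this sketch every explicit value is honored, including lower ones. The helper names are hypothetical apart from the behavior described above.

```python
import re
from typing import Optional

DEFAULT_MAX_RESULTS = 5
LIST_ALL_MAX_RESULTS = 15

# "alle X" / "all X" / "tous les X" / "tutti X": a quantifier followed by a word.
_LIST_ALL_RE = re.compile(r"\b(?:alle|all|tous\s+les|tutti)\s+\w+", re.IGNORECASE)


def is_list_all_query(query: str) -> bool:
    return _LIST_ALL_RE.search(query) is not None


def effective_max_results(query: str, requested: Optional[int] = None) -> int:
    # None means "caller used the default"; any explicit value is honored as-is.
    if requested is not None:
        return requested
    return LIST_ALL_MAX_RESULTS if is_list_all_query(query) else DEFAULT_MAX_RESULTS


print(effective_max_results("Geef de adressen van alle campussen"))  # 15
print(effective_max_results("Waar is de spoed?"))                    # 5
print(effective_max_results("Geef alle campussen", requested=30))    # 30
```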
Twilio Phase A: Self-Hosted LiveKit SIP (docs/ADR/0050)
The voice channel previously required the LiveKit playground UI for testing. Phase A makes it callable over standard SIP — a softphone like Linphone can register against localhost:5060 and reach voice_agent end-to-end.
Decision: self-host the entire telephony stack. Twilio's role narrows to:
- Owning the PSTN number (+32460256021)
- Forwarding inbound calls via Elastic SIP Trunk to our public SIP URI
- Handling 112 emergency-call routing as required by the Belgian regulator
Rate limit: 10 calls/hour per caller-ID, enforced via Redis counter (same pattern as the public WS rate limit).
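The pattern of the public WS rate limit is not spelled out here. The sketch below assumes a fixed one-hour window keyed by caller-ID and uses the standard Redis INCR + EXPIRE idiom. The key format and allow_call are hypothetical. InMemoryStore is a demo stand-in for a redis-py client, which provides incr() and expire() methods of this shape.

```python
import time
from typing import Optional, Protocol

CALLS_PER_HOUR = 10
WINDOW_SECONDS = 3600


class CounterStore(Protocol):
    # redis-py's Redis client exposes incr(name, amount) and expire(name, time).
    def incr(self, name: str, amount: int = 1) -> int: ...

    def expire(self, name: str, time: int) -> bool: ...


def allow_call(store: CounterStore, caller_id: str, now: Optional[float] = None) -> bool:
    """Fixed-window limit: at most CALLS_PER_HOUR calls per caller-ID per clock hour."""
    now = time.time() if now is None else now
    key = f"sip-rate:{caller_id}:{int(now // WINDOW_SECONDS)}"  # hypothetical key format
    count = store.incr(key)
    if count == 1:
        # First call in this window: let Redis drop the key once the window is over.
        store.expire(key, WINDOW_SECONDS)
    return count <= CALLS_PER_HOUR


class InMemoryStore:
    """Demo stand-in for Redis (no real TTL eviction)."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def incr(self, name: str, amount: int = 1) -> int:
        self.counts[name] = self.counts.get(name, 0) + amount
        return self.counts[name]

    def expire(self, name: str, time: int) -> bool:
        return name in self.counts


store = InMemoryStore()
decisions = [allow_call(store, "+32400000000", now=1_000.0) for _ in range(11)]
print(decisions.count(True), decisions[-1])  # 10 False
```

INCR followed by EXPIRE is two round trips and is not atomic. If that gap matters, a MULTI/EXEC pipeline or a small Lua script closes it.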
What's now operational locally:
- livekit-sip container on UDP 5060 / TCP 5061 / RTP 10000-10100 (docker/livekit-sip.yaml)
- LiveKit server bumped v1.8 → v1.9 (CLI compatibility: lk v2.16.2 couldn't write dispatch rules to a v1.8 server)
- Redis section added to docker/livekit.yaml so the SIP service can announce itself to the LiveKit server
- Inbound SIP trunk registered: phase-a-dev-trunk with auth user/pass
- Dispatch rule registered: each call routes to a fresh sip-call_<caller>_<random> room; voice_agent (JT_ROOM worker) auto-joins
Phase B (pilot deployment with public DNS + Let's Encrypt SIP TLS + firewall rules) and Phase C (Twilio trunk → pilot SIP URI) are next.
Test Debt Sprint Close-Out (April 22)
Sprint outcome from PRs #41 – #66:
- ruff extend-exclude reduced from 23 → 0 entries.
- Coverage floor bumped 40% → 55% (measured 57.73%).
- ~600 previously-skipped tests restored.
- Four production bugs documented with file:line refs (evaluation_service:474, alembic/env.py:24-28, AllowlistFilter fallback, background-task event-loop race).
Nightly Auto-Ingest on Pilot
Pilot crawl + ingest now runs every night at 03:00 UTC (image 7c7b50c → 2800899). The first run surfaced two bugs:
- Empty-content classification gap: pages with zero text were stuck in transient retry. Added a DEAD_EMPTY_CONTENT failure class.
- MissingGreenlet clobbering: accessing record.id and record.url inside an except block triggered SQLAlchemy errors that hid the real exception. Fixed by capturing both fields BEFORE the try block.
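Both fixes can be sketched with plain Python stand-ins. In SQLAlchemy's asyncio mode, a failed operation followed by a rollback typically leaves loaded attributes expired. Reading them again from synchronous code then needs implicit IO, which raises MissingGreenlet. In the sketch, FakeRecord simulates that failure with a RuntimeError. FailureClass, ingest and failing_process are hypothetical names, and only DEAD_EMPTY_CONTENT comes from the notes.

```python
from enum import Enum
from typing import Callable


class FailureClass(Enum):
    TRANSIENT = "transient"                    # retry later
    DEAD_EMPTY_CONTENT = "dead_empty_content"  # no text on the page: retrying cannot help


class FakeRecord:
    """Stand-in for an ORM row whose attributes become unreadable after a failed DB op."""

    def __init__(self, record_id: int, url: str) -> None:
        self._id, self._url, self.expired = record_id, url, False

    @property
    def id(self) -> int:
        if self.expired:
            raise RuntimeError("MissingGreenlet (simulated)")
        return self._id

    @property
    def url(self) -> str:
        if self.expired:
            raise RuntimeError("MissingGreenlet (simulated)")
        return self._url


def ingest(record: FakeRecord, text: str, process: Callable[[FakeRecord, str], None]) -> str:
    # Capture the identifiers BEFORE the try block, while they are still loaded.
    record_id, record_url = record.id, record.url
    if not text.strip():
        # Zero-text pages get a terminal class instead of cycling through transient retry.
        return f"{record_id} {record_url}: {FailureClass.DEAD_EMPTY_CONTENT.value}"
    try:
        process(record, text)
    except Exception as exc:
        # Reading record.id or record.url here would raise again and hide `exc`.
        return f"{record_id} {record_url}: {FailureClass.TRANSIENT.value} ({exc!r})"
    return f"{record_id} {record_url}: ok"


def failing_process(record: FakeRecord, text: str) -> None:
    record.expired = True  # the failed write expires the row...
    raise ValueError("chunk insert failed")  # ...and this is the error worth logging


print(ingest(FakeRecord(7, "https://example.org/a"), "   ", failing_process))
print(ingest(FakeRecord(8, "https://example.org/b"), "Bezoekuren", failing_process))
```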
Corpus grew from 5,815 → 5,841 documents on the first nightly run. INGEST_MODE=auto remains live.
Quality Metrics
| Check | Status |
|---|---|
| ruff check | 0 errors on changed files |
| pyright | 0 errors / 0 warnings on changed files |
| Unit tests (voice + overlay + renderers) | 258 passed |
| Voice integration tests | All passing |
| Safety incidents | 0 |
| LOC removed (legacy voice pipeline) | ~7,000 |
| LOC added (thin pipeline + overlay system) | ~3,500 |
| Net LOC delta | -3,500 with more functionality |
Documentation
- ADR-0049 — Thin voice pipeline (locked, implemented, in production)
- ADR-0050 — Twilio + self-hosted LiveKit SIP integration (Accepted)
- New doc page: docs/voice/tenant-overlay-system.md (this release)
- Updated: docs/voice/dialogue-manager.md marked as historical
- Plans archive: voice-dialogue-manager-design (locked), voice-dialogue-manager-implementation (built then removed), thin-voice-orchestrator (canonical)
Architecture Notes
Why the legacy pipeline failed:
The 8-stage design front-loaded too much heuristic. Each stage added a confidence threshold, a fallback path, and an integration point — and each was a place where the pipeline could lose context. Live smoke tests showed the dialogue manager spec'd 6 tools the LLM almost never picked correctly; the safety gate was redundant with the regex pre-filter; the speculative-STT cache hit rate was below 5%. Throwing it out and going back to "regex first, FAQ second, RAG third" turned out to be a UX win.
Why the tenant overlay was a cleanup:
Batch D's first draft duplicated DB-truth data (campus addresses, doctor surnames) into shared YAML. The user objected: this creates a sync gap that goes silently stale forever. The fix established three categories with clear homes:
| Type | Where | Why |
|---|---|---|
| Generic patterns + language rules | Source code | Hospital-agnostic; benefits any tenant |
| Tenant-specific phonetic-recovery (STT mishears) | YAML overlay | Phonetic data that doesn't exist anywhere else |
| Tenant data (addresses, names, hours) | DB + renderer | Single source of truth; no duplication |
Adding a second hospital now means: a new acme.yaml with its STT mishears, PUBLIC_TENANT_SLUG=acme, and a populated campuses table. Zero shared-code changes.
~280 commits | 8 sessions | Author: SOFT4U BV + Claude Opus 4.7