
Release Notes: April 11 – May 4, 2026

Voice Pipeline Simplification, Tenant Overlay System & Twilio Phase A

~280 commits | 8 sessions | Voice path reduced from 8 stages to 3 | New multi-tenant FAQ overlay architecture | Belgian PSTN telephony infrastructure live

This release covers nearly four weeks of voice-channel work. The headline themes:

  1. The legacy 8-stage voice pipeline was deleted entirely (~7,000 LOC) and replaced with a thin three-stage orchestrator (regex pre-filter → FAQ → RAG). The complex pipeline turned out to be worthless under live testing; a much smaller orchestrator wins on every measurable axis.
  2. Six iterative transcript-driven fix batches (A through E + cardiology hotfix) addressed real issues found in voice smoke tests: STT phonetic mishears, repeat-offer loops, dialect farewells, decompose-induced latency, and a "list all campuses" UX gap.
  3. A new multi-tenant overlay system for tenant-scoped FAQ + STT, with DB-driven answer renderers so tenant data (campus addresses, doctor names) is never hardcoded in YAML or source.
  4. Twilio Phase A — self-hosted LiveKit SIP gateway and inbound trunk now operational on the local stack, ready for pilot DNS + TLS.

ADR-0049: Thin Voice Pipeline (Architectural Cut)

The legacy VoiceOrchestrator ran 8 stages per turn (preprocessor LLM → safety gate → speculative-STT cache → dialogue manager → tool dispatcher → state ladder → orchestrator integration → answer shaper). Live smoke tests revealed it was both slow (~6-12s per turn) and brittle — every layer added a new failure mode without proportional benefit.

The new ThinVoiceOrchestrator runs three stages:

  1. Regex pre-filter (voice_thin_pre_filter.py) — classifies greetings, farewells, explicit handoff requests, and medical-advice safety refusals. ~1ms.
  2. FAQ regex short-circuit (voice_faq_tool.match_faq) — matches high-traffic generic queries and returns a curated answer. ~1ms.
  3. RAG fallthrough — single LLM call for everything else. The full retrieval + answer pipeline.
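
The three stages above can be sketched as a short fall-through chain. This is a minimal illustration, not the shipped ThinVoiceOrchestrator: the patterns, the FAQ table, the handle_turn name, and the (stage, payload) return shape are all assumptions made for the example.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical patterns; the real rules live in voice_thin_pre_filter.py.
_GREETING = re.compile(r"^\s*(hallo|goedemorgen|hello|bonjour)\b", re.IGNORECASE)
_FAREWELL = re.compile(r"\b(tot ziens|bye|au revoir)\b", re.IGNORECASE)

# Hypothetical FAQ table: compiled pattern -> curated answer.
_FAQ = [
    (re.compile(r"\bbezoekuren\b", re.IGNORECASE), "Curated visiting-hours answer."),
]


def pre_filter(text: str) -> Optional[str]:
    """Stage 1: classify turns that need no retrieval at all."""
    if _GREETING.search(text):
        return "greeting"
    if _FAREWELL.search(text):
        return "farewell"
    return None


def match_faq(text: str) -> Optional[str]:
    """Stage 2: return a curated answer for a high-traffic generic query."""
    for pattern, answer in _FAQ:
        if pattern.search(text):
            return answer
    return None


def handle_turn(text: str, rag: Callable[[str], str]) -> Tuple[str, str]:
    """Run the stages in order; only the last one costs an LLM call."""
    label = pre_filter(text)
    if label is not None:
        return ("pre_filter", label)
    answer = match_faq(text)
    if answer is not None:
        return ("faq", answer)
    # Stage 3: everything else falls through to the RAG callable.
    return ("rag", rag(text))
```

The ordering is the point of the design: the two regex stages cost about a millisecond each, so the expensive RAG call only runs when neither of them can answer.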

What was deleted (commit 158d793):

  • voice_orchestrator.py (1,399 LOC)
  • voice_dialogue_manager.py (built and unused)
  • voice_dialogue_dispatcher.py
  • voice_dialogue_schemas.py
  • voice_dialogue_state.py
  • voice_preprocessor.py
  • voice_safety_gate.py
  • voice_speculation_cache.py
  • conversational_intent_resolver.py
  • conversational_intent_rules.py
  • stt_ambiguity_guardrail.py
  • 17 legacy test files

Total: ~7,000 lines removed in one commit.

The thin pipeline is now the production behavior on every channel. The feature flag (VOICE_THIN_PIPELINE_ENABLED) and legacy selector are gone.

Voice Transcript-Driven Fix Batches A → E

The user did six rounds of live voice smoke tests, each producing a transcript that surfaced specific bugs. Each batch landed as one cohesive commit.

Batch A (7533264 — fix compound subtopic, title casing, closer variation):

  • Compound subtopic detection — parking + chargers + accessibility no longer collapses into a single FAQ-only answer.
  • Doctor title normalization — Dr. / Prof. / Mevr. / Dhr. consistent capitalization in voice output.
  • Closer-question variation directive — Heeft u nog een vraag? no longer repeats every turn.

Batch B (1cf7643 — STT phonetic fallback + warmer out-of-hours greeting):

  • STT normalization dictionary for high-frequency phonetic mishears (decoren → doctoren, pseoriasis → psoriasis, parkeertharieven → parkeertarieven).
  • Out-of-hours greeting reworded: emphasizes the AI agent helps now, not "the helpdesk is closed."
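
A normalization dictionary of this kind might look like the sketch below. The three entries come from the batch notes; the function name, the whole-word matching, and the case handling are assumptions, not the shipped implementation.

```python
import re

# Entries taken from the batch B notes; the real dictionary is larger.
STT_FIXES = {
    "decoren": "doctoren",
    "pseoriasis": "psoriasis",
    "parkeertharieven": "parkeertarieven",
}

# One alternation over all keys, anchored on word boundaries so that
# longer words containing a key are left untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, STT_FIXES)) + r")\b",
    re.IGNORECASE,
)


def normalize_transcript(text: str) -> str:
    """Replace known phonetic mishears with the intended word."""
    return _PATTERN.sub(lambda m: STT_FIXES[m.group(0).lower()], text)
```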

Decompose-skip optimization (9c45737):

  • For vector_only single-topic queries, the ~2s decompose LLM call and the ~3s multi-hop rerank are both skipped, saving ~5s on simple navigation queries. Compound queries (T9 protection: parkeertarieven en laadpalen) still decompose via the conjunction-regex gate.
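
The gate described above could be sketched as follows. The should_decompose name and the conjunction list are hypothetical; only the vector_only mode and the conjunction-regex idea come from the notes.

```python
import re

# Hypothetical conjunction gate covering the four voice languages.
_CONJUNCTION = re.compile(r"\b(en|and|et|e)\b", re.IGNORECASE)


def should_decompose(query: str, retrieval_mode: str) -> bool:
    """Decide whether the decompose LLM call is worth its latency."""
    # Compound queries always decompose, whatever the retrieval mode.
    if _CONJUNCTION.search(query):
        return True
    # Single-topic vector_only queries skip decompose and multi-hop rerank.
    return retrieval_mode != "vector_only"
```

The word boundaries matter here: parkeertarieven ends in "en" but does not trigger the gate, while parkeertarieven en laadpalen does.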

Cardiology hotfix (ce0647b):

A live regression: the voice agent fired the handoff confirmation prompt for ordinary cardiology doctor lookups. Root cause: _qs_filter_by_conversation_department rebuilt citations using dict-style .get() on Pydantic Citation objects → AttributeError → the backend sent a generic error event → voice_agent's rag_bridge synthesized conversational_intent="escalate" → the handoff confirmation fired. Fix: match on (document_id, chunk_index), which both sides carry, using a set-based lookup.
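
A minimal sketch of the set-based match follows. It uses a dataclass as a stand-in for the Pydantic Citation model, and the filter_citations name and the dict shape of the kept chunks are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Citation:
    """Stand-in for the Pydantic Citation model: attributes only, no .get()."""
    document_id: str
    chunk_index: int
    title: str


def filter_citations(citations: List[Citation], kept_chunks: List[Dict]) -> List[Citation]:
    """Keep the citations whose (document_id, chunk_index) survived filtering."""
    # Build the lookup once; membership checks are then O(1) per citation.
    keep = {(c["document_id"], c["chunk_index"]) for c in kept_chunks}
    # Attribute access on the model side, key access on the dict side.
    return [c for c in citations if (c.document_id, c.chunk_index) in keep]
```

Because the original Citation objects are returned rather than rebuilt, nothing needs to call .get() on a model instance.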

Batch C (e3e6d05 — affirmation loop, dialect farewell, decompose skip extension):

  • Affirmation loop fix: Ja zeker after a yes/no offer no longer re-runs the previous query. New classifier rule: bare affirmations after an agent yes/no question must commit to the offer's content, not echo the previous turn.
  • Dialect farewell regex: haak op / trap het af / hang op (Flemish/Limburgs) now classified as FAREWELL.
  • Decompose-skip extended to single-topic hybrid: the Wat zijn de bezoekuren in cardiologie? lookup that fired three filler phrases now answers in one — single-dept queries skip decompose too.

Batch D (ee46a1f — generic-only fixes from 2026-05-04 transcript):

  • dienst → afdeling deterministic rewrite in voice_answer_shaper. Catches LLM mirroring of corpus tokens; compound nouns (Ombudsdienst, Spoeddienst) word-boundary-protected.
  • Topic-elliptic carry rule in classifier prompt. En de rest? / Nog meer? / Die andere? now infer the topic from the agent's previous answer instead of asking for clarification.
  • House numbers + postal codes stay as digits across all four voice languages (nl/en/fr/it). Bessemerstraat 478 reads correctly; Bessemerstraat vierhonderdachtenzeventig is forbidden.
  • Closer-variation HARD RULE added to en/fr/it (was nl-only after batch C). Voice now varies closing questions across all four languages.
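
The dienst → afdeling rewrite with compound-noun protection could be sketched like this. The function name, the plural handling, and the capitalization rule are assumptions; the notes only specify the word-boundary protection.

```python
import re

# \b on both sides means compounds such as Ombudsdienst or Spoeddienst
# never match: the character before "dienst" is a word character there.
_DIENST = re.compile(r"\bdienst(en)?\b", re.IGNORECASE)


def rewrite_dienst(text: str) -> str:
    """Rewrite standalone 'dienst'/'diensten' to 'afdeling'/'afdelingen'."""

    def _replace(match: re.Match) -> str:
        word = "afdelingen" if match.group(1) else "afdeling"
        # Preserve sentence-initial capitalization (assumed behavior).
        return word.capitalize() if match.group(0)[0].isupper() else word

    return _DIENST.sub(_replace, text)
```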

Batch E: Tenant Overlay System (20ec058)

Batch D's first draft introduced tenant-specific data (campus addresses, doctor names) into shared YAML and STT dicts. The user pushed back: the project is multi-tenant SaaS, and that data already lives in the database. Tenant data must NEVER be duplicated outside its canonical home.

The fix is an architectural cut:

shared (generic, language-level) ← stays in source code
+ tenant overlay (STT mishears) ← tenant_overlays/_yaml/<slug>.yaml
+ DB-driven renderers (campus listing) ← reads via get_taxonomy(slug)
─────────────────────────────────────
effective registry at request time

New package app/services/voice/tenant_overlays/:

  • schema.py — Pydantic models for tenant YAML.
  • loader.py — YAML parse → schema validate → regex compile-on-load. Failures crash at boot, not at request time.
  • registry.py — module-level eager-loaded registry + reload_overlays() for dev hot-reload.
  • _yaml/zol.yaml — only tenant-specific STT mishears (zon → zol, geerte → geert, jurissen → jeurissen). NO campus addresses, NO doctor names.
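
The validate-then-compile-on-load step can be illustrated with the standard library alone. This sketch takes an already-parsed mapping instead of reading YAML and uses a dataclass in place of the Pydantic schema; the stt_mishears key, OverlayError, load_overlay, and apply_overlay are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


class OverlayError(ValueError):
    """Raised at load time so a bad overlay fails at boot, not per request."""


@dataclass(frozen=True)
class SttRule:
    pattern: re.Pattern
    replacement: str


def load_overlay(raw: dict) -> List[SttRule]:
    """Validate a parsed overlay mapping and compile its patterns eagerly."""
    mishears = raw.get("stt_mishears")
    if not isinstance(mishears, dict) or not mishears:
        raise OverlayError("stt_mishears must be a non-empty mapping")
    rules = []
    for heard, intended in mishears.items():
        if not isinstance(heard, str) or not isinstance(intended, str):
            raise OverlayError(f"non-string entry: {heard!r} -> {intended!r}")
        try:
            pattern = re.compile(rf"\b{heard}\b", re.IGNORECASE)
        except re.error as exc:
            raise OverlayError(f"bad pattern {heard!r}: {exc}") from exc
        rules.append(SttRule(pattern, intended))
    return rules


def apply_overlay(text: str, rules: List[SttRule]) -> str:
    """Apply every tenant rule in order to a transcript."""
    for rule in rules:
        text = rule.pattern.sub(rule.replacement, text)
    return text
```

Compiling at load time moves a malformed pattern from a caller-visible failure during a phone call to a startup failure that is caught before deployment.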

New module app/services/voice/voice_faq_renderers.py:

  • RENDERERS registry + @register_renderer("name") decorator.
  • render_campus_listing(language, tenant_slug) reads get_taxonomy(slug).hospital_campuses and formats per language. House numbers + postal codes stay as digits.
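
A decorator-based registry of this shape might look like the sketch below. The in-memory campus data replaces the real get_taxonomy(slug).hospital_campuses call, and the acme addresses, the intro strings, and the output format are invented for the example.

```python
from typing import Callable, Dict, List

RENDERERS: Dict[str, Callable[[str, str], str]] = {}


def register_renderer(name: str):
    """Register a renderer under a name that FAQ entries can refer to."""
    def decorator(func: Callable[[str, str], str]):
        if name in RENDERERS:
            raise ValueError(f"renderer {name!r} already registered")
        RENDERERS[name] = func
        return func
    return decorator


# Hypothetical stand-in for the tenant's DB-cached campus records.
_FAKE_CAMPUSES: Dict[str, List[dict]] = {
    "acme": [
        {"name": "Campus Noord", "street": "Voorbeeldstraat 12",
         "postal_code": "1000", "city": "Brussel"},
        {"name": "Campus Zuid", "street": "Teststraat 7",
         "postal_code": "2000", "city": "Antwerpen"},
    ],
}

_INTRO = {"nl": "Onze campussen zijn:", "en": "Our campuses are:"}


@register_renderer("list_campuses")
def render_campus_listing(language: str, tenant_slug: str) -> str:
    """Format the tenant's campuses; numbers are emitted as digits."""
    campuses = _FAKE_CAMPUSES.get(tenant_slug, [])
    intro = _INTRO.get(language, _INTRO["en"])
    parts = [
        f"{c['name']}, {c['street']}, {c['postal_code']} {c['city']}"
        for c in campuses
    ]
    return intro + " " + "; ".join(parts) + "."
```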

FAQ entry signature extended:

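import re
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FAQEntry:
    key: str
    patterns: dict[str, list[re.Pattern[str]]]  # language code -> compiled patterns
    answers: dict[str, str] = field(default_factory=dict)  # language code -> static answer
    answer_renderer: str | None = None  # NEW: dispatch to RENDERERS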

The address_all_campuses entry uses answer_renderer="list_campuses". Patterns are generic (alle\s+campussen, all\s+addresses, etc.); the answer composition reads from the tenant's DB-cached hospital_campuses. Add a campus to the DB → next request reflects it. Zero sync gap, zero hardcoded ZOL data.
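
How a matched entry chooses between a static answer and a renderer might look like the following. The answer_for helper and the placeholder renderer are hypothetical; the entry key, pattern, and renderer name come from the notes.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Placeholder renderer; the real one reads the tenant's campuses from the DB cache.
RENDERERS: Dict[str, Callable[[str, str], str]] = {
    "list_campuses": lambda language, tenant_slug: f"[campus list for {tenant_slug} in {language}]",
}


@dataclass(frozen=True)
class FAQEntry:
    key: str
    patterns: Dict[str, List[re.Pattern]]
    answers: Dict[str, str] = field(default_factory=dict)
    answer_renderer: Optional[str] = None


def answer_for(entry: FAQEntry, text: str, language: str, tenant_slug: str) -> Optional[str]:
    """Return the entry's answer if the text matches, else None."""
    if not any(p.search(text) for p in entry.patterns.get(language, [])):
        return None
    if entry.answer_renderer is not None:
        # Dynamic path: compose the answer from tenant data at request time.
        return RENDERERS[entry.answer_renderer](language, tenant_slug)
    # Static path: a curated per-language string.
    return entry.answers.get(language)


ALL_CAMPUSES = FAQEntry(
    key="address_all_campuses",
    patterns={"nl": [re.compile(r"alle\s+campussen", re.IGNORECASE)]},
    answer_renderer="list_campuses",
)
```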

List-all top-K bump (tenant-agnostic):

  • _is_list_all_query detects alle X / all X / tous les X / tutti X patterns.
  • When matched on a request that uses the default top-K, bumps max_results 5 → 15 so the retrieval long tail (4th campus, less-frequented department) survives.
  • Operators who explicitly set higher max_results are honored.
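
The bump logic can be sketched as below. The constants mirror the 5 → 15 change; the regex and the effective_max_results helper are assumptions, and the real _is_list_all_query may differ.

```python
import re

DEFAULT_TOP_K = 5
LIST_ALL_TOP_K = 15

# Assumed pattern for "alle X / all X / tous les X / tutti X".
_LIST_ALL = re.compile(r"\b(alle|all|tous\s+les|tutti)\s+\w+", re.IGNORECASE)


def is_list_all_query(query: str) -> bool:
    """Detect enumeration requests in any of the four voice languages."""
    return bool(_LIST_ALL.search(query))


def effective_max_results(query: str, requested: int = DEFAULT_TOP_K) -> int:
    """Bump the default top-K for list-all queries; honor explicit values."""
    if requested == DEFAULT_TOP_K and is_list_all_query(query):
        return LIST_ALL_TOP_K
    return requested
```

Checking requested == DEFAULT_TOP_K is what keeps operator overrides intact: any explicitly chosen value passes through unchanged.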

Twilio Phase A: Self-Hosted LiveKit SIP (docs/ADR/0050)

The voice channel previously required the LiveKit playground UI for testing. Phase A makes it callable over standard SIP — a softphone like Linphone can register against localhost:5060 and reach voice_agent end-to-end.

Decision: self-host the entire telephony stack. Twilio's role narrows to:

  • Owning the PSTN number (+32460256021)
  • Forwarding inbound calls via Elastic SIP Trunk to our public SIP URI
  • Handling 112 emergency-call routing as required by the Belgian regulator

Rate limit: 10 calls/hour per caller-ID, enforced via Redis counter (same pattern as the public WS rate limit).
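
A fixed-window counter of this kind is commonly built from INCR plus EXPIRE. The sketch below assumes a client exposing incr(key) and expire(key, seconds), as redis-py does, and swaps in an in-memory stand-in so it runs without a server; the key prefix and function name are hypothetical.

```python
MAX_CALLS_PER_HOUR = 10
WINDOW_SECONDS = 3600


class InMemoryCounter:
    """Stand-in with the two client methods used below: incr and expire."""

    def __init__(self) -> None:
        self.values = {}
        self.ttls = {}

    def incr(self, key: str) -> int:
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key: str, seconds: int) -> bool:
        # A real Redis server deletes the key after the TTL; this only records it.
        self.ttls[key] = seconds
        return True


def allow_call(store, caller_id: str) -> bool:
    """Return True while the caller is within the hourly budget."""
    key = f"sip:rate:{caller_id}"  # hypothetical key layout
    count = store.incr(key)
    if count == 1:
        # First call in the window starts the one-hour clock.
        store.expire(key, WINDOW_SECONDS)
    return count <= MAX_CALLS_PER_HOUR
```

With a real Redis client the key disappears when the TTL elapses, which resets the caller's budget for the next window.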

What's now operational locally:

  • livekit-sip container on UDP 5060 / TCP 5061 / RTP 10000-10100 (docker/livekit-sip.yaml)
  • LiveKit server bumped v1.8 → v1.9 (CLI compatibility — lk v2.16.2 couldn't write dispatch rules to v1.8 server)
  • Redis section added to docker/livekit.yaml so the SIP service can announce itself to the LiveKit server
  • Inbound SIP trunk registered: phase-a-dev-trunk with auth user/pass
  • Dispatch rule registered: each call routes to a fresh sip-call_<caller>_<random> room; voice_agent (JT_ROOM worker) auto-joins

Phase B (pilot deployment with public DNS + Let's Encrypt SIP TLS + firewall rules) and Phase C (Twilio trunk → pilot SIP URI) are next.

Test Debt Sprint Close-Out (April 22)

Sprint outcome from PRs #41 – #66:

  • ruff extend-exclude reduced 23 → 0 entries.
  • Coverage floor bumped 40% → 55% (measured 57.73%).
  • ~600 previously-skipped tests restored.
  • Four production bugs documented with file:line refs (evaluation_service:474, alembic/env.py:24-28, AllowlistFilter fallback, background-task event-loop race).

Nightly Auto-Ingest on Pilot

Pilot crawl + ingest now runs every night at 03:00 UTC (image 7c7b50c2800899). The first run surfaced two bugs:

  • Empty-content classification gap — pages with zero text were stuck in transient retry. Added DEAD_EMPTY_CONTENT failure class.
  • MissingGreenlet clobbering: record.id and record.url access inside an except block triggered SQLAlchemy errors that hid the real exception. Fixed by capturing both fields BEFORE the try block.
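
The capture-before-try pattern can be shown without a database. FakeRecord below only simulates an ORM row whose attributes become unreadable after a failure; it is not SQLAlchemy, and the ingest function and message format are hypothetical.

```python
class FakeRecord:
    """Simulates a row whose attribute access fails once it is expired."""

    def __init__(self, record_id: int, url: str) -> None:
        self._id = record_id
        self._url = url
        self.expired = False

    @property
    def id(self) -> int:
        if self.expired:
            raise RuntimeError("simulated lazy-load failure")
        return self._id

    @property
    def url(self) -> str:
        if self.expired:
            raise RuntimeError("simulated lazy-load failure")
        return self._url


def ingest(record: FakeRecord, fetch) -> str:
    """Process one record and report failures with the original error intact."""
    # Capture plain values first: after a failure, touching the record
    # again could raise a second error that masks the real one.
    record_id, record_url = record.id, record.url
    try:
        fetch(record)
    except Exception as exc:
        return f"ingest failed for {record_id} ({record_url}): {exc}"
    return "ok"
```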

Corpus grew from 5,815 → 5,841 documents on the first nightly run. INGEST_MODE=auto remains live.

Quality Metrics

Check | Status
ruff check | 0 errors on changed files
pyright | 0 errors / 0 warnings on changed files
Unit tests (voice + overlay + renderers) | 258 passed
Voice integration tests | All passing
Safety incidents | 0
LOC removed (legacy voice pipeline) | ~7,000
LOC added (thin pipeline + overlay system) | ~3,500
Net LOC delta | -3,500 with more functionality

Documentation

  • ADR-0049 — Thin voice pipeline (locked, implemented, in production)
  • ADR-0050 — Twilio + self-hosted LiveKit SIP integration (Accepted)
  • New doc page: docs/voice/tenant-overlay-system.md (this release)
  • Updated: docs/voice/dialogue-manager.md marked as historical
  • Plans archive — voice-dialogue-manager-design (locked), voice-dialogue-manager-implementation (built then removed), thin-voice-orchestrator (canonical)

Architecture Notes

Why the legacy pipeline failed:

The 8-stage design front-loaded too many heuristics. Each stage added a confidence threshold, a fallback path, and an integration point, and each was a place where the pipeline could lose context. Live smoke tests showed that the dialogue manager specified 6 tools the LLM almost never picked correctly, that the safety gate was redundant with the regex pre-filter, and that the speculative-STT cache hit rate was below 5%. Throwing it out and going back to "regex first, FAQ second, RAG third" turned out to be a UX win.

Why the tenant overlay was a cleanup:

Batch D's first draft duplicated DB-truth data (campus addresses, doctor surnames) into shared YAML. The user objected: this creates a sync gap that goes silently stale forever. The fix established three categories with clear homes:

Type | Where | Why
Generic patterns + language rules | Source code | Hospital-agnostic; benefits any tenant
Tenant-specific phonetic-recovery (STT mishears) | YAML overlay | Phonetic data that doesn't exist anywhere else
Tenant data (addresses, names, hours) | DB + renderer | Single source of truth; no duplication

Adding a second hospital now means: a new acme.yaml with its STT mishears, PUBLIC_TENANT_SLUG=acme, and a populated campuses table. Zero shared-code changes.


Author: SOFT4U BV + Claude Opus 4.7