Context-aware filler (A.2)
Designed and implemented 2026-04-25. First item of the Q2 naturalness
sprint; follows Phase 4.3+4.4 (speculative-STT). Default ON in the
container; set VOICE_CONTEXT_FILLER_ENABLED=false to disable.
Why context-aware fillers exist
Tier-1 fillers fire ~300 ms after the caller stops speaking, while the backend RAG round-trip is still in flight. A generic filler ("Een ogenblikje") signals only "I heard you, working on it." A topic-aware filler ("Ik zoek de bezoekuren voor u op") additionally signals "I understood the topic" — moving caller perception from "is the system buying time?" to "the system parsed my question."
The architectural choice is to do this on the voice_agent side with regex pattern-matching, not in the backend with LLM intent classification. The trade-offs:
| Decision | Chosen | Alternative | Rejected because |
|---|---|---|---|
| Where topic detection runs | voice_agent (LiveKit-side), regex | Backend, LLM-classified | The filler must fire ~300 ms after STT-final, and the backend round-trip alone already consumes that 300 ms, so a topic detected in the backend would arrive too late for the filler to win the race. Regex on the agent side takes < 1 ms. |
| Topic taxonomy | 10 hand-curated topics × 4 languages | Open-vocabulary topic extraction | Hand-curated misses long-tail topics, but the fallback is the generic filler — strict subset of correct behaviour. Open-vocabulary topic extraction needs an LLM call (back to the latency problem above). |
| Granularity | Topic noun, not exact match | Echo verbatim phrase | Echoing verbatim ("Ik zoek 'wat zijn de bezoekuren bij cardiologie' voor u op") is creepy at scale; the topic-noun substitution preserves naturalness without uncanny-valley quotation. |
{/* TODO Wave 2.D: bibkey for "Yngve 1970 backchannels in conversation" needed (foundational on listener acknowledgments timing) */} Sacks, Schegloff & Jefferson 1974
What it is
When a caller mentions a known hospital topic in their question, the agent's tier-1 filler swaps from generic to topic-aware:
| Caller says | Old filler (generic) | New filler (context-aware) |
|---|---|---|
| "Wat zijn de bezoekuren bij cardiologie?" | "Een ogenblikje." | "Ik zoek de bezoekuren voor u op." |
| "How much is parking?" | "Let me check that for you." | "Let me look up the parking rates for you." |
| "Vorrei un appuntamento" | "Un momento, controllo per voi." | "Controllo le informazioni sull'appuntamento." |
Caller perception shifts from "the system is buying time" to "the system already understood my question and is actively engaged."
How it works
- The voice_agent already subscribes to LiveKit's interim-transcript events for Phase 4.3+4.4 (speculation) and distress-on-interim.
- On every interim, the agent runs `detect_filler_topic(text, language)`, which matches the text against a per-language regex map covering 10 hospital topics. First match wins; subsequent partials don't override.
- When the tier-1 filler fires (~300 ms into RAG), it reads the stored `_partial_topic` and swaps in a topic-aware template.
- Fallback chain ensures zero regression: no topic detected → generic filler (existing behavior); unknown language → generic; missing topic in language map → generic.
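The detection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in voice_agent/greeting.py: the regexes are trimmed-down guesses, `PartialTopicTracker` and `take_topic` are hypothetical names, and resetting the stored topic once the filler has read it is an assumption about per-turn behavior.

```python
import re
from typing import Dict, Optional

# Illustrative subset only: the real map covers 10 topics x 4 languages,
# and these regexes are guesses rather than the production patterns.
_FILLER_TOPIC_PATTERNS: Dict[str, Dict[str, re.Pattern]] = {
    "nl": {
        "visiting_hours": re.compile(r"\bbezoekuren\b", re.IGNORECASE),
        "parking": re.compile(r"\bpark(?:eren|ing)\b", re.IGNORECASE),
    },
    "en": {
        "visiting_hours": re.compile(r"\bvisiting hours\b", re.IGNORECASE),
        "parking": re.compile(r"\bparking\b", re.IGNORECASE),
    },
}


def detect_filler_topic(text: str, language: str) -> Optional[str]:
    """Return the first topic key whose pattern matches, else None."""
    patterns = _FILLER_TOPIC_PATTERNS.get(language)
    if patterns is None:
        return None  # unknown language: caller falls back to the generic filler
    for topic, pattern in patterns.items():
        if pattern.search(text):
            return topic
    return None


class PartialTopicTracker:
    """Hypothetical holder for the topic seen in one caller turn."""

    def __init__(self) -> None:
        self._partial_topic: Optional[str] = None

    def on_interim(self, text: str, language: str) -> None:
        # First match wins: later partials never override a stored topic.
        if self._partial_topic is None:
            self._partial_topic = detect_filler_topic(text, language)

    def take_topic(self) -> Optional[str]:
        """Read the stored topic when the filler fires, then clear it (assumed)."""
        topic, self._partial_topic = self._partial_topic, None
        return topic
```

Because matching only reads precompiled patterns, calling `on_interim` on every partial transcript stays cheap, and the filler path needs nothing more than a single `take_topic()` call.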
Topic taxonomy (10 topics × 4 languages)
The taxonomy lives in voice_agent/greeting.py:_FILLER_TOPIC_PATTERNS:
| Topic key | Coverage |
|---|---|
| `visiting_hours` | Bezoekuren / visiting hours / heures de visite / orari di visita |
| `parking` | Parkeren / parking / tarifs de parking / parcheggio |
| `phone` | Telefoonnummer / phone number / numéro de téléphone / numero di telefono |
| `appointment` | Afspraak / appointment / rendez-vous / appuntamento |
| `opening_hours` | Openingsuren / opening hours / heures d'ouverture / orari di apertura |
| `emergency` | Spoed / emergency / urgences / pronto soccorso |
| `doctor` | Dokter / arts / doctor / médecin / medico |
| `department` | Afdeling / dienst / department / service / reparto |
| `admission` | Opname / admission / hospitalisation / ricovero |
| `address` | Adres / locatie / address / adresse / indirizzo |
Each topic has 3 template variants per language (10 topics × 4 languages × 3 variants = 120 templates in total), so the agent doesn't sound robotic when the same topic comes up across calls.
Failure modes
| Scenario | Behavior |
|---|---|
| Topic detection misses (no keywords matched) | Generic filler (existing behavior) |
| Caller speaks a language not in the map (e.g., German) | Generic filler (`get_fillers` defaults to nl) |
| Topic key found but missing in the language's template map | Generic filler (defensive fallback) |
| `VOICE_CONTEXT_FILLER_ENABLED=false` | Generic filler (operator override) |
No path leads to a crash, KeyError, or silent break. The feature is purely additive: every fallback path ends in the existing generic filler.
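The fallback chain in the table can be illustrated with a short sketch. The function name `pick_filler`, the table names, and the single-variant lists are assumptions made for brevity (the real tables hold 3 variants per topic per language); the filler strings are the ones quoted in this document.

```python
import random
from typing import Dict, List, Optional

_DEFAULT_LANGUAGE = "nl"

# Hypothetical, trimmed tables for illustration only.
_GENERIC_FILLERS: Dict[str, List[str]] = {
    "nl": ["Een ogenblikje."],
    "en": ["Let me check that for you."],
}
_TOPIC_FILLERS: Dict[str, Dict[str, List[str]]] = {
    "nl": {"visiting_hours": ["Ik zoek de bezoekuren voor u op."]},
    "en": {"parking": ["Let me look up the parking rates for you."]},
}


def pick_filler(topic: Optional[str], language: str, enabled: bool = True) -> str:
    """Return a topic-aware filler, or the generic one on any miss."""
    # Unknown language: fall back to the default language's generic fillers.
    generic = _GENERIC_FILLERS.get(language, _GENERIC_FILLERS[_DEFAULT_LANGUAGE])
    if not enabled or topic is None:
        return random.choice(generic)  # operator override or no topic detected
    # Chained .get() calls mean a missing language or topic yields None
    # instead of raising KeyError.
    variants = _TOPIC_FILLERS.get(language, {}).get(topic)
    if not variants:
        return random.choice(generic)  # defensive fallback
    return random.choice(variants)
```

Every branch returns a string from an existing list, so each scenario in the table resolves to the generic filler without raising.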
Observability
When a context-aware filler fires, the agent logs:
{
  "level": "INFO",
  "msg": "voice_context_filler_used",
  "topic": "visiting_hours",
  "language": "nl"
}
Greppable via docker logs zol-voice-agent | grep voice_context_filler_used.
Aggregating these lines across a week of pilot calls tells operators
which topics dominate, which informs taxonomy expansion in the next iteration.
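One way to do that aggregation is a small counting helper like the sketch below. It assumes each log event is emitted as a single-line JSON object (possibly preceded by a prefix such as a timestamp); if the agent pretty-prints events across several lines, the parsing would need to change. The function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_filler_topics(lines: Iterable[str]) -> Counter:
    """Count voice_context_filler_used events per (topic, language)."""
    counts: Counter = Counter()
    for line in lines:
        start = line.find("{")  # skip any non-JSON prefix on the line
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line; ignore it
        if isinstance(record, dict) and record.get("msg") == "voice_context_filler_used":
            counts[(record.get("topic"), record.get("language"))] += 1
    return counts


# To use it on real logs, pass an iterable of lines such as sys.stdin and
# print count_filler_topics(sys.stdin).most_common().
```

Piping the output of the `docker logs` command above into a script built around this function would give a per-topic, per-language tally for the week.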
Settings
# Default ON — flag off to revert to pre-A.2 generic-filler behavior
VOICE_CONTEXT_FILLER_ENABLED=true
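For illustration, the flag could be read along these lines. Only the value `false` is documented here; treating `0`, `no`, and `off` as disabled as well, and the helper name `context_filler_enabled`, are assumptions of this sketch rather than confirmed agent behavior.

```python
import os
from typing import Mapping

# Spellings treated as "disabled". Only "false" is documented; the rest
# are an assumption of this sketch.
_FALSY = {"false", "0", "no", "off"}


def context_filler_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Default ON: the feature is disabled only by an explicit falsy value."""
    raw = env.get("VOICE_CONTEXT_FILLER_ENABLED", "true")
    return raw.strip().lower() not in _FALSY
```

Defaulting to enabled when the variable is unset matches the "Default ON" behavior described above.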
Test it
Speak a multi-word question containing a known topic keyword. Listen for the topic noun in the agent's first-response filler. Examples:
- nl: "Wat zijn de bezoekuren bij cardiologie?"
- en: "How much does parking cost?"
- fr: "Quel est le numéro de téléphone?"
- it: "Vorrei un appuntamento"
Then say something not on the topic list ("Wat is uw naam?") and confirm that the generic filler returns. This fallback shows that A.2 doesn't regress existing behavior.
Latency budget
| Operation | Local-dev p50 |
|---|---|
| `detect_filler_topic()` regex match (10 topics × 4 languages) | < 1 ms |
| Filler template selection (random from 3 variants) | < 0.1 ms |
| ElevenLabs TTS first audio chunk for filler | 150–250 ms* (Multilingual v2 streaming, @elevenlabs_multilingual_v2) |
* Per pilot measurement pass (Phase 5 of readiness plan).
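The regex figure in the table can be sanity-checked locally with a micro-benchmark like the sketch below. The patterns are hypothetical stand-ins for the real map, `detect` and `median_ms` are illustrative names, and the result depends on the machine, so no expected value is given.

```python
import re
import timeit
from typing import Optional

# Hypothetical stand-in patterns; swap in the real map to measure production cost.
PATTERNS = {
    "visiting_hours": re.compile(r"\bbezoekuren\b", re.IGNORECASE),
    "parking": re.compile(r"\bpark(?:eren|ing)\b", re.IGNORECASE),
    "appointment": re.compile(r"\bafspraak\b", re.IGNORECASE),
}


def detect(text: str) -> Optional[str]:
    """First-match-wins scan over the precompiled patterns."""
    for topic, pattern in PATTERNS.items():
        if pattern.search(text):
            return topic
    return None


def median_ms(text: str, repeats: int = 5, number: int = 1000) -> float:
    """Median per-call latency of detect() in milliseconds."""
    runs = timeit.repeat(lambda: detect(text), repeat=repeats, number=number)
    per_call = sorted(run / number for run in runs)
    return per_call[len(per_call) // 2] * 1000.0


if __name__ == "__main__":
    print(f"{median_ms('Wat zijn de bezoekuren bij cardiologie?'):.4f} ms per call")
```

Taking the median over several repeats reduces the influence of one-off scheduling noise on the reported number.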
References
- `voice_agent/greeting.py`: `_FILLER_TOPIC_PATTERNS`, `detect_filler_topic`, `get_fillers`
- LiveKit Agents Documentation: interim transcript event hooks used to trigger detection
- ElevenLabs Multilingual v2: production TTS that speaks the filler templates