
Context-aware filler (A.2)

Status

Designed and implemented 2026-04-25. First item of the Q2 naturalness sprint; follows Phase 4.3+4.4 (speculative-STT). Default ON in the container; set VOICE_CONTEXT_FILLER_ENABLED=false to disable.

Why context-aware fillers exist

Tier-1 fillers fire ~300 ms after the caller stops speaking, while the backend RAG round-trip is still in flight. A generic filler ("Een ogenblikje") signals only "I heard you, working on it." A topic-aware filler ("Ik zoek de bezoekuren voor u op") additionally signals "I understood the topic" — moving caller perception from "is the system buying time?" to "the system parsed my question."

The architectural choice is to do this on the voice_agent side with regex pattern-matching, not in the backend with LLM intent classification. The trade-offs:

| Decision | Chosen | Alternative | Rejected because |
| --- | --- | --- | --- |
| Where topic detection runs | voice_agent (LiveKit-side), regex | Backend, LLM-classified | Filler must fire ~300 ms after STT-final, and the backend roundtrip already accounts for that 300 ms; putting topic detection in the backend means the filler never wins the race. Regex on the agent side is < 1 ms. |
| Topic taxonomy | 10 hand-curated topics × 4 languages | Open-vocabulary topic extraction | Hand-curated misses long-tail topics, but the fallback is the generic filler, a strict subset of correct behaviour. Open-vocabulary topic extraction needs an LLM call (back to the latency problem above). |
| Granularity | Topic noun, not exact match | Echo verbatim phrase | Echoing verbatim ("Ik zoek 'wat zijn de bezoekuren bij cardiologie' voor u op") is creepy at scale; the topic-noun substitution preserves naturalness without uncanny-valley quotation. |

Sacks, Schegloff & Jefferson 1974

What it is

When a caller mentions a known hospital topic in their question, the agent's tier-1 filler swaps from generic to topic-aware:

| Caller says | Old filler (generic) | New filler (context-aware) |
| --- | --- | --- |
| "Wat zijn de bezoekuren bij cardiologie?" | "Een ogenblikje." | "Ik zoek de bezoekuren voor u op." |
| "How much is parking?" | "Let me check that for you." | "Let me look up the parking rates for you." |
| "Vorrei un appuntamento" | "Un momento, controllo per voi." | "Controllo le informazioni sull'appuntamento." |

Caller perception shifts from "the system is buying time" to "the system already understood my question and is actively engaged."

How it works

  1. The voice_agent already subscribes to LiveKit's interim-transcript events for Phase 4.3+4.4 (speculation) and distress-on-interim.
  2. On every interim, the agent runs detect_filler_topic(text, language) — a per-language regex map covering 10 hospital topics. First match wins; subsequent partials don't override.
  3. When the tier-1 filler fires (~300 ms into the RAG round-trip), it reads the stored _partial_topic and swaps in a topic-aware template.
  4. Fallback chain ensures zero regression: no topic detected → generic filler (existing behavior); unknown language → generic; missing topic in language map → generic.
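
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in voice_agent/greeting.py: the two-topic, two-language pattern map and the `PartialTopicTracker` class are hypothetical stand-ins, and the real regexes are not shown on this page. Only the `detect_filler_topic(text, language)` name and the first-match-wins rule come from the description above.

```python
import re
from typing import Optional

# Illustrative subset only. The real map covers 10 topics x 4 languages,
# and these regexes are assumptions, not the production patterns.
_PATTERNS = {
    "nl": [
        ("visiting_hours", re.compile(r"\bbezoekuren\b", re.IGNORECASE)),
        ("parking", re.compile(r"\bparkeren\b", re.IGNORECASE)),
    ],
    "en": [
        ("visiting_hours", re.compile(r"\bvisiting hours\b", re.IGNORECASE)),
        ("parking", re.compile(r"\bparking\b", re.IGNORECASE)),
    ],
}


def detect_filler_topic(text: str, language: str) -> Optional[str]:
    """Return the first topic key whose pattern matches, else None."""
    for topic, pattern in _PATTERNS.get(language, []):
        if pattern.search(text):
            return topic
    return None  # unknown language or no keyword: caller uses the generic filler


class PartialTopicTracker:
    """Hypothetical holder for the topic seen in one turn's interim transcripts."""

    def __init__(self, language: str) -> None:
        self.language = language
        self.partial_topic: Optional[str] = None

    def on_interim(self, text: str) -> None:
        # First match wins: later partials never override a stored topic.
        if self.partial_topic is None:
            self.partial_topic = detect_filler_topic(text, self.language)
```

Because the tracker only writes while `partial_topic` is empty, a later interim that mentions a second topic leaves the first one in place, which matches the "subsequent partials don't override" rule.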

Topic taxonomy (10 topics × 4 languages)

The taxonomy lives in voice_agent/greeting.py:_FILLER_TOPIC_PATTERNS:

| Topic key | Coverage |
| --- | --- |
| visiting_hours | Bezoekuren / visiting hours / heures de visite / orari di visita |
| parking | Parkeren / parking / tarifs de parking / parcheggio |
| phone | Telefoonnummer / phone number / numéro de téléphone / numero di telefono |
| appointment | Afspraak / appointment / rendez-vous / appuntamento |
| opening_hours | Openingsuren / opening hours / heures d'ouverture / orari di apertura |
| emergency | Spoed / emergency / urgences / pronto soccorso |
| doctor | Dokter / arts / doctor / médecin / medico |
| department | Afdeling / dienst / department / service / reparto |
| admission | Opname / admission / hospitalisation / ricovero |
| address | Adres / locatie / address / adresse / indirizzo |

Each topic has 3 variants per language (10 × 4 × 3 = 120 templates total), so the agent doesn't sound robotic when the same topic comes up across calls.

Failure modes

| Scenario | Behavior |
| --- | --- |
| Topic detection misses (no keywords matched) | Generic filler (existing behavior) |
| Caller speaks language not in the map (e.g., German) | Generic filler (get_fillers defaults to nl) |
| Topic key found but missing in language's template map | Generic filler (defensive fallback) |
| VOICE_CONTEXT_FILLER_ENABLED=false | Generic filler (operator override) |

No path leads to a crash, a KeyError, or a silent break. The feature is purely additive: every failure path lands on the existing generic-filler behavior.
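
One way to get this behaviour is to resolve the filler with chained `.get()` lookups so that every miss falls through to the generic list. The sketch below is an assumption about shape, not the actual implementation: `choose_filler`, the tiny template tables, and the signature given to `get_fillers` are hypothetical, and each list holds one variant instead of three to keep the example short. The filler strings themselves are taken from the examples earlier on this page.

```python
import random
from typing import Optional

# Hypothetical, heavily truncated tables (the real ones hold 3 variants each).
GENERIC = {
    "nl": ["Een ogenblikje."],
    "en": ["Let me check that for you."],
}
TOPIC_TEMPLATES = {
    "nl": {"visiting_hours": ["Ik zoek de bezoekuren voor u op."]},
    "en": {"parking": ["Let me look up the parking rates for you."]},
}


def get_fillers(language: str) -> list:
    """Generic fillers for a language; unknown languages default to nl."""
    return GENERIC.get(language, GENERIC["nl"])


def choose_filler(topic: Optional[str], language: str,
                  enabled: bool = True, rng=random) -> str:
    """Pick a topic-aware filler when possible, otherwise a generic one."""
    if enabled and topic is not None:
        # .get() at both levels: a missing language or topic yields None,
        # never a KeyError.
        variants = TOPIC_TEMPLATES.get(language, {}).get(topic)
        if variants:
            return rng.choice(variants)
    return rng.choice(get_fillers(language))
```

Each row of the table maps to one branch: a `None` topic, an unknown language, a topic missing from that language's map, and `enabled=False` all reach the final `get_fillers` line.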

Observability

When a context-aware filler fires, the agent logs:

{
  "level": "INFO",
  "msg": "voice_context_filler_used",
  "topic": "visiting_hours",
  "language": "nl"
}

Greppable via docker logs zol-voice-agent | grep voice_context_filler_used. Aggregating these lines across a week of pilot calls tells operators which topics dominate (informs taxonomy expansion in next iteration).
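
For the weekly aggregation, a small script along these lines could tally the log lines per topic and language. It is a sketch under two assumptions that this page does not confirm: each log record is emitted as a single-line JSON object, and any prefix before the first `{` (for example a timestamp) can be ignored. The function name `count_filler_topics` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_filler_topics(lines: Iterable[str]) -> Counter:
    """Count voice_context_filler_used records per (topic, language)."""
    counts = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # not a JSON log line
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # skip anything that is not one complete JSON object
        if isinstance(record, dict) and record.get("msg") == "voice_context_filler_used":
            counts[(record.get("topic"), record.get("language"))] += 1
    return counts
```

Passing `sys.stdin` as `lines` lets the output of `docker logs zol-voice-agent` be piped straight in, and `counts.most_common()` then lists the dominant topics first.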

Settings

# Default ON — flag off to revert to pre-A.2 generic-filler behavior
VOICE_CONTEXT_FILLER_ENABLED=true
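
How the agent parses this flag is not shown here. A common pattern for a default-ON boolean environment variable is sketched below; the helper name `context_filler_enabled` and the set of spellings treated as "off" other than `false` are assumptions for illustration only.

```python
import os
from typing import Mapping, Optional

# Assumed spellings for "off"; only "false" is documented on this page.
_FALSE_VALUES = {"0", "false", "no", "off"}


def context_filler_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read VOICE_CONTEXT_FILLER_ENABLED, treating an unset variable as ON."""
    source = os.environ if env is None else env
    raw = source.get("VOICE_CONTEXT_FILLER_ENABLED")
    if raw is None:
        return True  # default ON
    return raw.strip().lower() not in _FALSE_VALUES
```

Accepting the mapping as a parameter keeps the check testable without touching the real process environment.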

Test it

Speak a multi-word question containing a known topic keyword. Listen for the topic noun in the agent's first-response filler. Examples:

  • nl: "Wat zijn de bezoekuren bij cardiologie?"
  • en: "How much does parking cost?"
  • fr: "Quel est le numéro de téléphone?"
  • it: "Vorrei un appuntamento"

Then say something not on the topic list ("Wat is uw naam?") and confirm that the generic filler returns. The fallback is the proof that A.2 doesn't regress existing behavior.

Latency budget

| Operation | Local-dev p50 |
| --- | --- |
| detect_filler_topic() regex match (10 topics × 4 languages) | < 1 ms |
| Filler template selection (random from 3 variants) | < 0.1 ms |
| ElevenLabs TTS first audio chunk for filler | 150–250 ms* (Multilingual v2 streaming, @elevenlabs_multilingual_v2) |

* Per pilot measurement pass (Phase 5 of readiness plan).

References