Context-aware filler (A.2)

Status

Designed and implemented 2026-04-25. First item of the Q2 naturalness sprint; follows Phase 4.3+4.4 (speculative STT). Default ON in the container; set VOICE_CONTEXT_FILLER_ENABLED=false to disable.

Why context-aware fillers exist

Tier-1 fillers fire ~300 ms after the caller stops speaking, while the backend RAG round-trip is still in flight. A generic filler ("Een ogenblikje") signals only "I heard you, working on it." A topic-aware filler ("Ik zoek de bezoekuren voor u op") additionally signals "I understood the topic" — moving caller perception from "is the system buying time?" to "the system parsed my question."

The architectural choice is to do this on the voice_agent side with regex pattern-matching, not in the backend with LLM intent classification. The trade-offs:

Decision	Chosen	Alternative	Rejected because
Where topic detection runs	voice_agent (LiveKit-side), regex	Backend, LLM-classified	The filler must fire ~300 ms after STT-final, while the backend round-trip is still in flight. Topic detection in the backend would only return with that round-trip, so the filler would never win the race. Regex on the agent side takes < 1 ms.
Topic taxonomy	10 hand-curated topics × 4 languages	Open-vocabulary topic extraction	A hand-curated list misses long-tail topics, but a miss falls back to the generic filler, which is still correct behaviour. Open-vocabulary topic extraction needs an LLM call, which brings back the latency problem above.
Granularity	Topic noun, not exact match	Echo verbatim phrase	Echoing verbatim ("Ik zoek 'wat zijn de bezoekuren bij cardiologie' voor u op") is creepy at scale; the topic-noun substitution preserves naturalness without uncanny-valley quotation.

Sacks, Schegloff & Jefferson 1974

What it is

When a caller mentions a known hospital topic in their question, the agent's tier-1 filler swaps from generic to topic-aware:

Caller says	Old filler (generic)	New filler (context-aware)
"Wat zijn de bezoekuren bij cardiologie?"	"Een ogenblikje."	"Ik zoek de bezoekuren voor u op."
"How much is parking?"	"Let me check that for you."	"Let me look up the parking rates for you."
"Vorrei un appuntamento"	"Un momento, controllo per voi."	"Controllo le informazioni sull'appuntamento."

Caller perception shifts from "the system is buying time" to "the system already understood my question and is actively engaged."

How it works

The voice_agent already subscribes to LiveKit's interim-transcript events for Phase 4.3+4.4 (speculation) and distress-on-interim.
On every interim, the agent runs detect_filler_topic(text, language) — a per-language regex map covering 10 hospital topics. First match wins; subsequent partials don't override.
When the tier-1 filler fires (~300 ms into RAG), it reads the stored _partial_topic and swaps in a topic-aware template.
Fallback chain ensures zero regression: no topic detected → generic filler (existing behavior); unknown language → generic; missing topic in language map → generic.
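
The "first match wins" rule can be sketched as a small per-turn tracker. This is an illustration under stated assumptions, not the code in voice_agent: the class `FillerTopicTracker`, its method names, the per-turn reset, and the toy `keyword_detector` are all hypothetical. Only `detect_filler_topic(text, language)` and `_partial_topic` are named in this document.

```python
from typing import Callable, Optional

# Signature assumed to match detect_filler_topic(text, language).
Detector = Callable[[str, str], Optional[str]]


class FillerTopicTracker:
    """Remembers the first topic detected across the interim transcripts of one turn."""

    def __init__(self, language: str, detect: Detector) -> None:
        self.language = language
        self._detect = detect
        self._partial_topic: Optional[str] = None

    def on_interim(self, text: str) -> None:
        # First match wins: once a topic is stored, later partials are ignored.
        if self._partial_topic is None:
            self._partial_topic = self._detect(text, self.language)

    def take_topic(self) -> Optional[str]:
        # Called when the tier-1 filler fires. Clearing the stored topic
        # afterwards is an assumption, so a stale topic cannot leak into the next turn.
        topic, self._partial_topic = self._partial_topic, None
        return topic


def keyword_detector(text: str, language: str) -> Optional[str]:
    # Toy stand-in for detect_filler_topic; it knows two Dutch keywords only.
    lowered = text.lower()
    if language == "nl" and "bezoekuren" in lowered:
        return "visiting_hours"
    if language == "nl" and "parkeren" in lowered:
        return "parking"
    return None
```

Feeding the tracker the partials "Wat zijn", "Wat zijn de bezoekuren", and "en parkeren" leaves `visiting_hours` stored, because the later `parking` match does not override the first one.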

Topic taxonomy (10 topics × 4 languages)

The taxonomy lives in voice_agent/greeting.py:_FILLER_TOPIC_PATTERNS:

Topic key	Coverage
`visiting_hours`	Bezoekuren / visiting hours / heures de visite / orari di visita
`parking`	Parkeren / parking / tarifs de parking / parcheggio
`phone`	Telefoonnummer / phone number / numéro de téléphone / numero di telefono
`appointment`	Afspraak / appointment / rendez-vous / appuntamento
`opening_hours`	Openingsuren / opening hours / heures d'ouverture / orari di apertura
`emergency`	Spoed / emergency / urgences / pronto soccorso
`doctor`	Dokter / arts / doctor / médecin / medico
`department`	Afdeling / dienst / department / service / reparto
`admission`	Opname / admission / hospitalisation / ricovero
`address`	Adres / locatie / address / adresse / indirizzo

Each topic has 3 template variants per language (10 × 4 × 3 = 120 templates in total), so the agent doesn't sound robotic when the same topic comes up across calls.
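
A per-language regex map of this kind might look roughly like the sketch below. It covers only 2 of the 10 topics, with keywords taken from the coverage table above. The nesting (language, then topic, then compiled pattern) and the exact regexes are assumptions; the real `_FILLER_TOPIC_PATTERNS` in voice_agent/greeting.py may be shaped differently.

```python
import re
from typing import Optional

# Illustrative subset only. Structure and regexes are assumed, not copied
# from greeting.py.
_FILLER_TOPIC_PATTERNS = {
    "nl": {
        "visiting_hours": re.compile(r"\bbezoekuren\b", re.IGNORECASE),
        "parking": re.compile(r"\bparkeren\b", re.IGNORECASE),
    },
    "en": {
        "visiting_hours": re.compile(r"\bvisiting hours\b", re.IGNORECASE),
        "parking": re.compile(r"\bparking\b", re.IGNORECASE),
    },
    "fr": {
        "visiting_hours": re.compile(r"\bheures de visite\b", re.IGNORECASE),
        "parking": re.compile(r"\bparking\b", re.IGNORECASE),
    },
    "it": {
        "visiting_hours": re.compile(r"\borari di visita\b", re.IGNORECASE),
        "parking": re.compile(r"\bparcheggio\b", re.IGNORECASE),
    },
}


def detect_filler_topic(text: str, language: str) -> Optional[str]:
    """Return the first topic key whose pattern matches the text, or None."""
    patterns = _FILLER_TOPIC_PATTERNS.get(language)
    if patterns is None:
        return None  # unknown language: the caller falls back to the generic filler
    for topic, pattern in patterns.items():
        if pattern.search(text):
            return topic
    return None
```

Because dicts preserve insertion order, the order of topics inside each language map also decides which topic wins when one utterance mentions two of them.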

Failure modes

Scenario	Behavior
Topic detection misses (no keywords matched)	Generic filler — existing behavior
Caller speaks language not in the map (e.g., German)	Generic filler — `get_fillers` defaults to nl
Topic key found but missing in language's template map	Generic filler — defensive fallback
`VOICE_CONTEXT_FILLER_ENABLED=false`	Generic filler — operator override

No path leads to a crash, a KeyError, or a silent failure. The feature is purely additive: every failure path falls through to the existing generic-filler behavior.
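
The four rows above amount to one fallback chain, sketched here under stated assumptions. The helper names `choose_filler` and `context_filler_enabled`, the two dictionaries, and the flag-parsing rule are hypothetical. The filler strings are the ones quoted in the examples earlier on this page, with one variant per topic instead of three.

```python
import os
import random
from typing import Mapping, Optional

# Illustrative data: strings are quoted from the examples in this document.
_GENERIC = {"nl": "Een ogenblikje.", "en": "Let me check that for you."}
_TOPIC_TEMPLATES = {
    "nl": {"visiting_hours": ["Ik zoek de bezoekuren voor u op."]},
    "en": {"parking": ["Let me look up the parking rates for you."]},
}


def context_filler_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Default ON; only an explicit "false" disables (assumed parsing rule).
    return env.get("VOICE_CONTEXT_FILLER_ENABLED", "true").strip().lower() != "false"


def choose_filler(topic: Optional[str], language: str, enabled: bool) -> str:
    # Unknown language: the generic filler defaults to nl.
    generic = _GENERIC.get(language, _GENERIC["nl"])
    # Operator override, or no topic detected.
    if not enabled or topic is None:
        return generic
    # Topic key missing from this language's template map: defensive fallback.
    variants = _TOPIC_TEMPLATES.get(language, {}).get(topic)
    if not variants:
        return generic
    return random.choice(variants)
```

Every lookup goes through `.get()` with a default, which is why none of the branches can raise a KeyError.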

Observability

When a context-aware filler fires, the agent logs:

{
  "level": "INFO",
  "msg": "voice_context_filler_used",
  "topic": "visiting_hours",
  "language": "nl"
}

Greppable via `docker logs zol-voice-agent | grep voice_context_filler_used`. Aggregating these lines across a week of pilot calls tells operators which topics dominate, which informs taxonomy expansion in the next iteration.
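
The aggregation could be done with a few lines of Python. This sketch assumes each log record is emitted as a single-line JSON object with the fields shown above; the function name `count_filler_topics` is hypothetical, and lines that are not JSON are skipped.

```python
import json
from collections import Counter
from typing import Iterable


def count_filler_topics(lines: Iterable[str]) -> Counter:
    """Count voice_context_filler_used events per (topic, language) pair."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(record, dict):
            continue
        if record.get("msg") == "voice_context_filler_used":
            counts[(record.get("topic"), record.get("language"))] += 1
    return counts
```

Passing the output of the `docker logs` command to this function line by line, for example from `sys.stdin`, and then calling `.most_common()` on the result gives the ranking of topics per language.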

Settings

# Default ON — flag off to revert to pre-A.2 generic-filler behavior
VOICE_CONTEXT_FILLER_ENABLED=true

Test it

Speak a multi-word question containing a known topic keyword. Listen for the topic noun in the agent's first-response filler. Examples:

nl: "Wat zijn de bezoekuren bij cardiologie?"
en: "How much does parking cost?"
fr: "Quel est le numéro de téléphone?"
it: "Vorrei un appuntamento"

Then say something not on the topic list ("Wat is uw naam?") and confirm that the generic filler returns. This fallback is the proof that A.2 doesn't regress existing behavior.

Latency budget

Operation	Local-dev p50
`detect_filler_topic()` regex match (10 topics × 4 languages)	< 1 ms
Filler template selection (random from 3 variants)	< 0.1 ms
ElevenLabs TTS first audio chunk for filler	150–250 ms* (Multilingual v2 streaming)

* Per pilot measurement pass (Phase 5 of readiness plan).
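
A local p50 for the regex step can be checked with a micro-benchmark along these lines. The helper `p50_latency_ms` and the single-pattern workload are assumptions made for illustration; the figures in the table were not produced by this snippet, and results vary by machine.

```python
import re
import statistics
import time
from typing import Callable


def p50_latency_ms(fn: Callable, *args, runs: int = 1000) -> float:
    """Median wall-clock time of fn(*args) in milliseconds over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn(*args)
        samples.append((time.perf_counter_ns() - start) / 1_000_000)
    return statistics.median(samples)


# Hypothetical workload: one compiled pattern against one utterance.
_PATTERN = re.compile(r"\bbezoekuren\b", re.IGNORECASE)
p50 = p50_latency_ms(_PATTERN.search, "Wat zijn de bezoekuren bij cardiologie?")
print(f"regex search p50: {p50:.4f} ms")
```

To time the real detection path, pass `detect_filler_topic` with a text and a language instead of the single pattern.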

References

voice_agent/greeting.py — _FILLER_TOPIC_PATTERNS, detect_filler_topic, get_fillers
LiveKit Agents Documentation — interim transcript event hooks used to trigger detection
ElevenLabs Multilingual v2 — production TTS that speaks the filler templates
