Conversational Intent
The voice channel emits one of five conversational_intent values on every turn — answered, farewell, escalate, out_of_scope, or (for backwards compatibility only) switch_language. This routing label is what the voice_agent reads to decide whether to keep the session open, hand off via SIP REFER, or hang up. This page documents how that label is derived in the current architecture (post-158d793).
The two mechanisms
There is no longer a dedicated "conversational intent resolver" service. Conversational intent is derived from two mechanisms inside VoiceLLMOrchestrator:
Mechanism 1 — classify_terminal regex pre-filter
Module: backend/app/services/voice/voice_thin_pre_filter.py, function classify_terminal().
Every caller utterance is run through a deterministic regex classifier before the LLM is invoked. The classifier returns one of seven TerminalClass values, in this precedence order (highest first):
| Class | Maps to conversational_intent | Triggered by |
|---|---|---|
| SAFETY_REFUSAL | out_of_scope | Medical-advice phrasings (dosage, prescription, "what should I take") |
| HANDOFF_REQUEST | escalate | Explicit human-transfer requests ("verbind me door", "speak to an agent") |
| REPEAT_REQUEST | answered (with prior-answer re-emit) | Caller asks for repetition ("kun je dat herhalen") |
| OFF_TOPIC_PERSONAL | answered (with redirect) | Personal-life or off-topic queries unrelated to the hospital |
| FAREWELL | farewell | Caller says goodbye ("tot ziens", "doei", "bedankt") |
| GREETING | answered (with greeting echo) | Caller opens with hello but no question yet |
| FALLTHROUGH | (LLM stage decides) | Substantive query, handed to GPT-4.1 |
Precedence ordering matters. A caller who says "Bedankt voor de informatie, kunt u me toch nog even doorverbinden?" must be classified as HANDOFF_REQUEST (escalate), not FAREWELL (which would hang up before the transfer). The cascade is structured so safety-critical intents (refusal, handoff, repeat) are evaluated before social ones (farewell, greeting).
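The first-match-wins cascade can be sketched as an ordered list of (class, pattern) pairs. This is a minimal illustration, not the real module: only the class names and their order come from the table above, while the patterns, the `_CASCADE` name, and the single-argument signature are assumptions standing in for the per-language pattern packs.

```python
import re
from enum import Enum


class TerminalClass(Enum):
    SAFETY_REFUSAL = "safety_refusal"
    HANDOFF_REQUEST = "handoff_request"
    REPEAT_REQUEST = "repeat_request"
    OFF_TOPIC_PERSONAL = "off_topic_personal"
    FAREWELL = "farewell"
    GREETING = "greeting"
    FALLTHROUGH = "fallthrough"


# Hypothetical patterns for illustration only.
# List order encodes precedence: the first matching entry wins.
_CASCADE = [
    (TerminalClass.SAFETY_REFUSAL,
     re.compile(r"\b(dosage|dosering|prescription|what should i take)\b", re.I)),
    (TerminalClass.HANDOFF_REQUEST,
     re.compile(r"\b(doorverbinden|verbind me door|speak to an agent)\b", re.I)),
    (TerminalClass.REPEAT_REQUEST,
     re.compile(r"\b(herhalen|say that again)\b", re.I)),
    (TerminalClass.OFF_TOPIC_PERSONAL,
     re.compile(r"\b(how old are you|ben je getrouwd)\b", re.I)),
    (TerminalClass.FAREWELL,
     re.compile(r"\b(tot ziens|doei|bedankt)\b", re.I)),
    (TerminalClass.GREETING,
     re.compile(r"^\s*(hallo|hello|goedemorgen)\W*$", re.I)),
]


def classify_terminal(utterance: str) -> TerminalClass:
    """Return the highest-precedence class that matches, else FALLTHROUGH."""
    for terminal_class, pattern in _CASCADE:
        if pattern.search(utterance):
            return terminal_class
    return TerminalClass.FALLTHROUGH
```

With this ordering, the mixed utterance from the paragraph above matches both the handoff and the farewell pattern, and the handoff entry wins because it is checked first.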
Mechanism 2 — GPT-4.1 tool-choice
Module: backend/app/services/voice/voice_llm_orchestrator.py, class VoiceLLMOrchestrator.
When classify_terminal returns FALLTHROUGH, the orchestrator invokes GPT-4.1 with a three-tool schema. The LLM's tool choice maps directly to the conversational intent:
| Tool the LLM calls | conversational_intent returned |
|---|---|
| end_call (only after explicit goodbye marker) | farewell |
| transfer_to_helpdesk | escalate |
| search_hospital_kb (one or more times) → final answer | answered |
| No tool (LLM answers directly) | answered |
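The mapping in the table can be sketched as a small pure function. The tool names come from the table; the helper name `intent_from_tool_calls` is hypothetical, and the tie-break (a transfer outranks an end_call if both appear in one turn) is an assumption the source does not state.

```python
from typing import Sequence


def intent_from_tool_calls(tool_calls: Sequence[str]) -> str:
    """Map the tool names the LLM called in one turn to a conversational_intent.

    Hypothetical helper: the real orchestrator may derive this differently.
    """
    # Assumption: a transfer takes priority over hanging up.
    if "transfer_to_helpdesk" in tool_calls:
        return "escalate"
    if "end_call" in tool_calls:
        return "farewell"
    # search_hospital_kb (any number of times) or no tool at all both
    # end in a spoken answer.
    return "answered"
```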
Two safeguards keep the LLM honest about when to terminate:
- end_call requires explicit goodbye. The system prompt instructs "End the call ONLY after the caller has spoken an explicit goodbye marker — for example 'tot ziens', 'dag', 'doei'." See voice_llm_orchestrator.py:149–154. This prevents the LLM from hanging up on a "Bedankt voor de informatie" mid-conversation.
- Two consecutive empty searches force escalation. When search_hospital_kb returns found=False twice in a row, the orchestrator force-transfers to the helpdesk with conversational_intent="escalate" (voice_llm_orchestrator.py:519–556). This defends against the gibberish-rephrase loop pattern from the 2026-05-07 traffic.
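The second safeguard amounts to a counter that resets on any successful search. The sketch below illustrates that idea only; the class name `EmptySearchGuard` and its fields are hypothetical and are not taken from voice_llm_orchestrator.py.

```python
from dataclasses import dataclass


@dataclass
class EmptySearchGuard:
    """Tracks consecutive search_hospital_kb misses (hypothetical helper)."""

    limit: int = 2               # two misses in a row force escalation
    consecutive_misses: int = 0

    def record(self, found: bool) -> bool:
        """Record one search result; return True when escalation should be forced."""
        if found:
            # Any hit breaks the streak.
            self.consecutive_misses = 0
            return False
        self.consecutive_misses += 1
        return self.consecutive_misses >= self.limit
```

A caller would check the return value after each search and, when it is True, stop the tool loop and return conversational_intent="escalate".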
Agent-side gate — emphasis-only affirmations no longer carry context
There is a small third classifier that the diagram above doesn't show because it sits on the agent side, not in VoiceLLMOrchestrator: _is_short_clarification() in voice_agent/agent.py:454. Its job is to decide whether a short caller utterance should re-use the previous turn's question as context (the FAQ-followup carry) or be sent to the backend as a fresh utterance.
Commit eb5ebf66 (2026-05-26) added a deny-list — _EMPHASIS_ONLY_NL_EN in voice_agent/agent.py:446 — for bare emphasis affirmations in Dutch and English: "ja", "ja graag", "graag", "ja alsjeblieft", "alsjeblieft", "yes", "yes please", "please", "yes thank you". Even though these look like "short clarification" turns, they carry no new content to combine with the previous question. Sending them to the backend with the previous question prepended (the old behaviour) produced stitched compound queries that confused the LLM. The fix lets each such utterance fall through to the LLM as-is, so the LLM can ask a real follow-up question or move on. Pilot trace 92e11ea3 turn 12 is the regression case.
This is purely an agent-side concern — it does not change which conversational_intent value the orchestrator returns. It's documented here because it changes what reaches the two mechanisms above.
Trade-offs of the current design
The current two-mechanism model replaced a three-tier hybrid resolver (rules → confident-classifier short-circuit → gpt-4.1-nano LLM fallback) that was deleted in commit 158d793. The trade-offs of the current design vs the alternatives:
| Decision | Chosen | Alternative | Rejected because |
|---|---|---|---|
| Where intent is decided | Regex pre-filter for terminal classes; GPT-4.1 itself for FALLTHROUGH | Dedicated gpt-4.1-nano LLM resolver before the main GPT-4.1 call | Two LLM calls per turn was the dominant cost in latency profiling; the main GPT-4.1 already has the conversational context to decide tool dispatch correctly without a second model. |
| Regex vs LLM for safety-refusal | Regex pre-filter | LLM-only refusal | A recognised dosage ask must NEVER reach the LLM (safety invariant). Regex pre-filter is the only mechanism that can guarantee zero LLM exposure on the safety-critical class. See Voice Safety Architecture. |
| Single intent label | One label per turn | Multi-label (medical-intent + conversational-intent in the response) | Medical intent is computed inside RAGService (intent classifier service) and used internally for ranking and prompt selection; the voice-agent only needs one routing label. Surfacing two labels added cognitive load with no observable benefit. |
Compare with the deleted three-tier resolver: the latency model was "~40% rule match (< 1 ms), ~50% confident-classifier short-circuit (0 ms), ~10% LLM fallback (~150–300 ms)". In the current architecture the dominant latency factor is the GPT-4.1 first-token (300–600 ms) — but that call is already on the critical path for any FALLTHROUGH query. There is no additional resolver call.
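As a worked check of the old latency model, the expected resolver overhead per turn is the share-weighted sum of the tier latencies. The shares and latencies are the approximate figures quoted above; treating "< 1 ms" as 1 ms is a simplifying assumption.

```python
# (share of turns, latency in ms) per tier of the deleted resolver.
# Rule match is taken at its 1 ms upper bound (assumption).
TIERS_LOW = [(0.40, 1.0), (0.50, 0.0), (0.10, 150.0)]   # fast LLM fallback
TIERS_HIGH = [(0.40, 1.0), (0.50, 0.0), (0.10, 300.0)]  # slow LLM fallback


def expected_latency_ms(tiers):
    """Share-weighted mean latency across tiers."""
    return sum(share * latency for share, latency in tiers)


low = expected_latency_ms(TIERS_LOW)
high = expected_latency_ms(TIERS_HIGH)
print(f"expected resolver overhead: {low:.1f} to {high:.1f} ms per turn")
```

On these figures the old resolver added roughly 15 to 30 ms per turn on average, almost all of it from the ~10% of turns that reached the LLM fallback.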
Backwards-compatibility note: switch_language
The conversational_intent value switch_language is preserved in the response schema for backwards compatibility (existing voice_agent versions still parse it), but no current orchestrator path emits it. ADR-0052 removed mid-call language switching: the language is locked at first utterance, and a request to switch mid-call is handled inside the LLM stage as a polite decline plus optional helpdesk transfer (not as a separate intent label).
The dangling enum value is documented for frontends still listing it, but new code should never compare against it.
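One way a consumer might model this is to keep switch_language parseable while excluding it from the set of values it expects to see. The enum and the `EMITTABLE` set below are a hypothetical sketch, not the project's actual response schema.

```python
from enum import Enum


class ConversationalIntent(str, Enum):
    ANSWERED = "answered"
    FAREWELL = "farewell"
    ESCALATE = "escalate"
    OUT_OF_SCOPE = "out_of_scope"
    SWITCH_LANGUAGE = "switch_language"  # parse-only: no current path emits it


# Values a current orchestrator path can actually return.
EMITTABLE = frozenset(ConversationalIntent) - {ConversationalIntent.SWITCH_LANGUAGE}


def parse_intent(raw: str) -> ConversationalIntent:
    """Parse a wire value; older payloads with switch_language still parse."""
    return ConversationalIntent(raw)
```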
Operator signal
Each turn logs the resolved intent at INFO so operators can spot misclassifications from logs alone:
```
[VoiceLLMOrchestrator] turn_complete intent=answered tool_calls=1 latency_ms=842
```
HANDOFF_REQUEST should match for handoff-seeking utterances; SAFETY_REFUSAL should match for medical-advice phrasings. A regression on the regex pre-filter surfaces as a spike in out_of_scope or escalate false-negatives on the voice golden seed run.
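For spotting such a spike from logs alone, a small script can tally intents per log file. This sketch assumes every turn_complete line has the same field order as the sample line above; the function name `count_intents` is hypothetical.

```python
import re
from collections import Counter

# Field order is assumed from the single sample line in this section.
_TURN_COMPLETE = re.compile(
    r"\[VoiceLLMOrchestrator\] turn_complete "
    r"intent=(?P<intent>\w+) tool_calls=(?P<tool_calls>\d+) "
    r"latency_ms=(?P<latency_ms>\d+)"
)


def count_intents(lines):
    """Count resolved intents across log lines, ignoring unrelated lines."""
    counts = Counter()
    for line in lines:
        match = _TURN_COMPLETE.search(line)
        if match:
            counts[match.group("intent")] += 1
    return counts
```

Comparing the resulting counts between two runs of the voice golden seed gives a quick view of whether out_of_scope or escalate volumes have shifted.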
Contract test
backend/tests/integration/services/voice/test_voice_llm_orchestrator_integration.py pins the contract that voice_agent's detected_language parameter is honoured by the orchestrator, and that the orchestrator never emits conversational_intent="switch_language" on the current path:
```python
async def test_detected_language_from_voice_agent_is_respected_by_orchestrator(
    orchestrator, make_request
):
    request = make_request(query="Parkeren bij ZOL", detected_language="nl")
    response = await orchestrator.query_stream(request).__anext__()
    assert response.detected_language == "nl"
    assert response.conversational_intent != "switch_language"
```
This is the R3 cross-component contract test per CLAUDE.md §Silent-Failure Discipline — a test that simulates the wire format and asserts both sides agree.
References
- backend/app/services/voice/voice_thin_pre_filter.py: classify_terminal, TerminalClass enum, per-language pattern packs
- backend/app/services/voice/voice_llm_orchestrator.py: tool schemas, conversational_intent emission paths
- ADR-0049: Thin Voice Architecture. Rationale for the regex pre-filter as the safety-refusal gate.
- ADR-0051: Agentic VoiceLLMOrchestrator. Why the LLM owns the FALLTHROUGH path.
- ADR-0052: Voice Language Locked at First Utterance. Why switch_language is no longer emitted.
- LiveKit Agents Documentation. The runtime that hosts voice_agent and reads the conversational_intent label.
- Voice Safety Architecture. How SAFETY_REFUSAL interacts with the broader safety model (Stage 1 of the four-stage defence).
- (Sacks, Schegloff & Jefferson 1974) Foundational work on the terminal-class taxonomy of utterance types (greeting, farewell, repeat-request, off-topic) that the regex pre-filter classifies.