Conversational Intent

The voice channel emits one of five conversational_intent values on every turn — answered, farewell, escalate, out_of_scope, or (for backwards compatibility only) switch_language. This routing label is what the voice_agent reads to decide whether to keep the session open, hand off via SIP REFER, or hang up. This page documents how that label is derived in the current architecture (post-158d793).

The two mechanisms

There is no longer a dedicated "conversational intent resolver" service. Conversational intent is derived from two mechanisms inside VoiceLLMOrchestrator:

Mechanism 1 — classify_terminal regex pre-filter

Module: backend/app/services/voice/voice_thin_pre_filter.py, function classify_terminal().

Every caller utterance is run through a deterministic regex classifier before the LLM is invoked. The classifier returns one of seven TerminalClass values, in this precedence order (highest first):

| Class | Maps to conversational_intent | Triggered by |
| --- | --- | --- |
| SAFETY_REFUSAL | out_of_scope | Medical-advice phrasings (dosage, prescription, "what should I take") |
| HANDOFF_REQUEST | escalate | Explicit human-transfer requests ("verbind me door", "speak to an agent") |
| REPEAT_REQUEST | answered (with prior-answer re-emit) | Caller asks for repetition ("kun je dat herhalen") |
| OFF_TOPIC_PERSONAL | answered (with redirect) | Personal-life or off-topic queries unrelated to the hospital |
| FAREWELL | farewell | Caller says goodbye ("tot ziens", "doei", "bedankt") |
| GREETING | answered (with greeting echo) | Caller opens with hello but no question yet |
| FALLTHROUGH | (LLM stage decides) | Substantive query, handed to GPT-4.1 |

Precedence ordering matters. A caller who says "Bedankt voor de informatie, kunt u me toch nog even doorverbinden?" must be classified as HANDOFF_REQUEST (escalate), not FAREWELL (which would hang up before the transfer). The cascade is structured so safety-critical intents (refusal, handoff, repeat) are evaluated before social ones (farewell, greeting).
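
The cascade described above can be sketched as an ordered list of (class, pattern) pairs evaluated first-match-wins. This is a minimal illustration, not the code in voice_thin_pre_filter.py: the regex patterns are hypothetical placeholders built from the example phrases in the table, the real per-language pattern packs are not reproduced, and the actual classify_terminal() signature may differ.

```python
import re
from enum import Enum


class TerminalClass(Enum):
    SAFETY_REFUSAL = "safety_refusal"
    HANDOFF_REQUEST = "handoff_request"
    REPEAT_REQUEST = "repeat_request"
    OFF_TOPIC_PERSONAL = "off_topic_personal"
    FAREWELL = "farewell"
    GREETING = "greeting"
    FALLTHROUGH = "fallthrough"


# Hypothetical patterns for illustration only, ordered highest precedence first.
# OFF_TOPIC_PERSONAL is left out because the page gives no example phrases for it.
_CASCADE = [
    (TerminalClass.SAFETY_REFUSAL,
     re.compile(r"\b(dosage|dosering|prescription|what should i take)\b", re.I)),
    (TerminalClass.HANDOFF_REQUEST,
     re.compile(r"\b(doorverbinden|verbind me door|speak to an agent)\b", re.I)),
    (TerminalClass.REPEAT_REQUEST,
     re.compile(r"\b(kun je dat herhalen|can you repeat)\b", re.I)),
    (TerminalClass.FAREWELL,
     re.compile(r"\b(tot ziens|doei|bedankt)\b", re.I)),
    # A greeting only counts when nothing else follows it.
    (TerminalClass.GREETING,
     re.compile(r"^\s*(hallo|goedendag|hello)\b[\s.!,]*$", re.I)),
]


def classify_terminal(utterance: str) -> TerminalClass:
    """Return the first class whose pattern matches, else FALLTHROUGH."""
    for terminal_class, pattern in _CASCADE:
        if pattern.search(utterance):
            return terminal_class
    return TerminalClass.FALLTHROUGH


utterance = "Bedankt voor de informatie, kunt u me toch nog even doorverbinden?"
print(classify_terminal(utterance))  # TerminalClass.HANDOFF_REQUEST
```

Because the handoff pattern sits above the farewell pattern in the list, the "bedankt" in that utterance is never consulted. Reordering the list is the only change needed to reproduce the premature hang-up the paragraph warns about.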

Mechanism 2 — GPT-4.1 tool-choice

Module: backend/app/services/voice/voice_llm_orchestrator.py, class VoiceLLMOrchestrator.

When classify_terminal returns FALLTHROUGH, the orchestrator invokes GPT-4.1 with a three-tool schema. The agent's tool choice maps directly to the conversational intent:

| Tool the LLM calls | conversational_intent returned |
| --- | --- |
| end_call (only after explicit goodbye marker) | farewell |
| transfer_to_helpdesk | escalate |
| search_hospital_kb (one or more times) → final answer | answered |
| No tool (LLM answers directly) | answered |
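
The mapping in this table is small enough to express as a lookup. The sketch below assumes the orchestrator has collected the names of the tools called during one turn. The function name and the rule that transfer_to_helpdesk wins over end_call when both appear are assumptions for illustration, not behaviour confirmed by voice_llm_orchestrator.py.

```python
from typing import Sequence

# Tool names and intents come from the table above. The ordering
# (transfer checked before end_call) is an assumption of this sketch.
_TERMINAL_TOOL_INTENT = {
    "transfer_to_helpdesk": "escalate",
    "end_call": "farewell",
}


def intent_from_tool_calls(tool_calls: Sequence[str]) -> str:
    """Derive conversational_intent from the tool names called in one turn."""
    # A terminal tool decides the intent even if searches preceded it.
    for tool_name, intent in _TERMINAL_TOOL_INTENT.items():
        if tool_name in tool_calls:
            return intent
    # Searches only, or no tool at all, both end in a spoken answer.
    return "answered"


print(intent_from_tool_calls(["search_hospital_kb", "search_hospital_kb"]))  # answered
print(intent_from_tool_calls(["search_hospital_kb", "transfer_to_helpdesk"]))  # escalate
```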

Two safeguards keep the LLM honest about when to terminate:

  1. end_call requires explicit goodbye. The system prompt instructs "End the call ONLY after the caller has spoken an explicit goodbye marker — for example 'tot ziens', 'dag', 'doei'." See voice_llm_orchestrator.py:149–154. This prevents the LLM from hanging up on a "Bedankt voor de informatie" mid-conversation.
  2. Two consecutive empty searches force escalation. When search_hospital_kb returns found=False twice in a row, the orchestrator force-transfers to the helpdesk with conversational_intent="escalate" (voice_llm_orchestrator.py:519–556). This defends against the gibberish-rephrase loop pattern from the 2026-05-07 traffic.
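
The second safeguard amounts to a streak counter over search results. The helper below is a hypothetical sketch of that idea. The class name, the `limit` parameter and the reset-on-hit behaviour are assumptions, and the real logic at voice_llm_orchestrator.py:519–556 may track state differently.

```python
from dataclasses import dataclass


@dataclass
class EmptySearchGuard:
    """Counts consecutive search_hospital_kb misses (hypothetical helper)."""

    limit: int = 2
    consecutive_empty: int = 0

    def record(self, found: bool) -> bool:
        """Record one search result; return True when a forced transfer is due."""
        # A hit resets the streak, so only back-to-back misses count.
        self.consecutive_empty = 0 if found else self.consecutive_empty + 1
        return self.consecutive_empty >= self.limit


guard = EmptySearchGuard()
for found in (False, True, False, False):
    if guard.record(found):
        # In the orchestrator this is where conversational_intent="escalate" would be set.
        print("force transfer to helpdesk")
```

Resetting on a hit is the assumption that matters here: without it, two misses separated by a successful search would also trigger a transfer, which is stricter than "twice in a row".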

Agent-side gate — emphasis-only affirmations no longer carry context

There is a small third classifier that the diagram above doesn't show because it sits on the agent side, not in VoiceLLMOrchestrator: _is_short_clarification() in voice_agent/agent.py:454. Its job is to decide whether a short caller utterance should re-use the previous turn's question as context (the FAQ-followup carry) or be sent to the backend as a fresh utterance.

Commit eb5ebf66 (2026-05-26) added a deny-list — _EMPHASIS_ONLY_NL_EN in voice_agent/agent.py:446 — for bare emphasis affirmations in Dutch and English: "ja", "ja graag", "graag", "ja alsjeblieft", "alsjeblieft", "yes", "yes please", "please", "yes thank you". Even though these look like "short clarification" turns, they carry no new content to combine with the previous question. Sending them to the backend with the previous question prepended (the old behaviour) produced stitched compound queries that confused the LLM. The fix lets each such utterance fall through to the LLM as-is, so the LLM can ask a real follow-up question or move on. Pilot trace 92e11ea3 turn 12 is the regression case.

This is purely an agent-side concern — it does not change which conversational_intent value the orchestrator returns. It's documented here because it changes what reaches the two mechanisms above.

Trade-offs of the current design

The current two-mechanism model replaced a three-tier hybrid resolver (rules → confident-classifier short-circuit → gpt-4.1-nano LLM fallback) that was deleted in commit 158d793. The trade-offs of the current design vs the alternatives:

| Decision | Chosen | Alternative | Rejected because |
| --- | --- | --- | --- |
| Where intent is decided | Regex pre-filter for terminal classes; GPT-4.1 itself for FALLTHROUGH | Dedicated gpt-4.1-nano LLM resolver before the main GPT-4.1 call | Two LLM calls per turn was the dominant cost in latency profiling; the main GPT-4.1 already has the conversational context to decide tool dispatch correctly without a second model. |
| Regex vs LLM for safety-refusal | Regex pre-filter | LLM-only refusal | A recognised dosage ask must NEVER reach the LLM (safety invariant). Regex pre-filter is the only mechanism that can guarantee zero LLM exposure on the safety-critical class. See Voice Safety Architecture. |
| Single intent label | One label per turn | Multi-label (medical-intent + conversational-intent in the response) | Medical intent is computed inside RAGService (intent classifier service) and used internally for ranking and prompt selection; the voice-agent only needs one routing label. Surfacing two labels added cognitive load with no observable benefit. |

For comparison, the deleted three-tier resolver had this latency model: "~40% rule match (< 1 ms), ~50% confident-classifier short-circuit (0 ms), ~10% LLM fallback (~150–300 ms)". In the current architecture the dominant latency factor is the GPT-4.1 first token (300–600 ms), but that call is already on the critical path for any FALLTHROUGH query. There is no additional resolver call.
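
As a rough reading of the quoted figures, the per-turn overhead of the old resolver can be estimated as a weighted average. The calculation below uses only the approximate shares and latencies from the paragraph above and treats "< 1 ms" as a range of 0 to 1 ms. It is back-of-envelope arithmetic, not a measurement.

```python
# (share of turns, low ms, high ms) per tier of the deleted resolver,
# taken from the approximate figures quoted above.
TIERS = {
    "rule_match": (0.40, 0.0, 1.0),               # "< 1 ms", so 1 ms is an upper bound
    "classifier_short_circuit": (0.50, 0.0, 0.0),
    "llm_fallback": (0.10, 150.0, 300.0),
}


def expected_resolver_latency_ms(tiers=TIERS):
    """Return the (low, high) expected resolver overhead per turn in milliseconds."""
    low = sum(share * lo for share, lo, _ in tiers.values())
    high = sum(share * hi for share, _, hi in tiers.values())
    return low, high


low, high = expected_resolver_latency_ms()
print(f"expected resolver overhead: {low:.1f}-{high:.1f} ms per turn")
```

Under these assumptions the average overhead is about 15 to 30 ms per turn, but the average hides the tail: roughly one turn in ten paid the full 150–300 ms fallback before the main GPT-4.1 call could start.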

Backwards-compatibility note: switch_language

The conversational_intent value switch_language is preserved in the response schema for backwards compatibility (existing voice_agent versions still parse it), but no current orchestrator path emits it. ADR-0052 removed mid-call language switching: the language is locked at first utterance, and a request to switch mid-call is handled inside the LLM stage as a polite decline plus optional helpdesk transfer (not as a separate intent label).

The dangling enum value is documented for frontends still listing it, but new code should never compare against it.
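
One way to keep a legacy value parseable while making it clear that nothing should emit it is to separate "values the schema accepts" from "values the orchestrator may produce". The enum and helper names below are hypothetical. Only the five string values come from this page.

```python
from enum import Enum


class ConversationalIntent(str, Enum):
    ANSWERED = "answered"
    FAREWELL = "farewell"
    ESCALATE = "escalate"
    OUT_OF_SCOPE = "out_of_scope"
    SWITCH_LANGUAGE = "switch_language"  # legacy: still parsed, never emitted


# Values kept only so that older clients can deserialize the schema.
LEGACY_INTENTS = frozenset({ConversationalIntent.SWITCH_LANGUAGE})


def is_emittable(intent: ConversationalIntent) -> bool:
    """True for intents a current orchestrator path is allowed to return."""
    return intent not in LEGACY_INTENTS


# Older payloads still deserialize without error.
legacy = ConversationalIntent("switch_language")
print(legacy.value, is_emittable(legacy))  # switch_language False
```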

Operator signal

Each turn logs the resolved intent at INFO so operators can spot misclassifications from logs alone:

[VoiceLLMOrchestrator] turn_complete intent=answered tool_calls=1 latency_ms=842

HANDOFF_REQUEST should match for handoff-seeking utterances; SAFETY_REFUSAL should match for medical-advice phrasings. A regression on the regex pre-filter surfaces as a spike in out_of_scope or escalate false-negatives on the voice golden seed run.
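
To spot such a spike from logs alone, the turn_complete lines can be tallied per intent. The sketch below assumes every line follows exactly the format of the sample above (intent, tool_calls and latency_ms in that order). The function name is hypothetical, and real log lines may carry extra prefixes such as timestamps, which the search tolerates.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the sample log line shown above; field order is assumed fixed.
_TURN_COMPLETE = re.compile(
    r"\[VoiceLLMOrchestrator\] turn_complete "
    r"intent=(?P<intent>\w+) tool_calls=(?P<tool_calls>\d+) latency_ms=(?P<latency_ms>\d+)"
)


def count_intents(lines: Iterable[str]) -> Counter:
    """Tally resolved intents across turn_complete log lines, ignoring other lines."""
    counts: Counter = Counter()
    for line in lines:
        match = _TURN_COMPLETE.search(line)
        if match:
            counts[match.group("intent")] += 1
    return counts


sample = [
    "[VoiceLLMOrchestrator] turn_complete intent=answered tool_calls=1 latency_ms=842",
    "[VoiceLLMOrchestrator] turn_complete intent=escalate tool_calls=2 latency_ms=1210",
    "unrelated log line",
]
print(count_intents(sample))
```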

Contract test

backend/tests/integration/services/voice/test_voice_llm_orchestrator_integration.py pins the contract that voice_agent's detected_language parameter is honoured by the orchestrator, and that the orchestrator never emits conversational_intent="switch_language" on the current path:

async def test_detected_language_from_voice_agent_is_respected_by_orchestrator(
    orchestrator, make_request
):
    # The request carries the language voice_agent detected; the orchestrator must honour it.
    request = make_request(query="Parkeren bij ZOL", detected_language="nl")
    # Only the first streamed response is inspected.
    response = await orchestrator.query_stream(request).__anext__()
    assert response.detected_language == "nl"
    assert response.conversational_intent != "switch_language"

This is the R3 cross-component contract test per CLAUDE.md §Silent-Failure Discipline — a test that simulates the wire format and asserts both sides agree.

References

  • backend/app/services/voice/voice_thin_pre_filter.py: classify_terminal, TerminalClass enum, per-language pattern packs
  • backend/app/services/voice/voice_llm_orchestrator.py: tool schemas, conversational_intent emission paths
  • ADR-0049: Thin Voice Architecture — rationale for the regex pre-filter as the safety-refusal gate
  • ADR-0051: Agentic VoiceLLMOrchestrator — why the LLM owns the FALLTHROUGH path
  • ADR-0052: Voice Language Locked at First Utterance — why switch_language is no longer emitted
  • LiveKit Agents Documentation — the runtime that hosts voice_agent and reads the conversational_intent label
  • Voice Safety Architecture — how SAFETY_REFUSAL interacts with the broader safety model (Stage 1 of the four-stage defence)
  • Sacks, Schegloff & Jefferson (1974): foundational work behind the terminal-class taxonomy of utterance types (greeting, farewell, repeat-request, off-topic) that the regex pre-filter classifies