Voice Language Locking

ADR: ADR-0052 | Date: 2026-05-07 | Status: Accepted

Deepgram Nova-3 is the production STT model; this page documents how the voice channel adapts to its single-language vs multi-language operating modes.

The problem

ADR-0051 added a switch_language tool to the agentic orchestrator and a regex fast-path for mid-call language detection. Two pilot calls within days proved the design structurally broken:

Conv 5c81a578: The caller said "Do you speak English?" in English. Deepgram, configured for Dutch (nl), has no English phoneme coverage and hallucinated the phonetic approximation "Duursteking licht spelen". The fast-path regex eventually matched, but the LLM loop had already run for 47 seconds before the switch fired.

Conv fb4b4bae: The caller said "Do you speak English?" mid-call, after 8 successful Dutch turns. Deepgram in single-language mode produced no transcript at all for the English speech: not gibberish, not a partial transcription, nothing. The fast-path could not fire because there was no text for the regex to match. The caller waited 30 seconds in silence and hung up.

The second case is structural: once Deepgram is configured for a single language at the STT layer, it can silently emit zero transcripts for speech in any other language. When that happens there is no signal, neither garbage text nor an error event, that any downstream detector (regex, lexical, or LLM) can act on.
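A minimal sketch of why a text-based detector cannot recover from this. The pattern table and the `match_language_request` helper are hypothetical stand-ins for illustration, not the deleted `_LANG_REQUEST_PATTERNS` or `_detect_language_request`; the three inputs correspond to a clean transcript and the two pilot-call outcomes.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not the project's actual list.
LANG_REQUEST_PATTERNS = {
    "en": re.compile(r"\bspeak english\b", re.IGNORECASE),
    "fr": re.compile(r"\bparlez[- ]vous fran[cç]ais\b", re.IGNORECASE),
}


def match_language_request(transcript: str) -> Optional[str]:
    """Return the requested language code, or None if no pattern matches."""
    for code, pattern in LANG_REQUEST_PATTERNS.items():
        if pattern.search(transcript):
            return code
    return None


# The English request arrives as text, so the regex can fire.
clean = match_language_request("Do you speak English?")
# Conv 5c81a578: Dutch-looking gibberish gives the regex nothing to match.
gibberish = match_language_request("Duursteking licht spelen")
# Conv fb4b4bae: no transcript at all, so there is no input to inspect.
silent = match_language_request("")
```

Any detector that consumes transcripts, whether regex, lexical, or LLM-based, sits downstream of the same empty string in the third case.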

Multi-language Deepgram mode exists and would solve the silence problem, but it badly degrades Flemish accuracy. Prior team measurements showed Dutch queries going from "Wat zijn de bezoekuren" to "Hå at zen de bezukjuren" under multi-language mode. Enabling it would degrade recognition for every Dutch caller (95%+ of calls) in order to handle a mid-call switch case that accounts for <1% of calls.

Deepgram operating-point comparison

The team evaluated three Deepgram operating points before the decision. The trade-off table below records what was observed. The entries are qualitative: the underlying figures come from prior team measurements that have not yet been through a structured re-measurement pass, and the cells this affects are marked with *.

Operating point	Flemish (nl-BE) accuracy	Cross-language coverage	Mid-call switching
Single-language (`nl`)	High* — `"Wat zijn de bezoekuren"` recognised cleanly	Zero — silent on English / French / Italian speech (no transcript emitted at all)	Impossible — no signal for any detector to act on
Multi-language (`multi`)	Degraded — `"Wat zijn de bezoekuren"` → `"Hå at zen de bezukjuren"` (prior team measurement)	Yes — English / French / Italian transcribed	Possible — both languages parsed
First-utterance probe + lock	High* — same as single-language for 99%+ of the call	Detected once at call start; locked thereafter	Not supported (deliberate trade-off)

The numbers above merit an updated measurement pass against the current Deepgram Nova-3 build; the qualitative ranking has held in pilot calls.

The decision (ADR-0052)

The voice channel's language is locked at the first utterance for the duration of the call. Mid-call language switching is not supported.

The voice_agent worker uses multi-language STT for the very first utterance only, detects the caller's actual spoken language, then locks Deepgram to that language for the remainder of the call. If the caller asks to switch language mid-call, the agent's response is a polite transfer offer:

"Ik kan u helaas niet omschakelen naar een andere taal. Ik kan u wel doorverbinden met een medewerker die u kan helpen. Wilt u dat ik dat doe?"

(Translated: "Unfortunately I cannot switch to another language. I can transfer you to a colleague who can help. Would you like me to do that?")
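The probe-then-lock behaviour described above can be sketched as a pure function from the lock state to STT options. This is an illustration under assumptions: `stt_options` is a hypothetical helper, the option keys are simplified, and `"multi"` is assumed to be the value that selects Deepgram's multi-language mode.

```python
from typing import Optional


def stt_options(locked_language: Optional[str]) -> dict:
    """Build STT options for the next audio segment (hypothetical helper).

    Before the lock exists (None), the first-utterance probe runs in
    multi-language mode. Once a language is locked, the recogniser is
    pinned to it for the rest of the call.
    """
    return {
        "model": "nova-3",
        # Assumption: "multi" selects the multi-language operating point.
        "language": locked_language if locked_language is not None else "multi",
    }


probe_options = stt_options(None)    # first utterance only
locked_options = stt_options("nl")   # every later turn of the call
```

Keeping the choice in one function means the only way to return to multi-language mode is to clear the lock, which the design does not allow mid-call.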

Implementation

The lock is owned entirely by voice_agent — the backend never changes language state, it only reads it:

voice_agent._current_language = None          # before first utterance
voice_agent.probe_first_utterance()           # multi-language STT
  → detect spoken language
  → voice_agent._switch_language(detected)   # reconfigure STT + TTS once
  → voice_agent._current_language = detected # e.g. "nl"; locked for call duration

# Every subsequent turn:
voice_agent → backend: QueryRequest{detected_language: "nl", ...}
backend VoiceLLMOrchestrator: reads detected_language, never modifies it

QueryRequest.detected_language carries the locked language on every backend call. The orchestrator is language-aware (uses it to select the right tenant FAQ overlay and voice system prompt variant) but has no mechanism to change it.
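One way to picture the ownership rule is a write-once holder on the voice_agent side and an immutable request on the backend side. The sketch below is not the project's code: `LanguageLock` and `build_request` are hypothetical names, and the `QueryRequest` here is a two-field stand-in for the real model.

```python
from dataclasses import dataclass
from typing import Optional


class LanguageLock:
    """Write-once holder for the call language (hypothetical helper)."""

    def __init__(self) -> None:
        self._language: Optional[str] = None

    @property
    def language(self) -> Optional[str]:
        return self._language

    def lock(self, detected: str) -> None:
        # Only the first-utterance probe should reach this point.
        if self._language is not None:
            raise RuntimeError(
                f"language already locked to {self._language!r}; "
                "mid-call switching is not supported"
            )
        self._language = detected


@dataclass(frozen=True)
class QueryRequest:
    """Simplified stand-in for the real request model."""

    query: str
    detected_language: str


def build_request(lock: LanguageLock, query: str) -> QueryRequest:
    """Stamp the locked language onto an outgoing backend request."""
    if lock.language is None:
        raise RuntimeError("first-utterance probe has not run yet")
    return QueryRequest(query=query, detected_language=lock.language)


lock = LanguageLock()
lock.lock("nl")
request = build_request(lock, "Wat zijn de bezoekuren")
```

Freezing the request mirrors the read-only contract: code that receives it can branch on `detected_language` but cannot reassign it.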

What was removed

The following were deleted in the ADR-0052 commit:

Artifact	Why removed
`switch_language` tool from `VoiceLLMOrchestrator._TOOLS`	Language switching is no longer an orchestrator responsibility
Mid-call regex fast-path (`_detect_language_request`, `_LANG_REQUEST_PATTERNS`)	Covered a case that Deepgram's silence makes undetectable
In-loop `switch_language` short-circuit handling	No tool to dispatch to
`switch_language` system-prompt tool description	Removed from tool list
`TestLanguageFastPath` unit-test class	Tests the deleted path
`test_voice_llm_language_fast_path.py`	Deleted entirely
2 integration tests for `switch_language` tool success + invalid-code paths	Deleted

Net: ~80 LOC removed, 4 tools → 3 tools (search_hospital_kb, transfer_to_helpdesk, end_call remain; switch_language is gone).

What was preserved

Artifact	Why kept
`voice_agent.language_detection.detect_language_request`	Still used for the first-utterance probe
`voice_agent.agent._switch_language()`	Called once at call start by the probe
`conversational_intent` enum value `switch_language`	Backwards compatibility; no orchestrator path emits it currently
System-prompt instruction to politely decline and offer transfer on mid-call switch requests	The agent still needs to handle the request gracefully

Hospital-agnostic parameterisation

The lock-and-stay policy is universal — it applies to every tenant regardless of their supported language set. A single-language tenant (ZOL: Dutch) and a multi-language tenant (e.g., a medical-tourism hospital: nl/fr/en/it) both use the lock. The difference is what language the first-utterance probe detects and locks to.

Tenant language configuration is read via get_taxonomy(slug), the same DB-driven path used for all other tenant data. No languages are hardcoded in source.
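As an illustration of how the same policy could serve both tenant types, the sketch below resolves the lock language from a tenant's supported set. The dictionaries stand in for whatever get_taxonomy(slug) returns, whose real shape is not documented here, and the fallback to a tenant default when the probe detects an unsupported language is an assumption rather than documented behaviour.

```python
from typing import Sequence


def resolve_lock_language(
    detected: str, supported: Sequence[str], default: str
) -> str:
    """Pick the language to lock to for this tenant (hypothetical helper).

    Assumption: a detected language outside the tenant's supported set
    falls back to the tenant default. The source does not specify this.
    """
    return detected if detected in supported else default


# Stand-ins for tenant configuration loaded from the database.
single_language_tenant = {"supported": ["nl"], "default": "nl"}
multi_language_tenant = {"supported": ["nl", "fr", "en", "it"], "default": "nl"}

zol_lock = resolve_lock_language("en", **single_language_tenant)
tourism_lock = resolve_lock_language("en", **multi_language_tenant)
```

The policy code is identical for both tenants; only the data differs, which is what keeps language names out of the source.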

If a future tenant requires genuine mid-call language switching, for example because bilingual conversations are common in its population, this ADR will need revisiting. The options at that point are Deepgram multi-language mode, with the Flemish-accuracy trade-off accepted per tenant, or an alternative STT vendor that handles code-switching natively.

Contract test

backend/tests/integration/services/voice/test_voice_llm_orchestrator_integration.py — the contract test that pins the language plumbing across the voice_agent → backend handoff:

async def test_detected_language_from_voice_agent_is_respected_by_orchestrator(
    orchestrator, make_request
):
    """voice_agent locks language at first utterance and sends
    detected_language on every turn. The orchestrator must read and
    use it — never infer or override."""
    request = make_request(query="Parkeren bij ZOL", detected_language="nl")
    response = await orchestrator.query_stream(request).__anext__()
    assert response.detected_language == "nl"
    # No switch_language tool call in the response conversational_intent
    assert response.conversational_intent != "switch_language"
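To experiment with the contract outside the real test suite, a stand-in orchestrator is enough. Everything in the sketch below is hypothetical (`FakeRequest`, `FakeResponse`, `EchoOrchestrator`, and the `"faq"` intent value); it only demonstrates the shape of the check, repeated across the languages a multi-language tenant might lock to.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeRequest:
    query: str
    detected_language: str


@dataclass
class FakeResponse:
    detected_language: str
    conversational_intent: str


class EchoOrchestrator:
    """Stand-in that honours the contract: reads the language, never changes it."""

    async def query_stream(self, request: FakeRequest) -> AsyncIterator[FakeResponse]:
        yield FakeResponse(
            detected_language=request.detected_language,
            conversational_intent="faq",  # placeholder intent value
        )


async def first_chunk(language: str) -> FakeResponse:
    """Send one request and return the first streamed response."""
    request = FakeRequest(query="Parkeren bij ZOL", detected_language=language)
    return await EchoOrchestrator().query_stream(request).__anext__()


for language in ("nl", "fr", "en", "it"):
    response = asyncio.run(first_chunk(language))
    # The language sent by voice_agent must come back unchanged.
    assert response.detected_language == language
    assert response.conversational_intent != "switch_language"
```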

References

ADR-0052: Voice Language Locked at First Utterance
ADR-0051: Agentic VoiceLLMOrchestrator (the ADR this one supersedes on language switching)
backend/app/services/voice/voice_llm_orchestrator.py — tools list (3 tools: search_hospital_kb, transfer_to_helpdesk, end_call)
Deepgram Nova-3 — production STT model; the single-language vs multi-language trade-off is the empirical input to this ADR
Pilot call transcripts: conv 5c81a578 (47s gibberish loop), conv fb4b4bae (30s silent hang-up) — both in backend/app/services/voice/ log archive
Radford et al. 2023
