ADR-0051: Agentic `VoiceLLMOrchestrator` is the Only Voice Path

Master record: docs/ADR/0051-agentic-only-voice-orchestrator.md. The master is canonical; this Docusaurus rendering is for in-site navigation.

Date: 2026-05-07
Status: Accepted
Deciders: Tsunami-max (operator), Claude (implementation)
Supersedes: ADR-0049 (Thin Voice Architecture)
Relates to: clarifying-questions decisions; ADR-0050 (master record; Twilio + self-hosted LiveKit SIP); the agentic orchestrator spec

Context

The voice channel has carried two orchestrators since 2026-05-06:

ThinVoiceOrchestrator (ADR-0049, 2026-04-30) — deterministic pipeline: regex pre-filter → FAQ → RAG. Predictable, fast in the happy path, but brittle on anything outside its hand-coded routing tables.
VoiceLLMOrchestrator (greenfield 2026-05-06) — agentic GPT-4.1 with tool use: search_hospital_kb, transfer_to_helpdesk, switch_language, end_call. Routing decisions made by the LLM on the actual user utterance.

Routing was gated on voice_llm_orchestrator_enabled (default False), intended as an A/B comparison flag. In practice, two issues emerged:

Drift between the two paths. The thin orchestrator and the agentic orchestrator each accumulated their own copies of helpers (_resolve_language lived in both files), prompt fragments, and answer-shaping rules. Bug fixes had to be applied twice, or the unpatched path silently regressed.
The 2026-05-07 pilot transcript surfaced rigidity that an LLM router handles natively. A caller said "Do you speak English?" in English. Deepgram (Dutch-locked) phonetically mapped this to "Duursteking licht spelen", which is gibberish. The thin pipeline's deterministic routing has no concept of "this transcript is nonsense, so escalate or switch language"; it routed the gibberish to RAG, which produced the dead-end "Ik kon deze informatie niet terugvinden" ("I could not find this information") answer 45 seconds later. The caller hung up. The agentic orchestrator's tool calling gives the LLM the same option we'd want a human dispatcher to have ("garbage input → call transfer_to_helpdesk") without us hand-coding every input shape.

Decision

The voice channel uses VoiceLLMOrchestrator exclusively. The deterministic thin pipeline is deleted from the codebase: the ThinVoiceOrchestrator class file, the integration E2E suite, and the unit tests of the orchestrator's internal methods.

Shared pure functions (_is_compound_query, _is_list_all_query, _query_has_specificity_hint, _looks_like_bare_affirmation_after_open_closer) move into voice_thin_pre_filter.py because both rag_service and the agentic orchestrator import them. The file keeps its current name for now: the _thin_ in the name is now misleading, but renaming is a wide import edit that doesn't carry weight on its own, so it is deferred to a follow-up cleanup.

The settings flag voice_llm_orchestrator_enabled is downgraded to a no-op kept only so existing .env files don't trip Pydantic's extra="forbid" validator. It will be removed in a follow-up cleanup.
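Why the flag is kept rather than deleted can be illustrated with a small stand-in for a strict settings loader. This is a sketch using only the standard library, not the project's app/config.py and not Pydantic itself; `Settings`, `load_settings`, and the `voice_model` field are hypothetical names. It only mimics the "reject unknown keys" behaviour that extra="forbid" provides.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    # Hypothetical field, present only so the class has a "real" setting.
    voice_model: str = "gpt-4.1"
    # Deprecated no-op: nothing reads this value any more. It stays declared
    # so that a strict loader still accepts files that set it.
    voice_llm_orchestrator_enabled: bool = True


def load_settings(raw: dict[str, str]) -> Settings:
    """Build Settings from key/value pairs, rejecting unknown keys.

    Stand-in for a strict (extra="forbid"-style) loader; assumption, not
    the project's actual code.
    """
    known = {f.name for f in fields(Settings)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unknown settings: {unknown}")

    kwargs: dict[str, object] = {}
    for key, value in raw.items():
        if key == "voice_llm_orchestrator_enabled":
            # Parse the boolean, even though the value is ignored downstream.
            kwargs[key] = value.strip().lower() in {"1", "true", "yes"}
        else:
            kwargs[key] = value
    return Settings(**kwargs)
```

With the field declared, an older configuration that still sets voice_llm_orchestrator_enabled loads without error. If the field were removed while such a file still existed, the same strict loader would raise at startup, which is the failure the no-op is there to avoid.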

Consequences

Positive

Single source of truth for voice behavior. One orchestrator, one prompt, one tool table. Bug fixes apply once.
LLM-driven recovery on bad input. Gibberish, code-switched speech, and language requests now route through the same decision layer that handles substantive queries.
Cleaner ops surface. The voice_llm_orchestrator_enabled true/false knob no longer has any effect, so there is no "fall back to thin" reflex during a deploy mishap.
~7 000 lines deleted (orchestrator + E2E suite + duplicated unit tests).

Negative / Risks

Cost floor rises. Every voice turn now invokes GPT-4.1; the thin pipeline could short-circuit greetings/farewells/FAQ matches without an LLM call. Mitigation: the regex pre-filter (classify_terminal) still runs before the LLM and short-circuits greeting/farewell/handoff/safety classes. The LLM only fires when the pre-filter returns FALLTHROUGH — the same gate the agentic orchestrator already used.
No fallback if the GPT-4.1 endpoint is down. Mitigation: the existing failure path (the orchestrator returns conversational_intent="escalate" on any tool/LLM exception, and voice_agent SIP-transfers the call).
Iteration loops on poor input. Observed in the pilot transcript: the LLM called search_hospital_kb 7× on gibberish before giving up. Mitigation: a max_iterations=3 cap and empty-result early-exit land in the same batch as this ADR (separate commits for review granularity).
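The three mitigations above can be sketched together in one turn handler. This is a minimal illustration under stated assumptions, not the orchestrator's actual code: the regex patterns, `TurnResult`, `run_agent_turn`, the `llm_step` callable, and every intent value other than "escalate" are hypothetical, and the empty-result early exit is assumed here to escalate.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

FALLTHROUGH = "fallthrough"

# Hypothetical patterns; the real classify_terminal rules are not in this ADR.
_TERMINAL_PATTERNS = {
    "greeting": re.compile(r"^\s*(hallo|goedemorgen|hello)\b", re.IGNORECASE),
    "farewell": re.compile(r"^\s*(tot ziens|dag|goodbye)\b", re.IGNORECASE),
}


def classify_terminal(utterance: str) -> str:
    """Return a terminal class, or FALLTHROUGH when the LLM should decide."""
    for label, pattern in _TERMINAL_PATTERNS.items():
        if pattern.search(utterance):
            return label
    return FALLTHROUGH


@dataclass
class TurnResult:
    conversational_intent: str
    llm_calls: int = 0
    answer: Optional[str] = None


def run_agent_turn(
    utterance: str,
    llm_step: Callable[[str, list[str]], tuple[str, str]],
    search_kb: Callable[[str], list[str]],
    max_iterations: int = 3,
) -> TurnResult:
    # Mitigation 1: the pre-filter short-circuits before any LLM call.
    label = classify_terminal(utterance)
    if label != FALLTHROUGH:
        return TurnResult(label)

    iterations = 0
    observations: list[str] = []
    try:
        # Mitigation 3a: hard cap on LLM/tool round trips.
        while iterations < max_iterations:
            iterations += 1
            action, payload = llm_step(utterance, observations)
            if action == "answer":
                return TurnResult("answer", iterations, payload)
            if action != "search":
                # Any other tool choice is treated as a handoff in this sketch.
                return TurnResult("escalate", iterations)
            hits = search_kb(payload)
            if not hits:
                # Mitigation 3b: empty result, stop instead of searching again.
                return TurnResult("escalate", iterations)
            observations.extend(hits)
    except Exception:
        # Mitigation 2: any tool/LLM failure becomes an escalation.
        return TurnResult("escalate", iterations)

    # Cap reached without an answer.
    return TurnResult("escalate", iterations)
```

In this sketch, a greeting never reaches `llm_step`, a gibberish transcript whose first search returns nothing escalates after one iteration rather than seven, and a model that keeps asking for searches is cut off at `max_iterations`.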

Implementation summary

| File | Change |
| --- | --- |
| `app/services/voice/voice_thin_orchestrator.py` | Deleted |
| `tests/unit/test_voice_thin_orchestrator.py` | Deleted |
| `tests/integration/test_voice_thin_e2e.py` | Deleted |
| `app/services/voice/voice_thin_pre_filter.py` | Absorbed 4 shared helpers from the deleted orchestrator |
| `app/services/rag_service.py` | Import path updated to `voice_thin_pre_filter` |
| `app/api/query.py` | Two routing branches collapsed to a single `VoiceLLMOrchestrator` call |
| `app/api/public_websocket.py` | Same: single orchestrator path |
| `app/config.py` | Flag marked deprecated / no-op; default flipped to `True` |
| `.env.example` | Comment updated to reference ADR-0051 |
| `app/prompts.py` | One docstring updated |

Follow-ups

Rename voice_thin_pre_filter.py → voice_pre_filter.py (or similar channel-agnostic name). Mechanical edit; deferred to keep this commit reviewable.
Remove the voice_llm_orchestrator_enabled setting once enough deploy cycles confirm no .env still references it.
Add an evaluation harness for voice (recorded calls + LLM-as-judge scoring per turn). The agentic system gives us more axes of failure than the thin one — without eval coverage, regressions surface only via live calls.

References

ADR-0049 (superseded): the deterministic thin pipeline this ADR replaces.
ADR-0050 (master): Twilio + self-hosted LiveKit SIP — the runtime that hosts this orchestrator.
LiveKit Agents documentation — framework reference.
Pydantic AI documentation — structured-output pattern used by the orchestrator's tool dispatch.
