Local Setup & Testing

Credentials required (one total)

Key	Required?	What for
`OPENAI_API_KEY`	✅ yes	Voice answer LLM (`gpt-4.1-mini`) + conversational intent fallback (`gpt-4.1-nano`). Already required for the text channel — same key.
`ELEVENLABS_API_KEY`	❌ deferred (Phase A.2)	Nightly audio-loop eval
`DEEPGRAM_API_KEY`	❌ deferred (Phase A.2)	Nightly audio-loop eval
`RETELL_API_KEY`, `VAPI_API_KEY`	❌ deferred (Phase A.2)	SOTA vendor benchmark
LiveKit Cloud, Twilio SIP	❌ Phase B/C	Telephony

No new Docker services for Phase A. The voice channel reuses the existing docker-compose stack (postgres, redis, minio, ollama, keycloak, clickhouse, langfuse, backend).

Step 1 — pull latest master

cd ~/Development/zol-rag
git checkout master
git pull

Phase A merged on 2026-04-24 as commit 63f06a4. Your master should include the commits:

63f06a4 Merge pull request #67 from Tsunami-max/feat/voice-phase-a
15f09bc fix(voice): respect disclaimer flag, emit shape-compliance metric
351e0e6 docs(env): document voice channel settings in .env.example
db5409e feat(eval): voice eval runner + SOTA benchmark skeleton (Phase A T14-T17)
…18 more commits

Step 2 — start the Docker infrastructure

cd docker
docker compose up -d
docker compose ps

Expect all services healthy or running. Standardized ports per project convention:

Service	Host port
Backend (FastAPI)	`:8000`
Frontend (Vite)	`:4000`
PostgreSQL	`:5433`
Redis	`:6379`
MinIO API	`:9100`
MinIO Console	`:9101`
Keycloak	`:8081`
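
To confirm the port table above in one go, a minimal sketch along these lines probes each host port with a plain TCP connect. The port numbers come from the table; the helper names `is_port_open` and `report` and the one-second timeout are illustrative choices. An open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Host ports copied from the table above.
EXPECTED_PORTS = {
    "Backend (FastAPI)": 8000,
    "Frontend (Vite)": 4000,
    "PostgreSQL": 5433,
    "Redis": 6379,
    "MinIO API": 9100,
    "MinIO Console": 9101,
    "Keycloak": 8081,
}


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def report(ports: dict[str, int] = EXPECTED_PORTS) -> dict[str, bool]:
    """Map each service name to whether its host port accepts connections."""
    return {name: is_port_open(port) for name, port in ports.items()}
```

Calling `report()` and printing the entries whose value is `False` gives a quick list of services to inspect with `docker compose ps`.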

If backend/.env already exists, skip to Step 3. Otherwise, create it from the template (run this from the repository root, not from docker/):

cp backend/.env.example backend/.env

Step 3 — configure voice settings in `backend/.env`

Edit backend/.env and set:

OPENAI_API_KEY=sk-your-real-key
VOICE_CHANNEL_ENABLED=true

The remaining voice settings ship with sensible defaults and need no changes:

VOICE_RAG_LLM_MODEL=gpt-4.1-mini
VOICE_RAG_FALLBACK_CHAIN=[{"provider":"openai","model":"gpt-4.1-mini"},{"provider":"openai","model":"gpt-4.1-nano"},{"provider":"openai","model":"gpt-4.1"}]
VOICE_ESCALATION_CONFIDENCE_THRESHOLD=0.80
VOICE_CONVERSATIONAL_INTENT_LLM_MODEL=gpt-4.1-nano
VOICE_DISCLAIMER_ENABLED=true
VOICE_STT_AMBIGUITY_GUARDRAIL_ENABLED=true
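
Before starting the backend, it can help to sanity-check the file. The sketch below is a hypothetical helper, not part of the repository: it parses simple KEY=VALUE lines (no quoting or `export` handling) and checks the two settings from this step, plus that `VOICE_RAG_FALLBACK_CHAIN`, if present, is valid JSON with `provider` and `model` on each entry.

```python
import json


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_voice_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    api_key = values.get("OPENAI_API_KEY", "")
    if not api_key or api_key == "sk-your-real-key":
        problems.append("OPENAI_API_KEY is missing or still the placeholder")
    if values.get("VOICE_CHANNEL_ENABLED", "").lower() != "true":
        problems.append("VOICE_CHANNEL_ENABLED is not set to true")
    chain = values.get("VOICE_RAG_FALLBACK_CHAIN")
    if chain is not None:
        try:
            entries = json.loads(chain)
        except json.JSONDecodeError:
            problems.append("VOICE_RAG_FALLBACK_CHAIN is not valid JSON")
        else:
            well_formed = isinstance(entries, list) and all(
                isinstance(e, dict) and {"provider", "model"} <= set(e)
                for e in entries
            )
            if not well_formed:
                problems.append("VOICE_RAG_FALLBACK_CHAIN entries need provider and model")
    return problems
```

Reading `backend/.env` as text and passing it through `check_voice_env(parse_env(text))` should return an empty list once this step is done.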

Step 4 — start the backend

In one terminal:

cd backend
source venv/bin/activate
uvicorn app.main:app --reload --port 8000

The Swagger UI at http://localhost:8000/docs now shows the voice-channel fields on the QueryRequest and QueryResponse models.

Step 5 — smoke tests

Use curl or a REST client. For protected endpoints, obtain a JWT by logging in (Keycloak at http://localhost:8081 or the existing login flow).

Test A — happy-path `answered`

curl -s -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT" \
  -d '{"query":"Wat zijn de bezoekuren?","channel":"voice"}' | jq .

Expected response shape:

{
  "answer": "De bezoekuren zijn van maandag tot vrijdag, van twee uur tot acht uur 's avonds.",
  "conversational_intent": "answered",
  "target_language": null,
  "voice_shape_compliant": true,
  "conversation_id": "…",
  "citations": [ … ],
  "metrics": { … }
}
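
The same request can be sent from Python and the response checked against the shape above. This is a sketch using only the standard library; the field names come from the expected response, while the helper names and the choice to treat all seven top-level keys as mandatory are assumptions.

```python
import json
import urllib.request

# Top-level keys listed in the expected response shape above.
EXPECTED_KEYS = {
    "answer", "conversational_intent", "target_language",
    "voice_shape_compliant", "conversation_id", "citations", "metrics",
}


def post_voice_query(query: str, jwt: str, base_url: str = "http://localhost:8000") -> dict:
    """POST a voice-channel query and return the decoded JSON body."""
    body = json.dumps({"query": query, "channel": "voice"}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/v1/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def missing_keys(payload: dict) -> set[str]:
    """Return the expected top-level keys that are absent from the payload."""
    return EXPECTED_KEYS - set(payload)
```

With a valid token, `missing_keys(post_voice_query("Wat zijn de bezoekuren?", jwt))` should come back empty if the response matches the documented shape.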

Test B — rule-based `farewell` short-circuit

curl -s -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT" \
  -d '{"query":"Bedankt, tot ziens.","channel":"voice"}' | jq '.conversational_intent, .answer'

Expected: conversational_intent="farewell" and the Dutch goodbye template as the answer. RAG is never called.

Test C — rule-based `switch_language` with target

curl -s -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT" \
  -d '{"query":"Can we continue in English please?","channel":"voice"}' | jq '.conversational_intent, .target_language'

Expected: conversational_intent="switch_language", target_language="en".

Test D — STT-ambiguity guardrail fires

curl -s -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT" \
  -d '{"query":"Moet ik iets nemen tegen migraine?","channel":"voice"}' | jq '.conversational_intent, .answer'

Expected: conversational_intent="out_of_scope" and the Dutch handoff template as the answer. RAG is never called; this guardrail is the first layer of defense.
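
Tests B to D differ only in the query and the fields they inspect, so they lend themselves to a small table-driven check. The sketch below encodes the expectations stated above and compares them with an already-decoded response; `CASES` and `mismatches` are hypothetical names, and fetching the response is left to whichever client is in use.

```python
# Expected fields per query, taken from Tests B, C and D above.
CASES = {
    "Bedankt, tot ziens.": {"conversational_intent": "farewell"},
    "Can we continue in English please?": {
        "conversational_intent": "switch_language",
        "target_language": "en",
    },
    "Moet ik iets nemen tegen migraine?": {"conversational_intent": "out_of_scope"},
}


def mismatches(query: str, response: dict) -> dict[str, tuple]:
    """Return {field: (expected, actual)} for every field that differs."""
    expected = CASES[query]
    return {
        field: (want, response.get(field))
        for field, want in expected.items()
        if response.get(field) != want
    }
```

An empty dictionary means the response met the documented expectation for that query; anything else names the field that diverged.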

Test E — feature flag off returns HTTP 400

Flip VOICE_CHANNEL_ENABLED=false in .env, restart uvicorn. The same Test A curl now returns:

{"detail":"Voice channel is disabled (voice_channel_enabled=False)."}

Flip back to true to continue.

Test F — public WebSocket (no auth)

# wscat or websocat
wscat -c ws://localhost:8000/ws/public-query
> {"query":"Wat zijn de bezoekuren?","channel":"voice"}

Expected three JSON messages in sequence: conversational_intent, chunk, final. Rate limiting inherits from the existing public WebSocket guard.
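
Once the frames have been captured (for example, pasted from the wscat session), their order can be checked mechanically. The sketch below assumes each message carries its kind in a `type` field, which the text above does not confirm; pass a different `kind_field` if the payload uses another name.

```python
import json

# Order of message kinds described above.
EXPECTED_SEQUENCE = ["conversational_intent", "chunk", "final"]


def message_kinds(raw_messages: list[str], kind_field: str = "type") -> list:
    """Decode each JSON frame and pull out its kind (None if the field is absent)."""
    return [json.loads(raw).get(kind_field) for raw in raw_messages]


def sequence_ok(raw_messages: list[str], kind_field: str = "type") -> bool:
    """True if the frames arrive exactly in the documented order."""
    return message_kinds(raw_messages, kind_field) == EXPECTED_SEQUENCE
```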

Step 6 — verify the full voice test suite

cd backend
source venv/bin/activate
pytest tests/unit/services/voice/ tests/unit/test_voice_config.py \
       tests/unit/test_voice_prompt_builder.py tests/unit/test_voice_disclaimers.py \
       tests/unit/models/test_schemas_voice.py \
       tests/unit/services/test_llm_fallback_chain_voice.py \
       tests/unit/metrics/ tests/evaluation/ \
       tests/integration/test_endpoints_channel_forwarding.py -v --no-cov

Expected: 153 passed, 1 skipped. The skipped test is an auth-bypass integration test, and its skip is documented explicitly.

Step 7 — run the voice evaluator against your live orchestrator

This runs a 5-question subset (subset_size=5) of the 30-question seed set end to end, exercising every voice service:

cd backend
source venv/bin/activate
python - <<'PY'
import asyncio
from uuid import uuid4
from app.config import get_settings
from app.evaluation.voice_evaluator import VoiceEvaluator
from app.services.rag_service import RAGService
from app.services.voice.voice_llm_orchestrator import VoiceLLMOrchestrator
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

async def main():
    settings = get_settings()
    engine = create_async_engine(settings.database_url)
    async with AsyncSession(engine) as db:
        # The legacy `VoiceOrchestrator` was deleted in commit 158d793 (2026-05-02).
        # See ADR-0049 (thin pipeline) and ADR-0051 (agentic orchestrator).
        orch = VoiceLLMOrchestrator(
            settings=settings,
            rag_service=RAGService(db),
        )
        report = await VoiceEvaluator(subset_size=5).run(orch)
        print(f"Pass rate:     {report.pass_rate:.2%}")
        print(f"TTFT P50:      {report.ttft_p50_ms:.0f} ms")
        print(f"TTFT P95:      {report.ttft_p95_ms:.0f} ms")
        print(f"Failures by:   {report.failures_by_intent}")

asyncio.run(main())
PY

On a laptop with a healthy OpenAI connection, the pass rate for the 5-question subset should be 80–100 % and TTFT P50 should stay under ~1 500 ms (this is a single-envelope, non-streaming call; real TTFT arrives with Phase B streaming).
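
When reading the report, it may help to recall how P50 and P95 summarize a handful of latency samples. The sketch below uses the nearest-rank definition; whether VoiceEvaluator uses this or an interpolating method is not stated here, and the sample values are made up for illustration.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Made-up TTFT samples in milliseconds for a 5-question run.
ttft_ms = [900.0, 1100.0, 1200.0, 1400.0, 2600.0]
p50 = percentile_nearest_rank(ttft_ms, 50)  # rank ceil(2.5) = 3 -> 1200.0
p95 = percentile_nearest_rank(ttft_ms, 95)  # rank ceil(4.75) = 5 -> 2600.0
```

With only five samples, P95 is simply the slowest request, so a single slow call dominates it; P50 is the steadier number to compare against the target.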

Step 8 — observability

curl -s http://localhost:8000/metrics | grep -E "rag_query_ttft_ms|rag_query_conversational_intent|rag_voice_safety_escalations|rag_voice_shape_compliance"

Expect histograms and counters carrying the channel="voice" label. Grafana at http://localhost:3000 (if you run the pilot compose) shows these series under the "voice" label automatically.
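
The same filter can be applied in Python to the text returned by /metrics, which is handy inside a test. This is a sketch: the metric name prefixes are the ones from the grep above, and the function name `voice_samples` is hypothetical.

```python
# Metric name prefixes from the grep pattern above.
METRIC_PREFIXES = (
    "rag_query_ttft_ms",
    "rag_query_conversational_intent",
    "rag_voice_safety_escalations",
    "rag_voice_shape_compliance",
)


def voice_samples(exposition: str) -> list[str]:
    """Return sample lines for the voice metrics that carry channel="voice"."""
    return [
        line
        for line in exposition.splitlines()
        # Lines starting with "# HELP" or "# TYPE" do not match the prefixes.
        if line.startswith(METRIC_PREFIXES) and 'channel="voice"' in line
    ]
```

An empty result after running a few voice queries suggests the requests never reached the voice path or the label is missing.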

Troubleshooting

"Voice channel is disabled" on every request

Check that VOICE_CHANNEL_ENABLED=true is in backend/.env and that you restarted uvicorn after changing the file. The setting is read at process start.
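
One common reason a setting is fixed at process start is that the settings object is built once and then cached. The sketch below illustrates that pattern with `functools.lru_cache`; it is an assumption about how `get_settings()` might behave, not a copy of the project's code, and the `Settings` class here is a stand-in.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Stand-in for the real settings object (hypothetical)."""
    voice_channel_enabled: bool


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Read the environment once; later calls return the cached instance.
    raw = os.environ.get("VOICE_CHANNEL_ENABLED", "false")
    return Settings(voice_channel_enabled=raw.strip().lower() == "true")


os.environ["VOICE_CHANNEL_ENABLED"] = "false"
first = get_settings()
os.environ["VOICE_CHANNEL_ENABLED"] = "true"  # simulates a change while running
second = get_settings()
# second is the same cached object, so the change is not seen until restart.
```

Under this pattern, only a fresh process (or an explicit cache clear) picks up the new value, which matches the restart advice above.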

Test fails with `OPENAI_API_KEY not configured`

VoiceLLMOrchestrator calls OpenAI's GPT-4.1 directly for the agentic tool-call loop. Set OPENAI_API_KEY in .env or skip those tests. (The previous "LLM-fallback resolver" — conversational_intent_resolver.py — was deleted in commit 158d793 along with the rest of the legacy 8-stage pipeline; see ADR-0049 and ADR-0051.)

Conversational intent always comes back as `answered`

In the thin pipeline, conversational_intent is derived from (a) classify_terminal()'s TerminalClass enum (FAREWELL/HANDOFF_REQUEST/SAFETY_REFUSAL/REPEAT_REQUEST), and (b) the GPT-4.1 tool choice (end_call → farewell, transfer_to_helpdesk → escalate); everything else defaults to answered. To exercise non-answered paths, ask a clear farewell ("dag, tot ziens") or an explicit transfer request ("ik wil iemand spreken").
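
The derivation described above can be sketched as a small lookup. Only the tool-choice mapping (end_call, transfer_to_helpdesk) and the default come from the text; the intent strings assigned to each TerminalClass member and the rule that a terminal class takes precedence over a tool choice are illustrative assumptions, so treat this as a mental model rather than the actual implementation.

```python
from enum import Enum
from typing import Optional


class TerminalClass(Enum):
    FAREWELL = "farewell"
    HANDOFF_REQUEST = "handoff_request"
    SAFETY_REFUSAL = "safety_refusal"
    REPEAT_REQUEST = "repeat_request"


# Tool-choice mapping as described in the paragraph above.
TOOL_TO_INTENT = {"end_call": "farewell", "transfer_to_helpdesk": "escalate"}

# ASSUMPTION: these intent strings are illustrative guesses, not confirmed values.
TERMINAL_TO_INTENT = {
    TerminalClass.FAREWELL: "farewell",
    TerminalClass.HANDOFF_REQUEST: "escalate",
    TerminalClass.SAFETY_REFUSAL: "out_of_scope",
    TerminalClass.REPEAT_REQUEST: "repeat",
}


def derive_intent(terminal: Optional[TerminalClass], tool_name: Optional[str]) -> str:
    """Terminal class first (assumed precedence), then tool choice, else 'answered'."""
    if terminal is not None:
        return TERMINAL_TO_INTENT[terminal]
    return TOOL_TO_INTENT.get(tool_name, "answered")
```

The last line is why an ordinary question always yields `answered`: with no terminal class and no matching tool, the default applies.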

Unit tests fail on `app.services.voice.*` imports

PYTHONPATH issue — the app package must be importable. Run pytest from backend/ with the venv activated (source venv/bin/activate), not from the repo root.
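
A quick way to see which of the usual causes applies is to print the working directory, the interpreter in use, and whether the package can be located. The helper names below are hypothetical, and the check is only a rough proxy for what pytest does when it builds its own import path.

```python
import importlib.util
import os
import sys


def can_import(package: str) -> bool:
    """True if the top-level package can be located on the current sys.path."""
    return importlib.util.find_spec(package) is not None


def diagnose(package: str = "app") -> str:
    """Summarize the facts that usually explain an import failure."""
    return (
        f"cwd={os.getcwd()} "
        f"python={sys.executable} "
        f"{package}_importable={can_import(package)}"
    )
```

Run from backend/ with the venv active, `print(diagnose())` should show an interpreter path inside venv/ and `app_importable=True`; anything else points at the wrong directory or interpreter.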

Quick reset / disable

To disable voice quickly (a restart is still required, because the flag is read at process start):

# Edit backend/.env, flip VOICE_CHANNEL_ENABLED=false, restart uvicorn.
# All voice requests immediately return HTTP 400 again.

To rebuild the backend Docker image with the voice code baked in (for demos without --reload):

cd docker
docker compose build backend
docker compose up -d backend
