
Local Setup & Testing

Credentials required (one total)

| Key | Required? | What for |
| --- | --- | --- |
| OPENAI_API_KEY | ✅ yes | Voice answer LLM (gpt-4.1-mini) + conversational intent fallback (gpt-4.1-nano). Already required for the text channel; same key. |
| ELEVENLABS_API_KEY | ❌ deferred (Phase A.2) | Nightly audio-loop eval |
| DEEPGRAM_API_KEY | ❌ deferred (Phase A.2) | Nightly audio-loop eval |
| RETELL_API_KEY, VAPI_API_KEY | ❌ deferred (Phase A.2) | SOTA vendor benchmark |
| LiveKit Cloud, Twilio SIP | ❌ Phase B/C | Telephony |

No new Docker services for Phase A. The voice channel reuses the existing docker-compose stack (postgres, redis, minio, ollama, keycloak, clickhouse, langfuse, backend).

Step 1 — pull latest master

cd ~/Development/zol-rag
git checkout master
git pull

Phase A was merged on 2026-04-24 as commit 63f06a4. Your master should include these commits:

63f06a4 Merge pull request #67 from Tsunami-max/feat/voice-phase-a
15f09bc fix(voice): respect disclaimer flag, emit shape-compliance metric
351e0e6 docs(env): document voice channel settings in .env.example
db5409e feat(eval): voice eval runner + SOTA benchmark skeleton (Phase A T14-T17)
…18 more commits

Step 2 — start the Docker infrastructure

cd docker
docker compose up -d
docker compose ps

Expect every service to report healthy or running. Standardized ports per project convention:

| Service | Host port |
| --- | --- |
| Backend (FastAPI) | :8000 |
| Frontend (Vite) | :4000 |
| PostgreSQL | :5433 |
| Redis | :6379 |
| MinIO API | :9100 |
| MinIO Console | :9101 |
| Keycloak | :8081 |
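As a complement to docker compose ps, the port list can be probed directly. The following is a minimal sketch, not part of the project: the helper names port_is_open and check_ports are hypothetical, and the service labels are only used for the printed report. The frontend port will show as closed unless the Vite dev server is running.

```python
import socket

# Host ports from the table above; the keys are labels for the report only.
EXPECTED_PORTS = {
    "backend": 8000,
    "frontend": 4000,
    "postgres": 5433,
    "redis": 6379,
    "minio-api": 9100,
    "minio-console": 9101,
    "keycloak": 8081,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def check_ports(ports: dict[str, int]) -> dict[str, bool]:
    """Probe every port and return a name -> reachable mapping."""
    return {name: port_is_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_ports(EXPECTED_PORTS).items():
        print(f"{name:14s} {'open' if ok else 'CLOSED'}")
```

An open port only shows that something is listening; it does not replace the health status reported by docker compose ps.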

If backend/.env already exists, skip to Step 3. Otherwise, from the repository root (not from docker/):

cp backend/.env.example backend/.env

Step 3 — configure voice settings in backend/.env

Edit backend/.env and set:

OPENAI_API_KEY=sk-your-real-key
VOICE_CHANNEL_ENABLED=true

The remaining voice settings have working defaults and can be left unchanged:

VOICE_RAG_LLM_MODEL=gpt-4.1-mini
VOICE_RAG_FALLBACK_CHAIN=[{"provider":"openai","model":"gpt-4.1-mini"},{"provider":"openai","model":"gpt-4.1-nano"},{"provider":"openai","model":"gpt-4.1"}]
VOICE_ESCALATION_CONFIDENCE_THRESHOLD=0.80
VOICE_CONVERSATIONAL_INTENT_LLM_MODEL=gpt-4.1-nano
VOICE_DISCLAIMER_ENABLED=true
VOICE_STT_AMBIGUITY_GUARDRAIL_ENABLED=true
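Because VOICE_RAG_FALLBACK_CHAIN is a JSON value on a single line, a typo there is easy to miss. The sketch below checks the settings listed above before the backend starts. It is an illustration under assumptions: parse_env and voice_config_problems are hypothetical helpers, and the simple KEY=VALUE parsing here may differ from how the backend actually loads its .env file.

```python
import json


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def voice_config_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing")
    if env.get("VOICE_CHANNEL_ENABLED", "").lower() != "true":
        problems.append("VOICE_CHANNEL_ENABLED is not true")
    chain_raw = env.get("VOICE_RAG_FALLBACK_CHAIN")
    if chain_raw is not None:
        try:
            chain = json.loads(chain_raw)
        except json.JSONDecodeError as exc:
            problems.append(f"VOICE_RAG_FALLBACK_CHAIN is not valid JSON: {exc}")
        else:
            if not isinstance(chain, list):
                problems.append("VOICE_RAG_FALLBACK_CHAIN is not a JSON list")
            else:
                for index, entry in enumerate(chain):
                    if not isinstance(entry, dict) or not {"provider", "model"} <= entry.keys():
                        problems.append(f"chain entry {index} lacks provider/model")
    return problems


if __name__ == "__main__":
    sample = (
        "OPENAI_API_KEY=sk-your-real-key\n"
        "VOICE_CHANNEL_ENABLED=true\n"
        'VOICE_RAG_FALLBACK_CHAIN=[{"provider":"openai","model":"gpt-4.1-mini"}]\n'
    )
    print(voice_config_problems(parse_env(sample)))
```

To check the real file, pass the contents of backend/.env to parse_env instead of the sample string.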

Step 4 — start the backend

In one terminal:

cd backend
source venv/bin/activate
uvicorn app.main:app --reload --port 8000

The Swagger UI at http://localhost:8000/docs now shows the voice-channel fields on the QueryRequest and QueryResponse models.

Step 5 — smoke tests

Use curl or a REST client. For protected endpoints, obtain a JWT by logging in (Keycloak at http://localhost:8081 or the existing login flow).

Test A — happy-path answered

curl -s -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $JWT" \
-d '{"query":"Wat zijn de bezoekuren?","channel":"voice"}' | jq .

Expected response shape:

{
"answer": "De bezoekuren zijn van maandag tot vrijdag, van twee uur tot acht uur 's avonds.",
"conversational_intent": "answered",
"target_language": null,
"voice_shape_compliant": true,
"conversation_id": "…",
"citations": [],
"metrics": {}
}
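When the curl output is saved to a file or piped into a script, the response shape can be checked mechanically. This is a small sketch that only uses the field names from the expected response above; the helper names missing_fields and is_happy_path are hypothetical and not part of the backend.

```python
import json

# Field names copied from the expected response shape above.
REQUIRED_FIELDS = (
    "answer",
    "conversational_intent",
    "target_language",
    "voice_shape_compliant",
    "conversation_id",
    "citations",
    "metrics",
)


def missing_fields(payload: dict) -> list[str]:
    """List the expected top-level fields that are absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


def is_happy_path(payload: dict) -> bool:
    """True when all fields exist and the intent/compliance match Test A."""
    return (
        not missing_fields(payload)
        and payload["conversational_intent"] == "answered"
        and payload["voice_shape_compliant"] is True
    )


# Illustrative payload mirroring the documented shape (values shortened).
sample = json.loads(
    """
    {
      "answer": "De bezoekuren zijn van maandag tot vrijdag.",
      "conversational_intent": "answered",
      "target_language": null,
      "voice_shape_compliant": true,
      "conversation_id": "abc",
      "citations": [],
      "metrics": {}
    }
    """
)
print(missing_fields(sample), is_happy_path(sample))
```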

Test B — rule-based farewell short-circuit

curl -s -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $JWT" \
-d '{"query":"Bedankt, tot ziens.","channel":"voice"}' | jq '.conversational_intent, .answer'

Expected: conversational_intent="farewell" and the Dutch goodbye template. RAG is never called.

Test C — rule-based switch_language with target

curl -s -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $JWT" \
-d '{"query":"Can we continue in English please?","channel":"voice"}' | jq '.conversational_intent, .target_language'

Expected: conversational_intent="switch_language", target_language="en".

Test D — STT-ambiguity guardrail fires

curl -s -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $JWT" \
-d '{"query":"Moet ik iets nemen tegen migraine?","channel":"voice"}' | jq '.conversational_intent, .answer'

Expected: conversational_intent="out_of_scope" and the Dutch handoff template. RAG is never called. This guardrail is the first layer of defense.
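Tests B through D follow the same pattern, so they can be driven from one table. The sketch below uses only the endpoint, request body, and expected values from the curl examples above; the function names are hypothetical, and the requests run only when a JWT environment variable is set.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/query"

# (query, expected conversational_intent, expected target_language or None to skip)
SMOKE_CASES = [
    ("Bedankt, tot ziens.", "farewell", None),
    ("Can we continue in English please?", "switch_language", "en"),
    ("Moet ik iets nemen tegen migraine?", "out_of_scope", None),
]


def post_voice_query(query: str, jwt: str) -> dict:
    """POST one voice query and return the decoded JSON response."""
    body = json.dumps({"query": query, "channel": "voice"}).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def mismatches(response: dict, intent: str, language) -> list[str]:
    """Compare a response against the expected intent and optional language."""
    found = []
    if response.get("conversational_intent") != intent:
        found.append(f"intent={response.get('conversational_intent')!r}, expected {intent!r}")
    if language is not None and response.get("target_language") != language:
        found.append(f"language={response.get('target_language')!r}, expected {language!r}")
    return found


def run_smoke_cases(jwt: str) -> None:
    for query, intent, language in SMOKE_CASES:
        problems = mismatches(post_voice_query(query, jwt), intent, language)
        print("PASS" if not problems else f"FAIL {problems}", "-", query)


if __name__ == "__main__" and os.environ.get("JWT"):
    run_smoke_cases(os.environ["JWT"])
```

The script does not verify the Dutch template text, since the exact wording is not given here; inspect .answer by hand for that.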

Test E — feature flag off returns HTTP 400

Set VOICE_CHANNEL_ENABLED=false in backend/.env and restart uvicorn. The same Test A curl now returns:

{"detail":"Voice channel is disabled (voice_channel_enabled=False)."}

Flip back to true to continue.

Test F — public WebSocket (no auth)

# wscat or websocat
wscat -c ws://localhost:8000/ws/public-query
> {"query":"Wat zijn de bezoekuren?","channel":"voice"}

Expect three JSON messages in sequence: conversational_intent, chunk, final. Rate limiting is inherited from the existing public WebSocket guard.
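If the frames from wscat are captured as text, their order can be verified with a few lines. This sketch is an assumption-heavy illustration: the section does not say which JSON field carries the message kind, so the field name "type" below is hypothetical and should be replaced with whatever the real frames use.

```python
import json

# Message order stated in the text above.
EXPECTED_ORDER = ["conversational_intent", "chunk", "final"]


def message_kinds(frames: list[str], kind_field: str = "type") -> list[str]:
    """Extract the message kind from each raw JSON frame.

    kind_field is an assumption; adjust it to the real frame schema.
    """
    return [json.loads(frame).get(kind_field, "<missing>") for frame in frames]


def order_is_valid(kinds: list[str]) -> bool:
    """True when the frames arrive exactly in the documented order."""
    return kinds == EXPECTED_ORDER


# Illustrative frames, not captured from a real session.
frames = [
    '{"type": "conversational_intent", "value": "answered"}',
    '{"type": "chunk", "text": "De bezoekuren zijn ..."}',
    '{"type": "final"}',
]
print(order_is_valid(message_kinds(frames)))
```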

Step 6 — verify the full voice test suite

cd backend
source venv/bin/activate
pytest tests/unit/services/voice/ tests/unit/test_voice_config.py \
tests/unit/test_voice_prompt_builder.py tests/unit/test_voice_disclaimers.py \
tests/unit/models/test_schemas_voice.py \
tests/unit/services/test_llm_fallback_chain_voice.py \
tests/unit/metrics/ tests/evaluation/ \
tests/integration/test_endpoints_channel_forwarding.py -v --no-cov

Expected: 153 passed, 1 skipped. The skipped test is an auth-bypass integration test, and the skip is documented explicitly.

Step 7 — run the voice evaluator against your live orchestrator

This runs a 5-question subset (subset_size=5) of the 30-question seed set end-to-end, exercising every voice service:

cd backend
source venv/bin/activate
python - <<'PY'
import asyncio
from uuid import uuid4
from app.config import get_settings
from app.evaluation.voice_evaluator import VoiceEvaluator
from app.services.rag_service import RAGService
from app.services.voice.voice_llm_orchestrator import VoiceLLMOrchestrator
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

async def main():
    settings = get_settings()
    engine = create_async_engine(settings.database_url)
    async with AsyncSession(engine) as db:
        # The legacy `VoiceOrchestrator` was deleted in commit 158d793 (2026-05-02).
        # See ADR-0049 (thin pipeline) and ADR-0051 (agentic orchestrator).
        orch = VoiceLLMOrchestrator(
            settings=settings,
            rag_service=RAGService(db),
        )
        report = await VoiceEvaluator(subset_size=5).run(orch)
    print(f"Pass rate: {report.pass_rate:.2%}")
    print(f"TTFT P50: {report.ttft_p50_ms:.0f} ms")
    print(f"TTFT P95: {report.ttft_p95_ms:.0f} ms")
    print(f"Failures by: {report.failures_by_intent}")

asyncio.run(main())
PY

On a laptop with a healthy OpenAI connection, the 5-question pass rate should be 80–100% and the TTFT P50 should stay under about 1,500 ms. This is a single-envelope, non-streaming call; real TTFT comes with Phase B streaming.
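To make the reported numbers easier to read, here is a worked example of a pass rate and nearest-rank percentiles over five runs. The latencies are made up for illustration, and the nearest-rank method is an assumption; the evaluator may compute its percentiles differently.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def pass_rate(results: list[bool]) -> float:
    """Fraction of questions that passed."""
    return sum(results) / len(results)


# Hypothetical outcome of a 5-question run (milliseconds, pass/fail).
latencies_ms = [820.0, 950.0, 1100.0, 1300.0, 2400.0]
passed = [True, True, True, True, False]

print(f"Pass rate: {pass_rate(passed):.2%}")
print(f"TTFT P50: {percentile(latencies_ms, 50):.0f} ms")
print(f"TTFT P95: {percentile(latencies_ms, 95):.0f} ms")
```

With only five samples, the nearest-rank P95 is simply the slowest run, so one slow request dominates it. Treat the P95 from a 5-question subset as a rough signal and rely on P50 for the target above.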

Step 8 — observability

curl -s http://localhost:8000/metrics | grep -E "rag_query_ttft_ms|rag_query_conversational_intent|rag_voice_safety_escalations|rag_voice_shape_compliance"

Expect histograms and counters carrying a channel="voice" label. Grafana at http://localhost:3000 (if you run the pilot compose) shows these series under the "voice" label automatically.
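The grep above matches metric names but does not confirm that each one has a voice-labelled sample. The sketch below counts them from the /metrics text. The four metric name prefixes come from the command above; the sample lines, their suffixes, and any label other than channel are illustrative assumptions.

```python
# Metric name prefixes taken from the grep pattern above.
METRIC_PREFIXES = (
    "rag_query_ttft_ms",
    "rag_query_conversational_intent",
    "rag_voice_safety_escalations",
    "rag_voice_shape_compliance",
)


def voice_sample_counts(metrics_text: str) -> dict[str, int]:
    """Count sample lines per metric prefix that carry channel="voice"."""
    counts = {prefix: 0 for prefix in METRIC_PREFIXES}
    for line in metrics_text.splitlines():
        # Skip HELP/TYPE comments and samples for other channels.
        if line.startswith("#") or 'channel="voice"' not in line:
            continue
        for prefix in METRIC_PREFIXES:
            if line.startswith(prefix):
                counts[prefix] += 1
                break
    return counts


# Illustrative exposition text, not real backend output.
SAMPLE = """\
# HELP rag_query_ttft_ms Time to first token
rag_query_ttft_ms_bucket{channel="voice",le="500"} 3
rag_query_ttft_ms_count{channel="voice"} 5
rag_query_ttft_ms_count{channel="text"} 9
rag_voice_shape_compliance_total{channel="voice"} 5
"""

counts = voice_sample_counts(SAMPLE)
missing = [name for name, count in counts.items() if count == 0]
print(counts)
print("no voice samples for:", missing)
```

Counters typically appear only after the first matching event, so a missing series right after startup may just mean no voice request has been sent yet.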

Troubleshooting

"Voice channel is disabled" on every request

Check that VOICE_CHANNEL_ENABLED=true is in backend/.env and that you restarted uvicorn after changing the file. The setting is read at process start.

Test fails with OPENAI_API_KEY not configured

VoiceLLMOrchestrator calls OpenAI's GPT-4.1 directly for the agentic tool-call loop. Set OPENAI_API_KEY in .env or skip those tests. (The previous "LLM-fallback resolver" — conversational_intent_resolver.py — was deleted in commit 158d793 along with the rest of the legacy 8-stage pipeline; see ADR-0049 and ADR-0051.)

Conversational intent always comes back as answered

In the thin pipeline, conversational_intent is derived from (a) classify_terminal()'s TerminalClass enum (FAREWELL / HANDOFF_REQUEST / SAFETY_REFUSAL / REPEAT_REQUEST), and (b) the GPT-4.1 tool choice (end_call → farewell, transfer_to_helpdesk → escalate); everything else defaults to answered. To exercise non-answered paths, ask a clear farewell ("dag, tot ziens") or an explicit transfer request ("ik wil iemand spreken").
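The tool-choice half of that rule can be pictured as a lookup with a default. This is a sketch of the idea only, not the project's code: it assumes the pairs read end_call to farewell and transfer_to_helpdesk to escalate, the function name intent_from_tool is hypothetical, and the TerminalClass half is left out because its mapping to intents is not spelled out here.

```python
from typing import Optional

# Assumed reading of the tool-choice pairs described above.
TOOL_TO_INTENT = {
    "end_call": "farewell",
    "transfer_to_helpdesk": "escalate",
}


def intent_from_tool(tool_name: Optional[str]) -> str:
    """Map the model's tool choice to an intent; anything else is 'answered'."""
    return TOOL_TO_INTENT.get(tool_name, "answered")


for tool in ("end_call", "transfer_to_helpdesk", "search_knowledge_base", None):
    # "search_knowledge_base" is a made-up tool name used to show the default.
    print(tool, "->", intent_from_tool(tool))
```

This is why an ordinary question always comes back as answered: unless the model picks one of the two terminal tools (or the terminal classifier fires first), the default applies.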

Unit tests fail on app.services.voice.* imports

PYTHONPATH issue — the app package must be importable. Run pytest from backend/ with the venv activated (source venv/bin/activate), not from the repo root.

Quick reset / disable

To disable voice quickly (a uvicorn restart is still required, because the setting is read at process start):

# Edit backend/.env, flip VOICE_CHANNEL_ENABLED=false, restart uvicorn.
# All voice requests immediately return HTTP 400 again.

To rebuild the backend Docker image with the voice code baked in (for demos without --reload):

cd docker
docker compose build backend
docker compose up -d backend