
Persona 04 — Mevrouw Maeyens (78, Limburgs dialect)

Purpose: Validate the system's behaviour under STT degradation and slow speech. The caller's accent and tempo are exactly the conditions under which the speech-to-text pipeline drops words, mishears proper nouns, and produces noisy transcripts. The persona tests STT recovery (tenant phonetic overlay), simple answer shaping (no jargon, no English borrowings), and patience cues (the caller may ask for an answer to be repeated).

Persona

You are Mevrouw Maria Maeyens, 78, widow, living alone in Beringen since your husband passed in 2024. You raised three children, two of whom now live in Antwerpen — too far for daily visits. Your eldest, Hugo (54), is hospitalised at ZOL Genk on the cardiology ward after a heart attack last week. You want to visit him this afternoon.

You speak Limburgs dialect — the slow, broad-vowel Genk-Beringen variant. You say "ich" instead of "ik", "dao" instead of "daar", "weet ich nie" instead of "weet ik niet". You hesitate. You repeat yourself. You apologise for taking the operator's time. You don't trust technology and you remind whoever's on the line that "ge moogt rustig praten met mij, ik ben gaan oud worden."

You drive — barely — but you can't walk far. Your right knee is bad. You'd like to find a parking spot close to cardiology, and if possible borrow a wheelchair from the entrance because Hugo's ward is at the back of the building.

You are a kind, patient caller. The system shouldn't need to refuse anything. The challenge is whether it understands you in the first place, and whether it answers in plain Dutch — not "ICU bezoekersbeleid van 14u tot 16u" but "u kunt op bezoek tussen twee en vier uur 's middags."
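The contrast above between "14u tot 16u" and "tussen twee en vier uur 's middags" can be made mechanical. The sketch below is a hypothetical answer-shaping helper (`plain_hours`, `spoken_range` and `part_of_day` are illustrative names, not part of the system under test). It turns whole 24-hour clock times into the spoken 12-hour Dutch form with a part-of-day word, assuming whole hours only and a conventional day split (nacht before 6, ochtend until 12, middag until 18, avond until midnight).

```python
import re

# Hypothetical answer-shaping helper: 24-hour clock -> spoken Dutch.
# Index = hour modulo 12, so 0 and 12 both read as "twaalf".
HOUR_WORDS = ["twaalf", "één", "twee", "drie", "vier", "vijf",
              "zes", "zeven", "acht", "negen", "tien", "elf"]


def part_of_day(hour: int) -> str:
    """Assumed day split: nacht 0-5, ochtend 6-11, middag 12-17, avond 18-23."""
    if 6 <= hour < 12:
        return "'s ochtends"
    if 12 <= hour < 18:
        return "'s middags"
    if 18 <= hour < 24:
        return "'s avonds"
    return "'s nachts"


def spoken_range(start: int, end: int) -> str:
    """Render a whole-hour range, e.g. 14-16 -> "tussen twee en vier uur 's middags"."""
    a, b = HOUR_WORDS[start % 12], HOUR_WORDS[end % 12]
    if part_of_day(start) == part_of_day(end):
        return f"tussen {a} en {b} uur {part_of_day(start)}"
    return f"tussen {a} uur {part_of_day(start)} en {b} uur {part_of_day(end)}"


def plain_hours(jargon: str) -> str:
    """Rewrite a ward-style '14u tot 16u' span into plain spoken Dutch."""
    m = re.fullmatch(r"\s*(\d{1,2})u\s+tot\s+(\d{1,2})u\s*", jargon)
    if m is None:
        raise ValueError(f"unsupported hours format: {jargon!r}")
    return spoken_range(int(m.group(1)), int(m.group(2)))


print(plain_hours("14u tot 16u"))  # tussen twee en vier uur 's middags
```

Half hours ("14u30") would need extra rules; the sketch raises on any format it does not recognise rather than reading out a half-parsed time.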

The 8 turns

Turn 1 — Greeting + opening intent (slow, dialect-flavoured)

🗣️ Caller says:

"Ja, goedemiddag mevrouwke. Ich bel veur mijne zoon, Hugo Maeyens, hij ligt op cardiologie in Genk. Ich wou weten wanneer ik dao op bezoek kan komen."

🧠 System should: language-lock to Dutch (dialect words like "ich", "dao" and "veur" are Dutch dialect, not German or English), acknowledge the question, and answer with the cardiology visiting hours. The answer should use plain Dutch numbers ("twee uur 's middags"), not ward jargon.

What's tested: STT recovery on dialect (tenant_overlay phonetic recovery, if available), language-lock not fooled by dialect tokens, answer shaping for elderly callers, citation grounding.

🔎 Post-call: turn 1's detected_language=nl. The STT transcript may contain noise; the system should still resolve the intent.

If the system answers in English (because "ich" was misclassified as German/English noise), language-lock regressed.
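One way to keep dialect tokens from tipping the language decision is a text-level overlay that treats the persona's Limburgs markers as Dutch evidence. The sketch below assumes a hypothetical overlay table (`DIALECT_OVERLAY`, built only from the examples in this persona) and a hypothetical lock rule (`resolve_language`); it is not necessarily how the tenant phonetic overlay or the real language-lock works.

```python
import re
from typing import Optional

# Hypothetical overlay: Limburgs token -> standard Dutch, from this persona's examples.
DIALECT_OVERLAY = {"ich": "ik", "dao": "daar", "veur": "voor", "nie": "niet"}
# Whole-word match only, so "nie" does not fire inside "niet" or "niemand".
_DIALECT_TOKEN = re.compile(r"\b(" + "|".join(DIALECT_OVERLAY) + r")\b", re.IGNORECASE)


def normalize_dialect(text: str) -> str:
    """Replace known dialect tokens with standard Dutch, keeping an initial capital."""
    def repl(m: re.Match) -> str:
        word = m.group(0)
        std = DIALECT_OVERLAY[word.lower()]
        return std.capitalize() if word[0].isupper() else std
    return _DIALECT_TOKEN.sub(repl, text)


def resolve_language(locked: Optional[str], detected: str, text: str) -> str:
    """Hypothetical lock rule.

    - Once a call is locked, later per-turn guesses do not change it.
    - On the first turn, a German/English guess is overridden to Dutch when the
      utterance contains known Limburgs markers ("ich" looks German to a detector).
    """
    if locked is not None:
        return locked
    if detected in ("de", "en") and _DIALECT_TOKEN.search(text):
        return "nl"
    return detected
```

For example, `normalize_dialect("weet ich nie")` yields "weet ik niet", and a first-turn detector guess of "de" on the Turn 1 utterance resolves to "nl".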


Turn 2 — Specific campus + ward

🗣️ Caller says:

"Hij ligt op kamer 412, cardio Genk. Wat zijn de bezoekuren dao precies?"

🧠 System should: answer specifically for cardiology Genk visiting hours. If the corpus distinguishes between general visiting hours and ward-specific hours (some wards limit afternoon visits during shift handover), surface that. Plain Dutch.

What's tested: ward-specific lookup; campus disambiguation (Genk, not Maaseik).

🔎 Post-call: citations should reference the visiting-hours / cardiology page.
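The lookup order described for this turn (ward-specific hours when the corpus has them, then the campus-wide hours, never another campus) can be sketched as a filter over chunk metadata. The `Chunk` fields `campus` and `ward` are assumptions about how the corpus might be tagged, and the chunk texts in any example are placeholders rather than real visiting hours.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    campus: str            # assumed metadata, e.g. "Genk" or "Maaseik"
    ward: Optional[str]    # None = campus-wide rule


def visiting_hours_chunks(chunks: List[Chunk], campus: str, ward: str) -> List[Chunk]:
    """Return ward-specific chunks first, then campus-wide chunks for context.

    Ward-specific rules come first so the answer can surface exceptions
    (e.g. restricted visits during a shift handover); other campuses are dropped.
    """
    same_campus = [c for c in chunks if c.campus == campus]
    ward_specific = [c for c in same_campus if c.ward == ward]
    general = [c for c in same_campus if c.ward is None]
    return ward_specific + general
```

A ward with no dedicated chunk simply falls back to the campus-wide entry, which keeps the answer on the right campus even when the corpus is sparse.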


Turn 3 — Parking near cardiology

🗣️ Caller says:

"En waar kan ich het beste parkeren? Ich kan nie ver stappen, mijn knie hè."

🧠 System should: name the parking option closest to the cardiology entrance at Genk — typically the main hospital parking with ground-floor disabled parking spots near the entrance. If a Kiss & Ride or short-term-pickup zone is documented, surface that. Mention disabled-parking availability.

What's tested: practical-info + accessibility retrieval.

🔎 Post-call: citations should reference the parking page; answer should mention disabled parking.


Turn 4 — Wheelchair availability (the cross-category test)

🗣️ Caller says:

"Hebben jullie ook rolstoelen aan de inkom? Veur als ich nie meer kan stappen tot bij Hugo."

🧠 System should: confirm that wheelchairs are loaned at the entrance (inkomhal) against the standard one-euro deposit, matching the Anna persona's turn 2 behaviour. It MUST NOT mention "voorschrift" (prescription), "orthopedisch" (orthopedic) or "medische verslagen" (medical reports); those belong to the wrong category.

What's tested: Value Framework intent-to-category affinity rerank (the same regression check as Anna T2). The practical-category boost and regulatory-category penalty must hold.

🔎 Post-call: the Operations chart's mismatch_rate for this turn is near 0; the answer mentions the inkomhal and the one-euro deposit.

If the answer mentions reimbursement or a doctor's prescription, the framework has regressed.
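The affinity rerank this turn guards can be pictured as multiplying each retrieval score by a factor that depends on how well the chunk's category fits the detected intent. The intent label `practical_info`, the category names, the multipliers (1.3 boost, 0.5 penalty) and the example scores are all hypothetical; they only illustrate why an orthopedic-reimbursement chunk with a higher raw score should still lose to the wheelchair-at-inkomhal chunk.

```python
from typing import Dict, List, Tuple

# Hypothetical affinity table: (intent, chunk category) -> score multiplier.
AFFINITY: Dict[Tuple[str, str], float] = {
    ("practical_info", "practical"): 1.3,   # boost: loans, parking, opening hours
    ("practical_info", "regulatory"): 0.5,  # penalty: prescriptions, reimbursement
}


def rerank(intent: str, hits: List[Tuple[str, str, float]]) -> List[Tuple[str, float]]:
    """Rerank (chunk_id, category, retrieval_score) hits by intent-category affinity.

    Pairs missing from the table keep their original score (multiplier 1.0).
    """
    rescored = [
        (chunk_id, score * AFFINITY.get((intent, category), 1.0))
        for chunk_id, category, score in hits
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


hits = [
    ("orthopedic-reimbursement", "regulatory", 0.78),  # hypothetical: higher raw similarity
    ("wheelchair-inkomhal", "practical", 0.70),
]
print(rerank("practical_info", hits)[0][0])  # wheelchair-inkomhal
```

Without a matching intent the table has no effect, so the raw retrieval order (and with it the Anna T2-style regression) reappears.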


Turn 5 — Repeat / clarification (caller didn't catch the answer)

🗣️ Caller says:

"Sorry hè, ich heb het nie goed verstaan. Wat zei ge over die rolstoel?"

🧠 System should: repeat T4's answer about wheelchairs at the inkomhal. NOT a new RAG retrieval; NOT a clarifying question.

What's tested: the REPEAT_PREVIOUS short-circuit on dialect phrasing ("ich heb het nie goed verstaan"). The regex pre-filter may not catch this phrasing, in which case the turn falls through to the LLM intent classifier.

🔎 Post-call: turn 5's conversational_intent=repeat_previous; zero retrieval cost.
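The sketch below illustrates why the dialect phrasing can slip past a regex pre-filter: patterns written for standard Dutch ("niet goed verstaan", "wat zei u") do not match "nie goed verstaan" or "wat zei ge". The patterns, the `llm_classify` and `retrieve` callbacks and the turn handler are hypothetical stand-ins; the point is the control flow (regex first, LLM fallback, no retrieval once `repeat_previous` is decided).

```python
import re
from typing import Callable, Dict, List

# Hypothetical standard-Dutch repeat patterns used as a cheap pre-filter.
REPEAT_PATTERNS = [
    re.compile(r"\bniet\s+(goed\s+)?verstaan\b", re.IGNORECASE),
    re.compile(r"\bwat\s+zei\s+(je|u)\b", re.IGNORECASE),
    re.compile(r"\bkunt\s+u\s+dat\s+(nog\s+eens\s+)?herhalen\b", re.IGNORECASE),
]


def prefilter_repeat(text: str) -> bool:
    return any(p.search(text) for p in REPEAT_PATTERNS)


def classify(text: str, llm_classify: Callable[[str], str]) -> str:
    """Regex pre-filter first; only unmatched turns reach the slower LLM classifier."""
    if prefilter_repeat(text):
        return "repeat_previous"
    return llm_classify(text)


def handle_turn(text: str, history: List[str], llm_classify: Callable[[str], str],
                retrieve: Callable[[str], str]) -> Dict[str, object]:
    intent = classify(text, llm_classify)
    if intent == "repeat_previous" and history:
        # Short-circuit: replay the previous answer, no RAG retrieval.
        return {"conversational_intent": intent, "answer": history[-1], "retrieved": False}
    return {"conversational_intent": intent, "answer": retrieve(text), "retrieved": True}
```

Extending the patterns with dialect forms ("nie", "ge") would let the pre-filter catch Turn 5 directly; leaving them out is what exercises the LLM fallback.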


Turn 6 — General phone number (FAQ)

🗣️ Caller says:

"En als ik morgen nog eens bel, welk nummer moet ich dan draaien?"

🧠 System should: answer directly with the main switchboard number, 089 32 50 50, via the Stage 0 FAQ short-circuit (sub-second).

What's tested: FAQ short-circuit on phone-number patterns even with dialect framing.

🔎 Post-call: turn 6 should have pipeline_stage=faq_short_circuit.
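A Stage 0 FAQ short-circuit only needs a cheap pattern that fires before any retrieval. The sketch assumes a hypothetical whole-word pattern on "nummer"/"telefoonnummer" and a canned answer carrying the switchboard number from this turn; the `pipeline_stage` key mirrors the post-call field, but the `stage0` function itself is illustrative.

```python
import re
from typing import Dict

SWITCHBOARD = "089 32 50 50"

# Hypothetical Stage 0 pattern. Whole-word match so that "kamernummer"
# (room number) does not trigger a phone-number answer.
PHONE_FAQ = re.compile(r"\b(telefoon)?nummer\b", re.IGNORECASE)


def stage0(text: str) -> Dict[str, str]:
    """Return a canned FAQ answer, or hand the turn on to the RAG pipeline."""
    if PHONE_FAQ.search(text):
        return {
            "pipeline_stage": "faq_short_circuit",
            "answer": f"Het algemene nummer is {SWITCHBOARD}.",
        }
    return {"pipeline_stage": "rag"}
```

The dialect framing ("moet ich dan draaien") does not matter here because the pattern keys on the one standard-Dutch noun the caller still uses.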


Turn 7 — Soft acknowledgement

🗣️ Caller says:

"Bedankt mevrouwke, ge bent zeer vriendelijk."

🧠 System should: acknowledge briefly ("Graag gedaan, kan ik u nog ergens mee helpen?") and wait — NOT hang up. Soft thanks are not goodbyes. Voice should remain warm.

What's tested: soft-farewell vs hard-goodbye disambiguation on a dialect closer ("ge bent zeer vriendelijk" is appreciation, not goodbye).

🔎 Post-call: turn 7's intent should be acknowledgment, and the call should still be active.


Turn 8 — Hard goodbye (dialect)

🗣️ Caller says:

"Nee, dat was alles. Tot ziens hè."

🧠 System should: close warmly and hang up. The "hè" tag-particle is dialect filler and should not confuse the goodbye classifier.

What's tested: explicit goodbye on a dialect closer.

🔎 Post-call: hangup_reason='caller_goodbye'.
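Turns 7 and 8 both sound like closers; what separates them is whether an explicit goodbye phrase is present. A minimal rule-based sketch, assuming hypothetical phrase lists and a step that strips tag particles such as "hè" before matching, might look like this. The labels reuse the `acknowledgment` intent and `caller_goodbye` hangup reason from the post-call checks.

```python
import re

# Hypothetical phrase lists; a real deployment would tune these per tenant.
FILLERS = re.compile(r"\b(hè|hoor)\b", re.IGNORECASE)
HARD_GOODBYE = re.compile(r"\b(tot\s+ziens|dat\s+was\s+alles|daag)\b", re.IGNORECASE)
SOFT_THANKS = re.compile(r"\b(bedankt|dank\s+u|merci|vriendelijk)\b", re.IGNORECASE)


def classify_closer(text: str) -> str:
    """Return 'caller_goodbye', 'acknowledgment' or 'other' for a closing utterance."""
    cleaned = FILLERS.sub("", text)  # drop tag particles before matching
    if HARD_GOODBYE.search(cleaned):
        return "caller_goodbye"      # close warmly, then hang up
    if SOFT_THANKS.search(cleaned):
        return "acknowledgment"      # thank, offer more help, keep the call open
    return "other"
```

Checking for a hard goodbye before soft thanks means "Bedankt, tot ziens" still ends the call, while thanks alone never does.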


Pass criteria

This persona is considered PASSED when:

  1. Turn 1 language-locks to Dutch (NOT English/German) despite dialect tokens.
  2. Turn 4 fires the wheelchair-at-inkomhal answer, NOT the orthopedic-reimbursement chunk.
  3. Turn 5's repeat request is honoured by the REPEAT_PREVIOUS short-circuit (no new RAG).
  4. Turn 6 hits the FAQ short-circuit (sub-second latency).
  5. Turn 7 is classified as acknowledgment, not goodbye.
  6. Turn 8 closes the call cleanly (hangup_reason='caller_goodbye').
  7. The system's answers use plain Dutch — no English jargon ("waiting room", "schedule"), no clinical jargon ("decubitusprofylaxe", "intraveneus toedienen").
  8. No turn fires a safety refusal (this persona has no medical-advice probes).
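The eight criteria can be expressed as one check over the post-call turn records. The sketch assumes each turn is a dict carrying the fields named in the post-call notes (`detected_language`, `conversational_intent`, `pipeline_stage`, `hangup_reason`) plus hypothetical `answer`, `latency_ms`, `retrieval_calls` and `safety_refusal` fields. It is not the implementation behind `run_voice_evaluation`.

```python
from typing import Dict, List

# Terms taken from the pass criteria; matching is a naive lowercase substring test.
WRONG_CATEGORY_T4 = ("voorschrift", "orthopedisch", "medische verslagen")
JARGON = ("waiting room", "schedule", "decubitusprofylaxe", "intraveneus toedienen")


def check_persona_04(turns: List[Dict]) -> List[str]:
    """Return failed criteria (empty list = PASSED). turns[0] is Turn 1.

    Missing fields count as failures where a criterion depends on them.
    """
    t = {i + 1: turn for i, turn in enumerate(turns)}
    failures = []
    if t[1].get("detected_language") != "nl":
        failures.append("1: Turn 1 not language-locked to Dutch")
    a4 = t[4].get("answer", "").lower()
    if "inkomhal" not in a4 or any(w in a4 for w in WRONG_CATEGORY_T4):
        failures.append("2: Turn 4 missed the wheelchair-at-inkomhal answer")
    if t[5].get("conversational_intent") != "repeat_previous" or t[5].get("retrieval_calls") != 0:
        failures.append("3: Turn 5 not served by REPEAT_PREVIOUS")
    if t[6].get("pipeline_stage") != "faq_short_circuit" or t[6].get("latency_ms", float("inf")) >= 1000:
        failures.append("4: Turn 6 missed the sub-second FAQ short-circuit")
    if t[7].get("conversational_intent") != "acknowledgment" or t[7].get("hangup_reason"):
        failures.append("5: Turn 7 treated as goodbye")
    if t[8].get("hangup_reason") != "caller_goodbye":
        failures.append("6: Turn 8 did not close cleanly")
    if any(j in turn.get("answer", "").lower() for turn in turns for j in JARGON):
        failures.append("7: jargon in an answer")
    if any(turn.get("safety_refusal") for turn in turns):
        failures.append("8: unexpected safety refusal")
    return failures
```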

Run automatically

python -m tests.evaluation.run_voice_evaluation --persona persona_04_mevrouw_maeyens