Evaluation Report — 2026-02-17 17:25 UTC
Label: v2.5.1-decomposition-on
Summary
| Metric | Value |
|---|---|
| Pass rate | 99.3% (145/146) |
| Failed | 1 |
| Errors | 0 |
| Avg faithfulness | N/A (disabled) |
| Avg answer relevancy | N/A (disabled) |
| Avg context precision | N/A (disabled) |
| Avg context recall | N/A (disabled) |
| Avg entity recall | 0.962 |
| Avg response time | 16863 ms |
| Total eval duration | 2852.8 s |
| Safety refusal accuracy | 100.0% |
System Configuration
Configuration snapshot at evaluation time. Each setting can influence
retrieval quality, response generation, and overall pass rates.
Git Context
| Property | Value |
|---|---|
| Branch | feat/query-decomposition |
| Commit | da55994 |
| Message | docs: update ADR-0032 and roadmap with implementation status |
LLM Models
| Role | Model |
|---|---|
| RAG generation | openai/o4-mini (provider: openrouter) |
| Escalation (Think Harder) | openai/gpt-4.1 |
| Follow-up classification | openai/gpt-4.1-nano |
| Evaluation (DeepEval judge) | openai/gpt-4.1-mini |
| Intent classification | openai/gpt-4.1-mini |
| Embedding | nomic-embed-text (768d, provider: ollama) |
Generation Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.1 |
| Max tokens | 1000 |
| Full-mode temperature | 0.1 |
| Full-mode max tokens | 1500 |
Retrieval Parameters
| Parameter | Value |
|---|---|
| Full mode (always-on reranking) | ON |
| Rerank candidates | 50 |
| Escalation candidates | 100 |
| Escalation min similarity | 0.35 |
| Escalation rerank top-k | 20 |
| Context assembly max tokens | 4000 |
| Context expand window | 1 chunk |
| BM25 hybrid search | ON (weight: 0.3) |
| Vector weight | 0.7 |
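The 0.3 and 0.7 weights above suggest a weighted combination of keyword and vector scores. The report does not say how the pipeline fuses them, so the following is only a minimal sketch of one common approach: min-max normalize each score set, then take the weighted sum. The function names and the normalization step are assumptions for illustration, not the pipeline's actual code.

```python
from typing import Dict

BM25_WEIGHT = 0.3    # "BM25 hybrid search" weight from the table above
VECTOR_WEIGHT = 0.7  # "Vector weight" from the table above


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]. Assumed step: BM25 and cosine scores live on different scales."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(bm25: Dict[str, float], vector: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of normalized scores; a document missing from one list scores 0 there."""
    bm25_n = min_max_normalize(bm25)
    vector_n = min_max_normalize(vector)
    doc_ids = set(bm25_n) | set(vector_n)
    return {
        doc_id: BM25_WEIGHT * bm25_n.get(doc_id, 0.0)
        + VECTOR_WEIGHT * vector_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
```

With these weights, a chunk that ranks best on vector similarity but worst on BM25 still outscores the reverse case, which matches the table's emphasis on semantic retrieval.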
Feature Flags
These flags control which components of the RAG pipeline are active.
Toggling a flag between runs isolates that feature's contribution to the results.
| Feature | Status | Impact |
|---|---|---|
| Knowledge Graph (Neo4j) | ON | Multi-hop entity retrieval |
| Graph deep traversal | ON | 3-4 hop graph queries |
| Contextual embeddings | ON | Chunk-level context in embeddings |
| BM25 hybrid search | ON | Keyword + semantic search fusion |
| Context filtering (FILCO) | OFF | Sentence-level relevance filtering |
| Semantic query cache | ON | Cache similar query results |
| Cache similarity threshold | 0.97 | Min cosine for cache hit |
| Intent classification | ON | Safety guardrail pre-filter |
| Safety validation | ON | Post-generation safety check |
| Safety LLM judge | OFF | LLM-as-judge defense-in-depth |
| Quality evaluation | ON | Background quality scoring |
| Auto-refusal on low quality | ON | Refuse if score < 0.4 |
| True token streaming | OFF | Real-time token delivery |
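Two rows in the table carry numeric thresholds: a cache hit needs a cosine similarity of at least 0.97, and a response is refused when its quality score is below 0.4. The sketch below shows how such checks might look. The function names are hypothetical, and treating 0.97 as inclusive is an assumption based on the "Min cosine" wording.

```python
import math
from typing import Sequence

CACHE_SIMILARITY_THRESHOLD = 0.97  # "Cache similarity threshold" row
AUTO_REFUSAL_THRESHOLD = 0.4       # "Auto-refusal on low quality" row


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_cache_hit(query_vec: Sequence[float], cached_vec: Sequence[float]) -> bool:
    """Reuse a cached result only when the query embeddings are nearly identical."""
    return cosine_similarity(query_vec, cached_vec) >= CACHE_SIMILARITY_THRESHOLD


def should_refuse(quality_score: float) -> bool:
    """Refuse when the quality score is strictly below the threshold ("score < 0.4")."""
    return quality_score < AUTO_REFUSAL_THRESHOLD
```

A threshold as high as 0.97 means only near-duplicate phrasings share a cache entry, which trades hit rate for a lower risk of serving an answer to a subtly different question.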
Evaluation Run Parameters
| Parameter | Value |
|---|---|
| DeepEval metrics | OFF (entity-recall only) |
| Questions file | golden_questions.json |
Results by Category
| Category | Pass | Fail | Error | Total | Rate |
|---|---|---|---|---|---|
| ambiguous_symptom | 5 | 0 | 0 | 5 | 100.0% |
| campus_info | 6 | 0 | 0 | 6 | 100.0% |
| compound_word | 6 | 0 | 0 | 6 | 100.0% |
| condition_department | 19 | 0 | 0 | 19 | 100.0% |
| doctor_department | 6 | 0 | 0 | 6 | 100.0% |
| emergency | 3 | 0 | 0 | 3 | 100.0% |
| entity_disambiguation | 8 | 0 | 0 | 8 | 100.0% |
| followup_chain | 6 | 0 | 0 | 6 | 100.0% |
| multi_hop_graph | 19 | 0 | 0 | 19 | 100.0% |
| multilingual | 8 | 0 | 0 | 8 | 100.0% |
| navigation | 5 | 0 | 0 | 5 | 100.0% |
| out_of_scope | 9 | 0 | 0 | 9 | 100.0% |
| practical_info | 12 | 0 | 0 | 12 | 100.0% |
| referral | 3 | 0 | 0 | 3 | 100.0% |
| safety_refusal | 7 | 0 | 0 | 7 | 100.0% |
| service_info | 9 | 0 | 0 | 9 | 100.0% |
| taxonomy_alias | 7 | 0 | 0 | 7 | 100.0% |
| treatment_info | 7 | 1 | 0 | 8 | 87.5% |
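The Rate column can be reproduced from per-question statuses. As a minimal sketch, assuming each result is available as a (category, status) pair (a hypothetical shape, not necessarily what run_evaluation.py uses), the aggregation might look like this:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pass_rate_by_category(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the pass rate per category as a percentage rounded to one decimal."""
    totals: Counter = Counter()
    passes: Counter = Counter()
    for category, status in results:
        totals[category] += 1
        if status == "PASS":
            passes[category] += 1
    return {
        category: round(100.0 * passes[category] / totals[category], 1)
        for category in totals
    }


# The treatment_info row above: 7 passes and 1 failure out of 8 questions.
treatment = [("treatment_info", "PASS")] * 7 + [("treatment_info", "FAIL")]
rates = pass_rate_by_category(treatment)  # {"treatment_info": 87.5}
```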
Timing Analysis
Response time distribution across all evaluated questions.
| Percentile | Response Time |
|---|---|
| Min | 38 ms |
| P50 (median) | 17310 ms |
| P90 | 22370 ms |
| P99 | 28635 ms |
| Max | 38092 ms |
| Mean | 16863 ms |
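The report does not state which percentile method it uses. The reported P50, P90, and P99 each coincide with an individual response time in the detailed results (GQ-105, GQ-106, and GQ-071), which is consistent with a rank-based method rather than interpolation. The sketch below uses the nearest-rank definition as one such convention; it is an assumption, not the script's confirmed logic.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Response times (ms) of the three emergency questions, GQ-026 to GQ-028.
emergency_ms = [24718, 21773, 16519]
median_ms = percentile_nearest_rank(emergency_ms, 50)  # 21773
```

For this three-value sample the result equals the emergency median shown in the per-category timing table. For even-sized samples, nearest-rank and interpolating methods can differ.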
Response Time by Category
| Category | Mean | Median | Max | Count |
|---|---|---|---|---|
| ambiguous_symptom | 22539 ms | 24803 ms | 28635 ms | 5 |
| campus_info | 15834 ms | 16760 ms | 21928 ms | 6 |
| compound_word | 16564 ms | 15628 ms | 19632 ms | 6 |
| condition_department | 19170 ms | 19234 ms | 23809 ms | 19 |
| doctor_department | 14705 ms | 13945 ms | 18479 ms | 6 |
| emergency | 21004 ms | 21773 ms | 24718 ms | 3 |
| entity_disambiguation | 16827 ms | 15581 ms | 24593 ms | 8 |
| followup_chain | 19858 ms | 20577 ms | 22550 ms | 6 |
| multi_hop_graph | 18841 ms | 17790 ms | 38092 ms | 19 |
| multilingual | 17167 ms | 17438 ms | 22051 ms | 8 |
| navigation | 16910 ms | 16268 ms | 19345 ms | 5 |
| out_of_scope | 5057 ms | 2631 ms | 16459 ms | 9 |
| practical_info | 16881 ms | 17021 ms | 22997 ms | 12 |
| referral | 16263 ms | 17274 ms | 17304 ms | 3 |
| safety_refusal | 8775 ms | 2539 ms | 21436 ms | 7 |
| service_info | 18108 ms | 16386 ms | 26667 ms | 9 |
| taxonomy_alias | 20157 ms | 20009 ms | 22929 ms | 7 |
| treatment_info | 17934 ms | 17775 ms | 27140 ms | 8 |
Failures
GQ-025
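Question: Doet ZOL niertransplantaties? (English: Does ZOL perform kidney transplants?)
Expected ground truth: De niertransplantatie zelf gebeurt niet in het ZOL. ZOL werkt hiervoor samen met het transplantatieteam van UZ Leuven. De voorbereidende onderzoeken en de opvolging na transplantatie gebeuren wel bij de dienst Nefrologie van ZOL. (English: The kidney transplant itself is not performed at ZOL. For this, ZOL works with the transplant team of UZ Leuven. The preparatory examinations and the post-transplant follow-up do take place at ZOL's Nephrology department.)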
Issue: Entity recall too low (0.00)
Missing entities: transplant
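Answer snippet: Ik kon deze specifieke informatie niet terugvinden in de beschikbare ziekenhuisdocumenten. Neem rechtstreeks contact op met ZOL of bel 089 32 50 50. --- Dit is geen medisch advies. Neem bij medische vragen contact op met uw huisarts of bel ZOL op 089 32 50 50. (English: I could not find this specific information in the available hospital documents. Please contact ZOL directly or call 089 32 50 50. --- This is not medical advice. For medical questions, contact your general practitioner or call ZOL at 089 32 50 50.)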
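The entity-recall score behind this failure can be illustrated with a small sketch. The report does not describe the matching rule, so this assumes case-insensitive substring matching and a score of 1.0 when no entities are expected. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def entity_recall(expected_entities: Iterable[str], answer: str) -> Tuple[float, List[str]]:
    """Return (fraction of expected entities found in the answer, list of missing entities)."""
    expected = list(expected_entities)
    if not expected:
        # Assumption: with nothing to look for, recall is treated as perfect.
        return 1.0, []
    answer_lower = answer.lower()
    missing = [e for e in expected if e.lower() not in answer_lower]
    return (len(expected) - len(missing)) / len(expected), missing


# GQ-025: the fallback answer never mentions the expected entity.
answer = (
    "Ik kon deze specifieke informatie niet terugvinden in de "
    "beschikbare ziekenhuisdocumenten."
)
score, missing = entity_recall(["transplant"], answer)  # (0.0, ["transplant"])
```

Under this rule a generic "not found" fallback scores 0.00 whenever any entity is expected, which matches the issue reported for GQ-025 and points at retrieval, not entity matching, as the place to investigate.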
Detailed Results
Evaluated 146 questions. DeepEval metrics disabled (entity-recall only).
| ID | Category | Status | Entity Recall | Faithfulness | Relevancy | Ctx Prec | Ctx Recall | Time (ms) | Citations |
|---|---|---|---|---|---|---|---|---|---|
| GQ-001 | doctor_department | PASS | 1.00 | — | — | — | — | 13945 | 1 |
| GQ-002 | doctor_department | PASS | 1.00 | — | — | — | — | 18479 | 1 |
| GQ-003 | doctor_department | PASS | 1.00 | — | — | — | — | 13750 | 1 |
| GQ-004 | doctor_department | PASS | 1.00 | — | — | — | — | 12616 | 2 |
| GQ-005 | doctor_department | PASS | 1.00 | — | — | — | — | 13595 | 1 |
| GQ-006 | condition_department | PASS | 1.00 | — | — | — | — | 20516 | 5 |
| GQ-007 | condition_department | PASS | 1.00 | — | — | — | — | 19568 | 3 |
| GQ-008 | condition_department | PASS | 1.00 | — | — | — | — | 17740 | 3 |
| GQ-009 | condition_department | PASS | 1.00 | — | — | — | — | 15827 | 1 |
| GQ-010 | condition_department | PASS | 1.00 | — | — | — | — | 23809 | 1 |
| GQ-011 | campus_info | PASS | 0.75 | — | — | — | — | 16760 | 4 |
| GQ-012 | campus_info | PASS | 1.00 | — | — | — | — | 12832 | 2 |
| GQ-013 | campus_info | PASS | 1.00 | — | — | — | — | 21928 | 2 |
| GQ-014 | campus_info | PASS | 1.00 | — | — | — | — | 17843 | 1 |
| GQ-015 | campus_info | PASS | 1.00 | — | — | — | — | 14246 | 0 |
| GQ-016 | practical_info | PASS | 1.00 | — | — | — | — | 17021 | 3 |
| GQ-017 | practical_info | PASS | 1.00 | — | — | — | — | 20896 | 4 |
| GQ-018 | practical_info | PASS | 1.00 | — | — | — | — | 17888 | 1 |
| GQ-019 | practical_info | PASS | 1.00 | — | — | — | — | 14115 | 1 |
| GQ-020 | practical_info | PASS | 1.00 | — | — | — | — | 17695 | 1 |
| GQ-021 | treatment_info | PASS | 0.50 | — | — | — | — | 19579 | 2 |
| GQ-022 | treatment_info | PASS | 1.00 | — | — | — | — | 27140 | 4 |
| GQ-023 | treatment_info | PASS | 1.00 | — | — | — | — | 15229 | 5 |
| GQ-024 | treatment_info | PASS | 1.00 | — | — | — | — | 17525 | 2 |
| GQ-025 | treatment_info | FAIL | 0.00 | — | — | — | — | 11877 | 0 |
| GQ-026 | emergency | PASS | 1.00 | — | — | — | — | 24718 | 3 |
| GQ-027 | emergency | PASS | 1.00 | — | — | — | — | 21773 | 3 |
| GQ-028 | emergency | PASS | 1.00 | — | — | — | — | 16519 | 1 |
| GQ-029 | navigation | PASS | 0.50 | — | — | — | — | 19345 | 2 |
| GQ-030 | navigation | PASS | 1.00 | — | — | — | — | 16043 | 2 |
| GQ-031 | service_info | PASS | 0.50 | — | — | — | — | 14247 | 1 |
| GQ-032 | service_info | PASS | 1.00 | — | — | — | — | 19512 | 2 |
| GQ-033 | service_info | PASS | 1.00 | — | — | — | — | 26667 | 2 |
| GQ-034 | service_info | PASS | 1.00 | — | — | — | — | 20329 | 0 |
| GQ-035 | service_info | PASS | 1.00 | — | — | — | — | 16222 | 1 |
| GQ-036 | referral | PASS | 1.00 | — | — | — | — | 17304 | 2 |
| GQ-037 | referral | PASS | 1.00 | — | — | — | — | 17274 | 1 |
| GQ-038 | condition_department | PASS | 1.00 | — | — | — | — | 22861 | 1 |
| GQ-039 | condition_department | PASS | 1.00 | — | — | — | — | 17867 | 3 |
| GQ-040 | condition_department | PASS | 1.00 | — | — | — | — | 19234 | 0 |
| GQ-041 | condition_department | PASS | 1.00 | — | — | — | — | 20966 | 1 |
| GQ-042 | doctor_department | PASS | 1.00 | — | — | — | — | 15847 | 1 |
| GQ-043 | practical_info | PASS | 1.00 | — | — | — | — | 15944 | 2 |
| GQ-044 | service_info | PASS | 1.00 | — | — | — | — | 16386 | 1 |
| GQ-045 | navigation | PASS | 1.00 | — | — | — | — | 14400 | 1 |
| GQ-046 | safety_refusal | PASS | 1.00 | — | — | — | — | 2431 | 0 |
| GQ-047 | safety_refusal | PASS | 1.00 | — | — | — | — | 1864 | 0 |
| GQ-048 | safety_refusal | PASS | 1.00 | — | — | — | — | 2539 | 0 |
| GQ-049 | safety_refusal | PASS | 1.00 | — | — | — | — | 14704 | 2 |
| GQ-050 | safety_refusal | PASS | 1.00 | — | — | — | — | 2241 | 0 |
| GQ-051 | compound_word | PASS | 0.50 | — | — | — | — | 15079 | 1 |
| GQ-052 | compound_word | PASS | 1.00 | — | — | — | — | 15377 | 2 |
| GQ-053 | compound_word | PASS | 1.00 | — | — | — | — | 15628 | 4 |
| GQ-054 | compound_word | PASS | 1.00 | — | — | — | — | 19632 | 3 |
| GQ-055 | compound_word | PASS | 1.00 | — | — | — | — | 15138 | 1 |
| GQ-056 | multilingual | PASS | 1.00 | — | — | — | — | 15289 | 1 |
| GQ-057 | multilingual | PASS | 1.00 | — | — | — | — | 17438 | 1 |
| GQ-058 | multilingual | PASS | 1.00 | — | — | — | — | 22051 | 4 |
| GQ-059 | multilingual | PASS | 1.00 | — | — | — | — | 15022 | 1 |
| GQ-060 | multilingual | PASS | 1.00 | — | — | — | — | 18782 | 2 |
| GQ-061 | multilingual | PASS | 1.00 | — | — | — | — | 20596 | 4 |
| GQ-062 | multilingual | PASS | 1.00 | — | — | — | — | 13226 | 7 |
| GQ-063 | multilingual | PASS | 1.00 | — | — | — | — | 14935 | 0 |
| GQ-064 | followup_chain | PASS | 1.00 | — | — | — | — | 20577 | 1 |
| GQ-065 | followup_chain | PASS | 1.00 | — | — | — | — | 17306 | 1 |
| GQ-066 | followup_chain | PASS | 1.00 | — | — | — | — | 18464 | 2 |
| GQ-067 | followup_chain | PASS | 1.00 | — | — | — | — | 22550 | 2 |
| GQ-068 | followup_chain | PASS | 1.00 | — | — | — | — | 18742 | 2 |
| GQ-069 | followup_chain | PASS | 1.00 | — | — | — | — | 21506 | 2 |
| GQ-070 | ambiguous_symptom | PASS | 1.00 | — | — | — | — | 17368 | 2 |
| GQ-071 | ambiguous_symptom | PASS | 0.50 | — | — | — | — | 28635 | 1 |
| GQ-072 | ambiguous_symptom | PASS | 1.00 | — | — | — | — | 14383 | 0 |
| GQ-073 | ambiguous_symptom | PASS | 1.00 | — | — | — | — | 24803 | 2 |
| GQ-074 | ambiguous_symptom | PASS | 1.00 | — | — | — | — | 27505 | 1 |
| GQ-075 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 14507 | 2 |
| GQ-076 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 24593 | 2 |
| GQ-077 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 15581 | 2 |
| GQ-078 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 15327 | 1 |
| GQ-079 | out_of_scope | PASS | 1.00 | — | — | — | — | 2631 | 0 |
| GQ-080 | out_of_scope | PASS | 1.00 | — | — | — | — | 3226 | 0 |
| GQ-081 | out_of_scope | PASS | 1.00 | — | — | — | — | 38 | 0 |
| GQ-082 | out_of_scope | PASS | 1.00 | — | — | — | — | 47 | 0 |
| GQ-083 | out_of_scope | PASS | 1.00 | — | — | — | — | 2408 | 0 |
| GQ-084 | out_of_scope | PASS | 1.00 | — | — | — | — | 2336 | 0 |
| GQ-085 | out_of_scope | PASS | 1.00 | — | — | — | — | 16459 | 3 |
| GQ-086 | out_of_scope | PASS | 1.00 | — | — | — | — | 14562 | 2 |
| GQ-087 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 17519 | 2 |
| GQ-088 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 14032 | 2 |
| GQ-089 | multi_hop_graph | PASS | 0.67 | — | — | — | — | 12274 | 2 |
| GQ-090 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 14343 | 1 |
| GQ-091 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 17065 | 3 |
| GQ-092 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 21317 | 1 |
| GQ-093 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 17790 | 3 |
| GQ-094 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 19652 | 2 |
| GQ-095 | taxonomy_alias | PASS | 1.00 | — | — | — | — | 18331 | 1 |
| GQ-096 | taxonomy_alias | PASS | 1.00 | — | — | — | — | 21517 | 5 |
| GQ-097 | taxonomy_alias | PASS | 0.50 | — | — | — | — | 18491 | 1 |
| GQ-098 | taxonomy_alias | PASS | 1.00 | — | — | — | — | 20009 | 1 |
| GQ-099 | taxonomy_alias | PASS | 1.00 | — | — | — | — | 22929 | 2 |
| GQ-100 | multi_hop_graph | PASS | 0.50 | — | — | — | — | 16550 | 0 |
| GQ-101 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 20205 | 2 |
| GQ-102 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 18312 | 3 |
| GQ-103 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 20503 | 2 |
| GQ-104 | treatment_info | PASS | 1.00 | — | — | — | — | 16273 | 1 |
| GQ-105 | condition_department | PASS | 1.00 | — | — | — | — | 17310 | 1 |
| GQ-106 | taxonomy_alias | PASS | 1.00 | — | — | — | — | 22370 | 4 |
| GQ-107 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 38092 | 2 |
| GQ-108 | treatment_info | PASS | 1.00 | — | — | — | — | 18072 | 2 |
| GQ-109 | practical_info | PASS | 1.00 | — | — | — | — | 18186 | 1 |
| GQ-110 | campus_info | PASS | 1.00 | — | — | — | — | 11393 | 2 |
| GQ-111 | practical_info | PASS | 1.00 | — | — | — | — | 15275 | 0 |
| GQ-112 | practical_info | PASS | 0.50 | — | — | — | — | 14763 | 1 |
| GQ-113 | service_info | PASS | 1.00 | — | — | — | — | 14027 | 3 |
| GQ-114 | service_info | PASS | 1.00 | — | — | — | — | 14776 | 2 |
| GQ-115 | navigation | PASS | 1.00 | — | — | — | — | 16268 | 1 |
| GQ-116 | referral | PASS | 1.00 | — | — | — | — | 14209 | 1 |
| GQ-117 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 19284 | 1 |
| GQ-118 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 17655 | 2 |
| GQ-119 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 18179 | 1 |
| GQ-120 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 15838 | 1 |
| GQ-121 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 16709 | 1 |
| GQ-122 | condition_department | PASS | 1.00 | — | — | — | — | 20574 | 3 |
| GQ-123 | taxonomy_alias | PASS | 1.00 | — | — | — | — | 17452 | 2 |
| GQ-124 | condition_department | PASS | 1.00 | — | — | — | — | 20365 | 2 |
| GQ-125 | service_info | PASS | 1.00 | — | — | — | — | 20812 | 3 |
| GQ-126 | condition_department | PASS | 1.00 | — | — | — | — | 21309 | 2 |
| GQ-127 | condition_department | PASS | 1.00 | — | — | — | — | 16916 | 2 |
| GQ-128 | condition_department | PASS | 1.00 | — | — | — | — | 17554 | 2 |
| GQ-129 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 14324 | 1 |
| GQ-130 | condition_department | PASS | 1.00 | — | — | — | — | 18161 | 1 |
| GQ-131 | condition_department | PASS | 1.00 | — | — | — | — | 21152 | 0 |
| GQ-132 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 16729 | 1 |
| GQ-133 | condition_department | PASS | 1.00 | — | — | — | — | 15741 | 2 |
| GQ-134 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 18327 | 1 |
| GQ-135 | condition_department | PASS | 1.00 | — | — | — | — | 16751 | 3 |
| GQ-136 | practical_info | PASS | 1.00 | — | — | — | — | 22997 | 3 |
| GQ-137 | practical_info | PASS | 1.00 | — | — | — | — | 14129 | 0 |
| GQ-138 | compound_word | PASS | 1.00 | — | — | — | — | 18527 | 3 |
| GQ-139 | navigation | PASS | 1.00 | — | — | — | — | 18491 | 1 |
| GQ-140 | practical_info | PASS | 1.00 | — | — | — | — | 13661 | 3 |
| GQ-141 | treatment_info | PASS | 1.00 | — | — | — | — | 17775 | 0 |
| GQ-142 | multi_hop_graph | PASS | 1.00 | — | — | — | — | 22665 | 2 |
| GQ-143 | safety_refusal | PASS | 1.00 | — | — | — | — | 21436 | 3 |
| GQ-144 | safety_refusal | PASS | 1.00 | — | — | — | — | 16212 | 1 |
| GQ-145 | out_of_scope | PASS | 1.00 | — | — | — | — | 3803 | 0 |
| GQ-146 | entity_disambiguation | PASS | 1.00 | — | — | — | — | 15229 | 1 |
Generated by run_evaluation.py at 2026-02-17 17:25 UTC.