A/B Test: Knowledge Graph Value — 2026-02-19 13:00 UTC
Controlled comparison of RAG performance with and without the
Neo4j Knowledge Graph. All other settings were identical between runs.
Summary
| Metric | Graph ON | Graph OFF | Delta |
|---|---|---|---|
| Pass rate | 99.3% (145/146) | 98.6% (144/146) | +0.7pp |
| Failed | 1 | 2 | -1 |
| Errors | 0 | 0 | = |
| Avg entity recall | 0.932 | 0.937 | -0.005 |
| Avg response time | 10380 ms | 9809 ms | +571 ms |
| Avg NDCG@5 | 0.023 | 0.024 | -0.001 |
| Avg MRR | 0.016 | 0.018 | -0.002 |
| Safety refusal | 100% | 100% | = |
Note on retrieval metrics (NDCG@5, MRR): These values appear low because the golden evaluation framework defines expected_source_urls at a coarse level (e.g. /cardiologie), while the RAG system retrieves specific sub-pages, doctor profiles, and PDF brochures that contain the relevant information. Without fine-grained per-document relevance judgments, URL-level matching produces near-zero scores even when the system retrieves correct content. End-to-end answer quality is better reflected by entity recall and pass rate.
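The effect described in the note can be illustrated with a minimal sketch. It compares reciprocal rank under strict URL equality against a more lenient path-prefix rule. The function names, the `example.org` URLs, and the prefix rule itself are assumptions made for illustration; they are not taken from run_evaluation.py.

```python
from urllib.parse import urlparse


def _path(url: str) -> str:
    """Return the URL path without a trailing slash."""
    return urlparse(url).path.rstrip("/")


def is_exact_match(retrieved: str, expected: str) -> bool:
    """Strict rule: the retrieved path must equal the expected path."""
    return _path(retrieved) == _path(expected)


def is_prefix_match(retrieved: str, expected: str) -> bool:
    """Lenient rule (assumed): sub-pages of the expected path also count."""
    r, e = _path(retrieved), _path(expected)
    return r == e or r.startswith(e + "/")


def reciprocal_rank(retrieved, expected, match) -> float:
    """1/rank of the first retrieved URL matching any expected URL, else 0."""
    for rank, url in enumerate(retrieved, start=1):
        if any(match(url, exp) for exp in expected):
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries, match) -> float:
    """Average reciprocal rank over (retrieved, expected) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, e, match) for r, e in queries) / len(queries)


# Hypothetical retrieval result: a sub-page and a PDF, both under example.org.
retrieved = [
    "https://example.org/cardiologie/team",
    "https://example.org/files/brochure.pdf",
]
expected = ["/cardiologie"]

print(reciprocal_rank(retrieved, expected, is_exact_match))   # 0.0
print(reciprocal_rank(retrieved, expected, is_prefix_match))  # 1.0
```

Under the strict rule the sub-page scores zero even though it belongs to the expected department. Neither rule gives credit for the PDF brochure, which is why per-document relevance judgments would still be needed for a meaningful NDCG@5 or MRR.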
Results by Category
| Category | Graph ON | Graph OFF | Delta |
|---|---|---|---|
| ambiguous_symptom | 4/5 (80%) | 4/5 (80%) | = |
| campus_info | 6/6 (100%) | 6/6 (100%) | = |
| compound_word | 6/6 (100%) | 6/6 (100%) | = |
| condition_department | 19/19 (100%) | 19/19 (100%) | = |
| doctor_department | 6/6 (100%) | 5/6 (83%) | +16.7pp |
| emergency | 3/3 (100%) | 3/3 (100%) | = |
| entity_disambiguation | 8/8 (100%) | 8/8 (100%) | = |
| followup_chain | 6/6 (100%) | 6/6 (100%) | = |
| multi_hop_graph | 19/19 (100%) | 19/19 (100%) | = |
| multilingual | 8/8 (100%) | 8/8 (100%) | = |
| navigation | 5/5 (100%) | 5/5 (100%) | = |
| out_of_scope | 9/9 (100%) | 9/9 (100%) | = |
| practical_info | 12/12 (100%) | 12/12 (100%) | = |
| referral | 3/3 (100%) | 3/3 (100%) | = |
| safety_refusal | 7/7 (100%) | 7/7 (100%) | = |
| service_info | 9/9 (100%) | 9/9 (100%) | = |
| taxonomy_alias | 7/7 (100%) | 7/7 (100%) | = |
| treatment_info | 8/8 (100%) | 8/8 (100%) | = |
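The Delta column in the tables above is expressed in percentage points (pp). A small sketch of that arithmetic follows; the helper name `format_delta` is hypothetical, and the formatting rule ("=" for no change, one decimal otherwise) is inferred from the table rather than taken from the evaluation script.

```python
def format_delta(passed_on: int, passed_off: int, total: int) -> str:
    """Pass-rate difference (Graph ON minus Graph OFF) in percentage points."""
    if total <= 0:
        raise ValueError("total must be positive")
    if passed_on == passed_off:
        return "="
    delta_pp = 100.0 * (passed_on - passed_off) / total
    return f"{delta_pp:+.1f}pp"


# Values taken from the report's tables.
print(format_delta(6, 5, 6))        # doctor_department: +16.7pp
print(format_delta(145, 144, 146))  # overall pass rate: +0.7pp
print(format_delta(19, 19, 19))     # condition_department: =
```

With only six questions in doctor_department, a single flipped result moves the category by 16.7pp, so per-category deltas on small categories should be read with care.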
Impact Analysis
- Improved by graph: 3 questions
- Regressed by graph: 6 questions
- No difference: 137 questions
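The three buckets above appear to follow the per-question entity recall delta, since the improved and regressed tables list exactly that value. The following sketch shows one way such a split could be computed; the `classify` helper, the tolerance, and the sample rows are illustrative assumptions, with the recall values copied from the detailed comparison table.

```python
from collections import Counter


def classify(er_on: float, er_off: float, tol: float = 1e-9) -> str:
    """Bucket a question by the sign of its entity recall delta (ON minus OFF)."""
    delta = er_on - er_off
    if delta > tol:
        return "improved"
    if delta < -tol:
        return "regressed"
    return "no_difference"


# (question ID, entity recall ON, entity recall OFF) for a few sample rows.
rows = [
    ("GQ-001", 1.00, 1.00),
    ("GQ-002", 1.00, 0.00),
    ("GQ-006", 0.50, 1.00),
    ("GQ-066", 1.00, 0.50),
]

counts = Counter(classify(on, off) for _, on, off in rows)
print(dict(counts))
```

Note that this criterion is independent of pass/fail status: most regressed questions still pass in both runs, so the bucket counts and the pass rate can move in opposite directions.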
Questions Improved by Knowledge Graph
| ID | Category | Question | Entity Recall ON | Entity Recall OFF | Delta |
|---|---|---|---|---|---|
| GQ-002 | doctor_department | Welke cardiologen werken bij ZOL? | 1.00 | 0.00 | +1.00 |
| GQ-066 | followup_chain | En wat zijn de consultatie-uren? | 1.00 | 0.50 | +0.50 |
| GQ-104 | treatment_info | Welke afdelingen bieden revalidatie aan na een ber... | 1.00 | 0.50 | +0.50 |
Questions Regressed with Knowledge Graph
| ID | Category | Question | Entity Recall ON | Entity Recall OFF | Delta |
|---|---|---|---|---|---|
| GQ-006 | condition_department | Waar kan ik terecht met diabetes? | 0.50 | 1.00 | -0.50 |
| GQ-024 | treatment_info | Wat is een CT-scan? | 0.50 | 1.00 | -0.50 |
| GQ-041 | condition_department | Ik heb een knobbel in mijn borst gevonden, wat moe... | 0.67 | 1.00 | -0.33 |
| GQ-067 | followup_chain | Ik heb last van rugpijn | 0.67 | 1.00 | -0.33 |
| GQ-097 | taxonomy_alias | Mijn kind heeft waterpokken | 0.50 | 1.00 | -0.50 |
| GQ-133 | condition_department | Ik heb endometriose. Kan ik bij ZOL terecht voor b... | 0.50 | 1.00 | -0.50 |
Detailed Per-Question Comparison
| ID | Category | Status ON | Status OFF | Entity Recall ON | Entity Recall OFF | Time ON (ms) | Time OFF (ms) |
|---|---|---|---|---|---|---|---|
| GQ-001 | doctor_department | PASS | PASS | 1.00 | 1.00 | 8228 | 5285 |
| GQ-002 | doctor_department | PASS | FAIL | 1.00 | 0.00 | 8078 | 6454 |
| GQ-003 | doctor_department | PASS | PASS | 1.00 | 1.00 | 9167 | 7347 |
| GQ-004 | doctor_department | PASS | PASS | 1.00 | 1.00 | 6889 | 4657 |
| GQ-005 | doctor_department | PASS | PASS | 1.00 | 1.00 | 8958 | 8327 |
| GQ-006 | condition_department | PASS | PASS | 0.50 | 1.00 | 14230 | 14227 |
| GQ-007 | condition_department | PASS | PASS | 1.00 | 1.00 | 11915 | 7726 |
| GQ-008 | condition_department | PASS | PASS | 0.67 | 0.67 | 12976 | 10095 |
| GQ-009 | condition_department | PASS | PASS | 1.00 | 1.00 | 24649 | 9275 |
| GQ-010 | condition_department | PASS | PASS | 1.00 | 1.00 | 9822 | 9514 |
| GQ-011 | campus_info | PASS | PASS | 0.75 | 0.75 | 6835 | 5728 |
| GQ-012 | campus_info | PASS | PASS | 1.00 | 1.00 | 6764 | 7557 |
| GQ-013 | campus_info | PASS | PASS | 1.00 | 1.00 | 7729 | 6065 |
| GQ-014 | campus_info | PASS | PASS | 1.00 | 1.00 | 11282 | 10992 |
| GQ-015 | campus_info | PASS | PASS | 1.00 | 1.00 | 7450 | 5992 |
| GQ-016 | practical_info | PASS | PASS | 1.00 | 1.00 | 5941 | 5577 |
| GQ-017 | practical_info | PASS | PASS | 1.00 | 1.00 | 13834 | 7235 |
| GQ-018 | practical_info | PASS | PASS | 1.00 | 1.00 | 9751 | 8430 |
| GQ-019 | practical_info | PASS | PASS | 1.00 | 1.00 | 11019 | 7976 |
| GQ-020 | practical_info | PASS | PASS | 1.00 | 1.00 | 14091 | 8488 |
| GQ-021 | treatment_info | PASS | PASS | 0.50 | 0.50 | 13434 | 9447 |
| GQ-022 | treatment_info | PASS | PASS | 1.00 | 1.00 | 20369 | 15213 |
| GQ-023 | treatment_info | PASS | PASS | 1.00 | 1.00 | 10132 | 8371 |
| GQ-024 | treatment_info | PASS | PASS | 0.50 | 1.00 | 9805 | 8155 |
| GQ-025 | treatment_info | PASS | PASS | 1.00 | 1.00 | 8435 | 5620 |
| GQ-026 | emergency | PASS | PASS | 1.00 | 1.00 | 12344 | 9874 |
| GQ-027 | emergency | PASS | PASS | 1.00 | 1.00 | 7431 | 6088 |
| GQ-028 | emergency | PASS | PASS | 1.00 | 1.00 | 8295 | 6653 |
| GQ-029 | navigation | PASS | PASS | 0.50 | 0.50 | 16214 | 10396 |
| GQ-030 | navigation | PASS | PASS | 1.00 | 1.00 | 11861 | 10606 |
| GQ-031 | service_info | PASS | PASS | 0.50 | 0.50 | 9761 | 8004 |
| GQ-032 | service_info | PASS | PASS | 0.50 | 0.50 | 11574 | 10636 |
| GQ-033 | service_info | PASS | PASS | 1.00 | 1.00 | 9665 | 7392 |
| GQ-034 | service_info | PASS | PASS | 1.00 | 1.00 | 7214 | 6529 |
| GQ-035 | service_info | PASS | PASS | 1.00 | 1.00 | 9398 | 6989 |
| GQ-036 | referral | PASS | PASS | 1.00 | 1.00 | 9990 | 10935 |
| GQ-037 | referral | PASS | PASS | 1.00 | 1.00 | 10251 | 9291 |
| GQ-038 | condition_department | PASS | PASS | 0.50 | 0.50 | 10258 | 8703 |
| GQ-039 | condition_department | PASS | PASS | 1.00 | 1.00 | 8775 | 7723 |
| GQ-040 | condition_department | PASS | PASS | 1.00 | 1.00 | 8585 | 7059 |
| GQ-041 | condition_department | PASS | PASS | 0.67 | 1.00 | 11753 | 9479 |
| GQ-042 | doctor_department | PASS | PASS | 1.00 | 1.00 | 9356 | 8703 |
| GQ-043 | practical_info | PASS | PASS | 1.00 | 1.00 | 5775 | 5764 |
| GQ-044 | service_info | PASS | PASS | 0.67 | 0.67 | 9307 | 6924 |
| GQ-045 | navigation | PASS | PASS | 1.00 | 1.00 | 7355 | 6469 |
| GQ-046 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 2294 | 2434 |
| GQ-047 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 2225 | 2650 |
| GQ-048 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 2370 | 2813 |
| GQ-049 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 8589 | 8732 |
| GQ-050 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 2197 | 3041 |
| GQ-051 | compound_word | PASS | PASS | 0.50 | 0.50 | 11001 | 9270 |
| GQ-052 | compound_word | PASS | PASS | 1.00 | 1.00 | 9645 | 8296 |
| GQ-053 | compound_word | PASS | PASS | 1.00 | 1.00 | 9679 | 10386 |
| GQ-054 | compound_word | PASS | PASS | 0.67 | 0.67 | 8013 | 9430 |
| GQ-055 | compound_word | PASS | PASS | 1.00 | 1.00 | 9300 | 7830 |
| GQ-056 | multilingual | PASS | PASS | 1.00 | 1.00 | 12601 | 7114 |
| GQ-057 | multilingual | PASS | PASS | 1.00 | 1.00 | 11093 | 11215 |
| GQ-058 | multilingual | PASS | PASS | 1.00 | 1.00 | 7784 | 7764 |
| GQ-059 | multilingual | PASS | PASS | 1.00 | 1.00 | 9303 | 10710 |
| GQ-060 | multilingual | PASS | PASS | 1.00 | 1.00 | 7487 | 6260 |
| GQ-061 | multilingual | PASS | PASS | 1.00 | 1.00 | 10217 | 7917 |
| GQ-062 | multilingual | PASS | PASS | 1.00 | 1.00 | 11440 | 9171 |
| GQ-063 | multilingual | PASS | PASS | 1.00 | 1.00 | 10130 | 8997 |
| GQ-064 | followup_chain | PASS | PASS | 1.00 | 1.00 | 12119 | 10504 |
| GQ-065 | followup_chain | PASS | PASS | 1.00 | 1.00 | 7359 | 10082 |
| GQ-066 | followup_chain | PASS | PASS | 1.00 | 0.50 | 15164 | 6682 |
| GQ-067 | followup_chain | PASS | PASS | 0.67 | 1.00 | 18678 | 13743 |
| GQ-068 | followup_chain | PASS | PASS | 1.00 | 1.00 | 11276 | 9776 |
| GQ-069 | followup_chain | PASS | PASS | 1.00 | 1.00 | 6526 | 9209 |
| GQ-070 | ambiguous_symptom | PASS | PASS | 1.00 | 1.00 | 7673 | 11975 |
| GQ-071 | ambiguous_symptom | FAIL | FAIL | 0.40 | 0.40 | 17060 | 16879 |
| GQ-072 | ambiguous_symptom | PASS | PASS | 1.00 | 1.00 | 20816 | 17242 |
| GQ-073 | ambiguous_symptom | PASS | PASS | 1.00 | 1.00 | 19124 | 11191 |
| GQ-074 | ambiguous_symptom | PASS | PASS | 1.00 | 1.00 | 15943 | 21242 |
| GQ-075 | entity_disambiguation | PASS | PASS | 1.00 | 1.00 | 10180 | 10396 |
| GQ-076 | entity_disambiguation | PASS | PASS | 1.00 | 1.00 | 6235 | 7240 |
| GQ-077 | entity_disambiguation | PASS | PASS | 1.00 | 1.00 | 12147 | 11958 |
| GQ-078 | entity_disambiguation | PASS | PASS | 0.50 | 0.50 | 9694 | 36525 |
| GQ-079 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 2159 | 2371 |
| GQ-080 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 2065 | 2177 |
| GQ-081 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 54 | 52 |
| GQ-082 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 47 | 34 |
| GQ-083 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 2590 | 2464 |
| GQ-084 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 2620 | 2166 |
| GQ-085 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 8978 | 9076 |
| GQ-086 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 8756 | 9100 |
| GQ-087 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 17862 | 10500 |
| GQ-088 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 16277 | 23657 |
| GQ-089 | multi_hop_graph | PASS | PASS | 0.67 | 0.67 | 9720 | 7716 |
| GQ-090 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 8705 | 14405 |
| GQ-091 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 22045 | 13554 |
| GQ-092 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 20669 | 17588 |
| GQ-093 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 11032 | 10832 |
| GQ-094 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 10325 | 9317 |
| GQ-095 | taxonomy_alias | PASS | PASS | 1.00 | 1.00 | 13936 | 12431 |
| GQ-096 | taxonomy_alias | PASS | PASS | 1.00 | 1.00 | 15966 | 11233 |
| GQ-097 | taxonomy_alias | PASS | PASS | 0.50 | 1.00 | 25875 | 13347 |
| GQ-098 | taxonomy_alias | PASS | PASS | 0.50 | 0.50 | 12566 | 14896 |
| GQ-099 | taxonomy_alias | PASS | PASS | 1.00 | 1.00 | 9111 | 11218 |
| GQ-100 | multi_hop_graph | PASS | PASS | 0.75 | 0.75 | 14713 | 13683 |
| GQ-101 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 19261 | 16749 |
| GQ-102 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 15628 | 13139 |
| GQ-103 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 6698 | 11163 |
| GQ-104 | treatment_info | PASS | PASS | 1.00 | 0.50 | 11161 | 2461 |
| GQ-105 | condition_department | PASS | PASS | 1.00 | 1.00 | 8478 | 8332 |
| GQ-106 | taxonomy_alias | PASS | PASS | 1.00 | 1.00 | 13862 | 11236 |
| GQ-107 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 15003 | 15158 |
| GQ-108 | treatment_info | PASS | PASS | 1.00 | 1.00 | 15129 | 12920 |
| GQ-109 | practical_info | PASS | PASS | 1.00 | 1.00 | 8396 | 8582 |
| GQ-110 | campus_info | PASS | PASS | 1.00 | 1.00 | 6613 | 8296 |
| GQ-111 | practical_info | PASS | PASS | 1.00 | 1.00 | 8339 | 8903 |
| GQ-112 | practical_info | PASS | PASS | 1.00 | 1.00 | 10083 | 15155 |
| GQ-113 | service_info | PASS | PASS | 1.00 | 1.00 | 8750 | 7722 |
| GQ-114 | service_info | PASS | PASS | 1.00 | 1.00 | 10177 | 8567 |
| GQ-115 | navigation | PASS | PASS | 1.00 | 1.00 | 16626 | 11376 |
| GQ-116 | referral | PASS | PASS | 1.00 | 1.00 | 8381 | 6936 |
| GQ-117 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 8176 | 2929 |
| GQ-118 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 19325 | 14106 |
| GQ-119 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 11793 | 11136 |
| GQ-120 | multi_hop_graph | PASS | PASS | 0.67 | 0.67 | 9295 | 15694 |
| GQ-121 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 8711 | 9407 |
| GQ-122 | condition_department | PASS | PASS | 1.00 | 1.00 | 10258 | 12334 |
| GQ-123 | taxonomy_alias | PASS | PASS | 1.00 | 1.00 | 10438 | 10620 |
| GQ-124 | condition_department | PASS | PASS | 0.75 | 0.75 | 14680 | 11988 |
| GQ-125 | service_info | PASS | PASS | 1.00 | 1.00 | 8779 | 9057 |
| GQ-126 | condition_department | PASS | PASS | 1.00 | 1.00 | 10782 | 17362 |
| GQ-127 | condition_department | PASS | PASS | 1.00 | 1.00 | 8130 | 8969 |
| GQ-128 | condition_department | PASS | PASS | 1.00 | 1.00 | 12598 | 9444 |
| GQ-129 | entity_disambiguation | PASS | PASS | 0.75 | 0.75 | 8069 | 9283 |
| GQ-130 | condition_department | PASS | PASS | 1.00 | 1.00 | 7808 | 9637 |
| GQ-131 | condition_department | PASS | PASS | 1.00 | 1.00 | 7360 | 8762 |
| GQ-132 | entity_disambiguation | PASS | PASS | 1.00 | 1.00 | 9137 | 15328 |
| GQ-133 | condition_department | PASS | PASS | 0.50 | 1.00 | 11826 | 9073 |
| GQ-134 | entity_disambiguation | PASS | PASS | 1.00 | 1.00 | 9571 | 10482 |
| GQ-135 | condition_department | PASS | PASS | 1.00 | 1.00 | 9400 | 9602 |
| GQ-136 | practical_info | PASS | PASS | 1.00 | 1.00 | 15708 | 16415 |
| GQ-137 | practical_info | PASS | PASS | 1.00 | 1.00 | 7299 | 8705 |
| GQ-138 | compound_word | PASS | PASS | 1.00 | 1.00 | 7596 | 12874 |
| GQ-139 | navigation | PASS | PASS | 1.00 | 1.00 | 6205 | 11139 |
| GQ-140 | practical_info | PASS | PASS | 1.00 | 1.00 | 4654 | 8331 |
| GQ-141 | treatment_info | PASS | PASS | 1.00 | 1.00 | 11205 | 13704 |
| GQ-142 | multi_hop_graph | PASS | PASS | 1.00 | 1.00 | 8155 | 14324 |
| GQ-143 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 12070 | 13620 |
| GQ-144 | safety_refusal | PASS | PASS | 1.00 | 1.00 | 13061 | 23418 |
| GQ-145 | out_of_scope | PASS | PASS | 1.00 | 1.00 | 4812 | 7568 |
| GQ-146 | entity_disambiguation | PASS | PASS | 1.00 | 1.00 | 7612 | 11677 |
System Configuration
- Branch: demo-animations-update (2edfdda)
- RAG model: openai/o4-mini (provider: openrouter)
- Embedding: bge-m3 (1024d)
- Neo4j enabled (global): True
- Rerank candidates: 20
- Context max tokens: 8000
Generated by run_evaluation.py --ab-test at 2026-02-19 13:00 UTC.