
A/B Test: Knowledge Graph Value — 2026-02-19 13:00 UTC

Controlled comparison of RAG performance with and without the Neo4j Knowledge Graph. All other settings identical between runs.

Summary

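| Metric | Graph ON | Graph OFF | Delta |
|---|---|---|---|
| Pass rate | 99.3% (145/146) | 98.6% (144/146) | +0.7pp |
| Failed | 1 | 2 | -1 |
| Errors | 0 | 0 | +0 |
| Avg entity recall | 0.932 | 0.937 | -0.005 |
| Avg response time | 10380 ms | 9809 ms | +571 ms |
| Avg NDCG@5 | 0.023 | 0.024 | = |
| Avg MRR | 0.016 | 0.018 | -0.002 |
| Safety refusal | 100% | 100% | = |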
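The pass-rate and latency deltas in the summary can be re-derived from the raw counts. The snippet below is a minimal sketch of that arithmetic; the helper name `pass_rate` is illustrative and is not taken from run_evaluation.py.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    return 100.0 * passed / total


TOTAL_QUESTIONS = 146

rate_on = pass_rate(145, TOTAL_QUESTIONS)   # Graph ON: 145 passed
rate_off = pass_rate(144, TOTAL_QUESTIONS)  # Graph OFF: 144 passed
delta_pp = rate_on - rate_off               # difference in percentage points

# Average response times reported in the summary, in milliseconds.
latency_on_ms = 10380
latency_off_ms = 9809
latency_delta_ms = latency_on_ms - latency_off_ms
latency_overhead_pct = 100.0 * latency_delta_ms / latency_off_ms

print(f"Graph ON:  {rate_on:.1f}%")
print(f"Graph OFF: {rate_off:.1f}%")
print(f"Delta:     {delta_pp:+.1f}pp")
print(f"Latency:   {latency_delta_ms:+d} ms ({latency_overhead_pct:.1f}% slower)")
```

A single question accounts for the entire pass-rate difference, so the +0.7pp figure should be read alongside the per-question tables rather than on its own.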

Note on retrieval metrics (NDCG@5, MRR): These values appear low because the golden evaluation framework defines expected_source_urls at a coarse level (e.g. /cardiologie), while the RAG system retrieves specific sub-pages, doctor profiles, and PDF brochures that contain the relevant information. Without fine-grained per-document relevance judgments, URL-level matching produces near-zero scores even when the system retrieves correct content. End-to-end answer quality is better reflected by entity recall and pass rate.
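The effect described in the note can be illustrated with a small MRR calculation. This is a sketch under stated assumptions: the URLs and the host example.org are hypothetical, and the prefix-matching variant is shown only for contrast. It is not the matching rule used by the evaluation framework. The same reasoning applies to NDCG@5, which is also built on per-document relevance.

```python
from urllib.parse import urlparse


def matches_exact(url: str, expected_paths: set) -> bool:
    """Relevant only if the URL path equals an expected path."""
    return urlparse(url).path.rstrip("/") in expected_paths


def matches_prefix(url: str, expected_paths: set) -> bool:
    """Relevant if the URL path equals or sits below an expected path."""
    path = urlparse(url).path.rstrip("/")
    return any(path == p or path.startswith(p + "/") for p in expected_paths)


def reciprocal_rank(retrieved: list, expected_paths: set, match) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, url in enumerate(retrieved, start=1):
        if match(url, expected_paths):
            return 1.0 / rank
    return 0.0


# Hypothetical golden entry and retrieval result (not real evaluation data).
expected = {"/cardiologie"}
retrieved = [
    "https://example.org/cardiologie/raadplegingen",
    "https://example.org/cardiologie/artsen/arts-a",
    "https://example.org/media/brochure-hartrevalidatie.pdf",
]

rr_exact = reciprocal_rank(retrieved, expected, matches_exact)
rr_prefix = reciprocal_rank(retrieved, expected, matches_prefix)

print(f"Exact URL match:  RR = {rr_exact:.2f}")
print(f"Prefix URL match: RR = {rr_prefix:.2f}")
```

In this example every retrieved page is about cardiology, yet exact matching scores zero because none of the URLs equals the coarse expected path.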

Results by Category

| Category | Graph ON | Graph OFF | Delta |
|---|---|---|---|
| ambiguous_symptom | 4/5 (80%) | 4/5 (80%) | = |
| campus_info | 6/6 (100%) | 6/6 (100%) | = |
| compound_word | 6/6 (100%) | 6/6 (100%) | = |
| condition_department | 19/19 (100%) | 19/19 (100%) | = |
| doctor_department | 6/6 (100%) | 5/6 (83%) | +16.7pp |
| emergency | 3/3 (100%) | 3/3 (100%) | = |
| entity_disambiguation | 8/8 (100%) | 8/8 (100%) | = |
| followup_chain | 6/6 (100%) | 6/6 (100%) | = |
| multi_hop_graph | 19/19 (100%) | 19/19 (100%) | = |
| multilingual | 8/8 (100%) | 8/8 (100%) | = |
| navigation | 5/5 (100%) | 5/5 (100%) | = |
| out_of_scope | 9/9 (100%) | 9/9 (100%) | = |
| practical_info | 12/12 (100%) | 12/12 (100%) | = |
| referral | 3/3 (100%) | 3/3 (100%) | = |
| safety_refusal | 7/7 (100%) | 7/7 (100%) | = |
| service_info | 9/9 (100%) | 9/9 (100%) | = |
| taxonomy_alias | 7/7 (100%) | 7/7 (100%) | = |
| treatment_info | 8/8 (100%) | 8/8 (100%) | = |

Impact Analysis

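  • Improved by graph: 3 questions (higher entity recall with Graph ON)
  • Regressed by graph: 6 questions (lower entity recall with Graph ON)
  • No difference: 137 questions (identical entity recall in both runs)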
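The three buckets above are consistent with a per-question comparison of entity recall between the two runs. The sketch below shows one way such a classification could be done. Both the `classify` helper and the tolerance value are assumptions, not the logic confirmed to be in run_evaluation.py. The sample rows are taken from the per-question data in this report.

```python
from collections import Counter


def classify(er_on: float, er_off: float, tol: float = 1e-9) -> str:
    """Bucket a question by the sign of its entity-recall delta (ON minus OFF)."""
    delta = er_on - er_off
    if delta > tol:
        return "improved"
    if delta < -tol:
        return "regressed"
    return "no_difference"


# A few (entity recall ON, entity recall OFF) pairs from this report.
sample = {
    "GQ-001": (1.00, 1.00),
    "GQ-002": (1.00, 0.00),
    "GQ-006": (0.50, 1.00),
    "GQ-066": (1.00, 0.50),
    "GQ-067": (0.67, 1.00),
}

counts = Counter(classify(on, off) for on, off in sample.values())
for bucket in ("improved", "regressed", "no_difference"):
    print(f"{bucket}: {counts[bucket]}")
```

Note that a question can change its entity recall without changing its PASS/FAIL status, which is why nine questions differ here while the pass count differs by only one.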

Questions Improved by Knowledge Graph

| ID | Category | Question | Entity Recall ON | Entity Recall OFF | Delta |
|---|---|---|---|---|---|
| GQ-002 | doctor_department | Welke cardiologen werken bij ZOL? | 1.00 | 0.00 | +1.00 |
| GQ-066 | followup_chain | En wat zijn de consultatie-uren? | 1.00 | 0.50 | +0.50 |
| GQ-104 | treatment_info | Welke afdelingen bieden revalidatie aan na een ber... | 1.00 | 0.50 | +0.50 |

Questions Regressed with Knowledge Graph

| ID | Category | Question | Entity Recall ON | Entity Recall OFF | Delta |
|---|---|---|---|---|---|
| GQ-006 | condition_department | Waar kan ik terecht met diabetes? | 0.50 | 1.00 | -0.50 |
| GQ-024 | treatment_info | Wat is een CT-scan? | 0.50 | 1.00 | -0.50 |
| GQ-041 | condition_department | Ik heb een knobbel in mijn borst gevonden, wat moe... | 0.67 | 1.00 | -0.33 |
| GQ-067 | followup_chain | Ik heb last van rugpijn | 0.67 | 1.00 | -0.33 |
| GQ-097 | taxonomy_alias | Mijn kind heeft waterpokken | 0.50 | 1.00 | -0.50 |
| GQ-133 | condition_department | Ik heb endometriose. Kan ik bij ZOL terecht voor b... | 0.50 | 1.00 | -0.50 |
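The improved and regressed questions together account for the -0.005 change in average entity recall reported in the summary. The sketch below reconciles the two using the rounded two-decimal values from this report and assumes, per the Impact Analysis, that the remaining 137 questions contribute a delta of zero.

```python
# (entity recall ON, entity recall OFF) for the nine questions that differ.
improved = {
    "GQ-002": (1.00, 0.00),
    "GQ-066": (1.00, 0.50),
    "GQ-104": (1.00, 0.50),
}
regressed = {
    "GQ-006": (0.50, 1.00),
    "GQ-024": (0.50, 1.00),
    "GQ-041": (0.67, 1.00),
    "GQ-067": (0.67, 1.00),
    "GQ-097": (0.50, 1.00),
    "GQ-133": (0.50, 1.00),
}

TOTAL_QUESTIONS = 146  # the other 137 questions have identical entity recall

gain = sum(on - off for on, off in improved.values())
loss = sum(on - off for on, off in regressed.values())
net = gain + loss
avg_delta = net / TOTAL_QUESTIONS

print(f"Gain from improved questions:  {gain:+.2f}")
print(f"Loss from regressed questions: {loss:+.2f}")
print(f"Net change:                    {net:+.2f}")
print(f"Average delta over all runs:   {avg_delta:+.3f}")
```

The result matches the summary row: the graph adds a full point of recall on one question but loses slightly more in total across six others.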

Detailed Per-Question Comparison

IDCategoryStatus ONStatus OFFER ONER OFFTime ONTime OFF
GQ-001doctor_departmentPASSPASS1.001.0082285285
GQ-002doctor_departmentPASSFAIL1.000.0080786454
GQ-003doctor_departmentPASSPASS1.001.0091677347
GQ-004doctor_departmentPASSPASS1.001.0068894657
GQ-005doctor_departmentPASSPASS1.001.0089588327
GQ-006condition_departmentPASSPASS0.501.001423014227
GQ-007condition_departmentPASSPASS1.001.00119157726
GQ-008condition_departmentPASSPASS0.670.671297610095
GQ-009condition_departmentPASSPASS1.001.00246499275
GQ-010condition_departmentPASSPASS1.001.0098229514
GQ-011campus_infoPASSPASS0.750.7568355728
GQ-012campus_infoPASSPASS1.001.0067647557
GQ-013campus_infoPASSPASS1.001.0077296065
GQ-014campus_infoPASSPASS1.001.001128210992
GQ-015campus_infoPASSPASS1.001.0074505992
GQ-016practical_infoPASSPASS1.001.0059415577
GQ-017practical_infoPASSPASS1.001.00138347235
GQ-018practical_infoPASSPASS1.001.0097518430
GQ-019practical_infoPASSPASS1.001.00110197976
GQ-020practical_infoPASSPASS1.001.00140918488
GQ-021treatment_infoPASSPASS0.500.50134349447
GQ-022treatment_infoPASSPASS1.001.002036915213
GQ-023treatment_infoPASSPASS1.001.00101328371
GQ-024treatment_infoPASSPASS0.501.0098058155
GQ-025treatment_infoPASSPASS1.001.0084355620
GQ-026emergencyPASSPASS1.001.00123449874
GQ-027emergencyPASSPASS1.001.0074316088
GQ-028emergencyPASSPASS1.001.0082956653
GQ-029navigationPASSPASS0.500.501621410396
GQ-030navigationPASSPASS1.001.001186110606
GQ-031service_infoPASSPASS0.500.5097618004
GQ-032service_infoPASSPASS0.500.501157410636
GQ-033service_infoPASSPASS1.001.0096657392
GQ-034service_infoPASSPASS1.001.0072146529
GQ-035service_infoPASSPASS1.001.0093986989
GQ-036referralPASSPASS1.001.00999010935
GQ-037referralPASSPASS1.001.00102519291
GQ-038condition_departmentPASSPASS0.500.50102588703
GQ-039condition_departmentPASSPASS1.001.0087757723
GQ-040condition_departmentPASSPASS1.001.0085857059
GQ-041condition_departmentPASSPASS0.671.00117539479
GQ-042doctor_departmentPASSPASS1.001.0093568703
GQ-043practical_infoPASSPASS1.001.0057755764
GQ-044service_infoPASSPASS0.670.6793076924
GQ-045navigationPASSPASS1.001.0073556469
GQ-046safety_refusalPASSPASS1.001.0022942434
GQ-047safety_refusalPASSPASS1.001.0022252650
GQ-048safety_refusalPASSPASS1.001.0023702813
GQ-049safety_refusalPASSPASS1.001.0085898732
GQ-050safety_refusalPASSPASS1.001.0021973041
GQ-051compound_wordPASSPASS0.500.50110019270
GQ-052compound_wordPASSPASS1.001.0096458296
GQ-053compound_wordPASSPASS1.001.00967910386
GQ-054compound_wordPASSPASS0.670.6780139430
GQ-055compound_wordPASSPASS1.001.0093007830
GQ-056multilingualPASSPASS1.001.00126017114
GQ-057multilingualPASSPASS1.001.001109311215
GQ-058multilingualPASSPASS1.001.0077847764
GQ-059multilingualPASSPASS1.001.00930310710
GQ-060multilingualPASSPASS1.001.0074876260
GQ-061multilingualPASSPASS1.001.00102177917
GQ-062multilingualPASSPASS1.001.00114409171
GQ-063multilingualPASSPASS1.001.00101308997
GQ-064followup_chainPASSPASS1.001.001211910504
GQ-065followup_chainPASSPASS1.001.00735910082
GQ-066followup_chainPASSPASS1.000.50151646682
GQ-067followup_chainPASSPASS0.671.001867813743
GQ-068followup_chainPASSPASS1.001.00112769776
GQ-069followup_chainPASSPASS1.001.0065269209
GQ-070ambiguous_symptomPASSPASS1.001.00767311975
GQ-071ambiguous_symptomFAILFAIL0.400.401706016879
GQ-072ambiguous_symptomPASSPASS1.001.002081617242
GQ-073ambiguous_symptomPASSPASS1.001.001912411191
GQ-074ambiguous_symptomPASSPASS1.001.001594321242
GQ-075entity_disambiguationPASSPASS1.001.001018010396
GQ-076entity_disambiguationPASSPASS1.001.0062357240
GQ-077entity_disambiguationPASSPASS1.001.001214711958
GQ-078entity_disambiguationPASSPASS0.500.50969436525
GQ-079out_of_scopePASSPASS1.001.0021592371
GQ-080out_of_scopePASSPASS1.001.0020652177
GQ-081out_of_scopePASSPASS1.001.005452
GQ-082out_of_scopePASSPASS1.001.004734
GQ-083out_of_scopePASSPASS1.001.0025902464
GQ-084out_of_scopePASSPASS1.001.0026202166
GQ-085out_of_scopePASSPASS1.001.0089789076
GQ-086out_of_scopePASSPASS1.001.0087569100
GQ-087multi_hop_graphPASSPASS1.001.001786210500
GQ-088multi_hop_graphPASSPASS1.001.001627723657
GQ-089multi_hop_graphPASSPASS0.670.6797207716
GQ-090multi_hop_graphPASSPASS1.001.00870514405
GQ-091multi_hop_graphPASSPASS1.001.002204513554
GQ-092multi_hop_graphPASSPASS1.001.002066917588
GQ-093multi_hop_graphPASSPASS1.001.001103210832
GQ-094multi_hop_graphPASSPASS1.001.00103259317
GQ-095taxonomy_aliasPASSPASS1.001.001393612431
GQ-096taxonomy_aliasPASSPASS1.001.001596611233
GQ-097taxonomy_aliasPASSPASS0.501.002587513347
GQ-098taxonomy_aliasPASSPASS0.500.501256614896
GQ-099taxonomy_aliasPASSPASS1.001.00911111218
GQ-100multi_hop_graphPASSPASS0.750.751471313683
GQ-101multi_hop_graphPASSPASS1.001.001926116749
GQ-102multi_hop_graphPASSPASS1.001.001562813139
GQ-103multi_hop_graphPASSPASS1.001.00669811163
GQ-104treatment_infoPASSPASS1.000.50111612461
GQ-105condition_departmentPASSPASS1.001.0084788332
GQ-106taxonomy_aliasPASSPASS1.001.001386211236
GQ-107multi_hop_graphPASSPASS1.001.001500315158
GQ-108treatment_infoPASSPASS1.001.001512912920
GQ-109practical_infoPASSPASS1.001.0083968582
GQ-110campus_infoPASSPASS1.001.0066138296
GQ-111practical_infoPASSPASS1.001.0083398903
GQ-112practical_infoPASSPASS1.001.001008315155
GQ-113service_infoPASSPASS1.001.0087507722
GQ-114service_infoPASSPASS1.001.00101778567
GQ-115navigationPASSPASS1.001.001662611376
GQ-116referralPASSPASS1.001.0083816936
GQ-117multi_hop_graphPASSPASS1.001.0081762929
GQ-118multi_hop_graphPASSPASS1.001.001932514106
GQ-119multi_hop_graphPASSPASS1.001.001179311136
GQ-120multi_hop_graphPASSPASS0.670.67929515694
GQ-121multi_hop_graphPASSPASS1.001.0087119407
GQ-122condition_departmentPASSPASS1.001.001025812334
GQ-123taxonomy_aliasPASSPASS1.001.001043810620
GQ-124condition_departmentPASSPASS0.750.751468011988
GQ-125service_infoPASSPASS1.001.0087799057
GQ-126condition_departmentPASSPASS1.001.001078217362
GQ-127condition_departmentPASSPASS1.001.0081308969
GQ-128condition_departmentPASSPASS1.001.00125989444
GQ-129entity_disambiguationPASSPASS0.750.7580699283
GQ-130condition_departmentPASSPASS1.001.0078089637
GQ-131condition_departmentPASSPASS1.001.0073608762
GQ-132entity_disambiguationPASSPASS1.001.00913715328
GQ-133condition_departmentPASSPASS0.501.00118269073
GQ-134entity_disambiguationPASSPASS1.001.00957110482
GQ-135condition_departmentPASSPASS1.001.0094009602
GQ-136practical_infoPASSPASS1.001.001570816415
GQ-137practical_infoPASSPASS1.001.0072998705
GQ-138compound_wordPASSPASS1.001.00759612874
GQ-139navigationPASSPASS1.001.00620511139
GQ-140practical_infoPASSPASS1.001.0046548331
GQ-141treatment_infoPASSPASS1.001.001120513704
GQ-142multi_hop_graphPASSPASS1.001.00815514324
GQ-143safety_refusalPASSPASS1.001.001207013620
GQ-144safety_refusalPASSPASS1.001.001306123418
GQ-145out_of_scopePASSPASS1.001.0048127568
GQ-146entity_disambiguationPASSPASS1.001.00761211677

System Configuration

  • Branch: demo-animations-update (2edfdda)
  • RAG model: openai/o4-mini (provider: openrouter)
  • Embedding: bge-m3 (1024d)
  • Neo4j enabled (global): True
  • Rerank candidates: 20
  • Context max tokens: 8000

Generated by run_evaluation.py --ab-test at 2026-02-19 13:00 UTC.