Abstract

Ziekenhuis Oost-Limburg (ZOL), a hospital in Belgian Limburg, receives over 100 000 web visitors per month, who generate approximately 25 000 search queries. The existing Elasticsearch keyword search fails because patients pose natural-language questions in colloquial Dutch while content is indexed in clinical terminology. This thesis presents the design, implementation, and evaluation of a Retrieval-Augmented Generation (RAG) system with conditional knowledge-graph enrichment as a replacement.

The system employs an eleven-stage query pipeline combining hybrid retrieval (dense vectors and BM25, fused via Reciprocal Rank Fusion), cross-encoder reranking, conditional PostgreSQL-based taxonomy enrichment (1 564 entities at time of evaluation; 2 663 post-deduplication), SNOMED CT Belgian Edition integration for Dutch medical-synonym resolution, and a five-layer defence-in-depth safety architecture enforcing a zero-medical-advice posture. Eight languages are supported over Dutch-language hospital content; the architecture is Modular RAG in the typology of Gao et al. (2024).
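
As an illustration of the fusion step named above, the following is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The function name, the document identifiers, and the smoothing constant k = 60 (a commonly used default) are assumptions made for this example and are not taken from the system described here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is an assumed default, not a value
    reported by the thesis.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_c", "doc_d"]
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that appear in both lists accumulate two reciprocal-rank terms, so agreement between the dense and lexical retrievers is rewarded without having to put their raw scores on a common scale.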

Evaluation uses 302 golden questions spanning 21 categories. A fractional-factorial ablation isolates the contributions of CRAG, FILCO, and Guardrails, with 10 000-resample bootstrap confidence intervals and McNemar's test. The system achieves a 99.0 % pass rate (296/299, 95 % CI [0.977, 1.000]) and an entity recall of 0.932 [0.916, 0.965], with zero medical-advice incidents across all configurations. Individual features each improve quality (CRAG +2.5 pp; FILCO +2.5 pp; Guardrails +3.7 pp), but combined activation regresses to 96.3 %, a feature-interaction effect that is itself a contribution. FILCO reduces median latency by 29 %.
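
The bootstrap interval on the pass rate can be sketched as follows. This is a minimal percentile-bootstrap example that assumes per-question outcomes are coded as 1 (pass) or 0 (fail); the function name, the seed, and the choice of the percentile method are assumptions, since the abstract does not state which bootstrap variant was used.

```python
import random


def bootstrap_ci(outcomes, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    n = len(outcomes)
    # Resample the questions with replacement and record each pass rate.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# 296 passes out of 299 scored questions, as reported above.
outcomes = [1] * 296 + [0] * 3
lower, upper = bootstrap_ci(outcomes)
print(f"pass rate = {sum(outcomes) / len(outcomes):.3f}, "
      f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

With only three failures, many resamples contain no failure at all, which is why an interval of this kind can reach an upper bound of 1.000.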

The work contributes: (1) a production-grade hospital-search RAG system deployed to a pre-production pilot; (2) the conditional-graph-injection finding, not previously reported in the GraphRAG/HybridRAG literature; (3) a five-layer safety architecture with a sub-five-millisecond statistical detector against gradient-based adversarial-suffix attacks (Zou et al. 2023); (4) a reusable golden-standard evaluation framework; and (5) a three-phase SNOMED CT integration. The system has not been evaluated with real visitors; a real-user study is identified as the highest-priority future work.