References

Canonical bibliography

This chapter provides a thesis-scoped, APA-formatted reference list for all works cited in Chapters 1–6 and 8, following thesis convention. The canonical, machine-checkable bibliography for the project as a whole is /docs/references (rendered from docs/references.bib). Inline citations elsewhere in the documentation deep-link to that page (e.g., [Karpukhin et al. 2020](/docs/references#karpukhin2020dpr)); when adding new citations across the project, edit references.bib and link to /docs/references#bibkey rather than expanding this chapter. Entries below correspond, where present, to bibkeys on the canonical page; thesis-only entries (e.g., older Dutch clinical-NLP references that the documentation as a whole does not cite) are retained here for the thesis-defence reading.

References are formatted according to APA 7th edition guidelines and organised alphabetically by first author surname.

Afzal, Z., Pons, E., Kang, N., Sturkenboom, M. C., Schuemie, M. J., & Kors, J. A. (2014). ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus. BMC Bioinformatics, 15, Article 373. https://pubmed.ncbi.nlm.nih.gov/22874189/

Amann, J., Blasimme, A., Vayena, E., Frey, D., & Madai, V. I. (2020). Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Medical Informatics and Decision Making, 20, Article 310.

Anthropic. (2024). Introducing contextual retrieval. Anthropic Research Blog. https://www.anthropic.com/news/contextual-retrieval

Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). https://arxiv.org/abs/2310.11511

Bodenreider, O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research, 32(Database issue), D267–D270. https://doi.org/10.1093/nar/gkh061

Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., van den Driessche, G., Lespiau, J.-B., Damoc, B., Clark, A., de Las Casas, D., Guy, A., Menick, J., Ring, R., Hennigan, T., Huang, S., Maggiore, L., Jones, C., Cassirer, A., ... Sifre, L. (2022). Improving language models by retrieving from trillions of tokens. In Proceedings of the 39th International Conference on Machine Learning (ICML 2022). https://arxiv.org/abs/2112.04426

Chandak, P., Huang, K., & Zitnik, M. (2023). Building a knowledge graph to enable precision medicine. Scientific Data, 10, Article 67. https://doi.org/10.1038/s41597-023-01960-3

Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., & Liu, Z. (2024). BGE M3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv preprint, arXiv:2402.03216. https://arxiv.org/abs/2402.03216 — canonical entry: /docs/references#chen2024bgem3.

Confident AI. (2024). DeepEval: The open-source LLM evaluation framework. https://deepeval.com/docs/metrics-ragas

Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009). https://doi.org/10.1145/1571941.1572114 — canonical entry: /docs/references#cormack2009rrf.

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From local to global: A graph RAG approach to query-focused summarization. Microsoft Research. https://arxiv.org/abs/2404.16130

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall/CRC.

Es, S., James, J., Espinosa-Anke, L., & Schockaert, S. (2024). RAGAS: Automated evaluation of retrieval augmented generation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (EACL 2024). https://arxiv.org/abs/2309.15217

European Parliament. (2016). Regulation (EU) 2016/679 (General Data Protection Regulation). Official Journal of the European Union, L 119, 1. https://eur-lex.europa.eu/eli/reg/2016/679/oj — canonical entry: /docs/references#gdpr_regulation.

European Parliament. (2017). Regulation (EU) 2017/745 of the European Parliament and of the Council on medical devices (Medical Device Regulation). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2017/745/oj — canonical entry: /docs/references#mdr_regulation.

European Parliament. (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj — canonical entry: /docs/references#ai_act_regulation.

Fowler, M. (2007). Mocks aren't stubs. martinfowler.com. https://martinfowler.com/articles/mocksArentStubs.html

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2024). Retrieval-augmented generation for large language models: A survey. arXiv preprint, arXiv:2312.10997. https://arxiv.org/abs/2312.10997 — canonical entry: /docs/references#gao2024ragsurvey.

Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M.-W. (2020). REALM: Retrieval-augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020). https://arxiv.org/abs/2002.10083

Hogan, A., Blomqvist, E., Cochez, M., d'Amato, C., de Melo, G., Gutierrez, C., Kirrane, S., Labra Gayo, J. E., Navigli, R., Neumaier, S., Ngonga Ngomo, A.-C., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys, 54(4), Article 71.

Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., & Khabsa, M. (2023). Llama Guard: LLM-based input-output safeguard for human-AI conversations. arXiv preprint, arXiv:2312.06674. https://arxiv.org/abs/2312.06674

Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), Article 248. https://doi.org/10.1145/3571730

Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). https://arxiv.org/abs/2004.04906 — canonical entry: /docs/references#karpukhin2020dpr.

Khattab, O., & Zaharia, M. (2020). ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020). https://arxiv.org/abs/2004.12832 — canonical entry: /docs/references#khattab2020colbert.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS 2020). https://arxiv.org/abs/2005.11401 — canonical entry: /docs/references#lewis2020rag.

Liao, Z., & Sun, H. (2024). AmpleGCG: Learning a universal and transferable generative model of adversarial suffixes for jailbreaking both open and closed LLMs. arXiv preprint, arXiv:2404.07921. https://arxiv.org/abs/2404.07921 — canonical entry: /docs/references#liao2024amplegcg.

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://arxiv.org/abs/2307.03172 — canonical entry: /docs/references#liu2024lostinmiddle.

Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor using Hierarchical Navigable Small World graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 824–836.

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. https://nlp.stanford.edu/IR-book/

Martin, R. C. (2017). Clean architecture: A craftsman's guide to software structure and design. Prentice Hall.

Nielsen, J. (1993). Usability engineering (response-time chapter excerpted as "Response Times: The 3 Important Limits", Nielsen Norman Group). Morgan Kaufmann. https://www.nngroup.com/articles/response-times-3-important-limits/ — canonical entry: /docs/references#nielsen1993responsetimes.

Nogueira, R., & Cho, K. (2019). Passage re-ranking with BERT. arXiv preprint, arXiv:1901.04085. https://arxiv.org/abs/1901.04085 — canonical entry: /docs/references#nogueira2019passagererank.

OWASP Foundation. (2025). OWASP Top 10 for Large Language Model Applications. OWASP project page. https://genai.owasp.org/llm-top-10/ — canonical entry: /docs/references#owasp_llm_top10.

Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., & Wu, X. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering. https://arxiv.org/abs/2306.08302

Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019). https://arxiv.org/abs/1908.10084

Robertson, S., & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4), 333–389. https://doi.org/10.1561/1500000019 — canonical entry: /docs/references#robertson2009bm25.

Robinson, I., Webber, J., & Eifrem, E. (2015). Graph databases: New opportunities for connected data (2nd ed.). O'Reilly Media.

Sarmah, B., Mehta, D., Hall, B., Rao, R., Patel, S., & Pasquali, S. (2024). HybridRAG: Integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction. arXiv preprint, arXiv:2408.04948. https://arxiv.org/abs/2408.04948 — canonical entry: /docs/references#sarmah2024hybridrag.

Shang, J., et al. (2025). MedRAG: Enhancing retrieval-augmented generation with knowledge graph-elicited reasoning for healthcare copilot. In Proceedings of the ACM Web Conference 2025. https://dl.acm.org/doi/10.1145/3696410.3714782

Singh, A., Ehtesham, A., Kumar, S., & Srinath, S. (2025). Agentic retrieval-augmented generation: A survey on agentic RAG. arXiv preprint, arXiv:2501.09136. https://arxiv.org/abs/2501.09136

SNOMED International. (2024). SNOMED CT Belgian Edition. https://www.snomed.org/

Soman, K., Rose, P. W., Morris, J. H., Akbas, R. E., Smith, B., Peetoom, B., Villouta-Reyes, C., Cerono, G., Shi, Y., Rizk-Jackson, A., et al. (2024). Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics. https://arxiv.org/abs/2311.17330 — canonical entry: /docs/references#soman2024biomedicalkg.

Wang, Z., Araki, J., Jiang, Z., Parvez, M. R., & Neubig, G. (2024). Learning to filter context for retrieval-augmented generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). https://arxiv.org/abs/2311.08377

Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., & Wesslén, A. (2012). Experimentation in software engineering. Springer. https://doi.org/10.1007/978-3-642-29044-2

World Health Organization. (2021). Ethics and governance of artificial intelligence for health. WHO. https://www.who.int/publications/i/item/9789240029200

Yan, S., Gu, J., Zhu, Y., & Ling, Z. (2024). Corrective retrieval augmented generation. arXiv preprint, arXiv:2401.15884. https://arxiv.org/abs/2401.15884

Zheng, L., Chiang, W., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., Zhang, H., Gonzalez, J. E., & Stoica, I. (2024). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems (NeurIPS 2023). https://arxiv.org/abs/2306.05685

ZOL. (2025). Ziekenhuis Oost-Limburg website analytics [Internal data].

Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., & Fredrikson, M. (2023). Universal and transferable adversarial attacks on aligned language models. arXiv preprint, arXiv:2307.15043. https://arxiv.org/abs/2307.15043 — canonical entry: /docs/references#zou2023gcg.

Citations marked TODO

The following inline citations in Chapters 1–6 use names that do not yet have a corresponding bibkey on the canonical bibliography page. They are marked as TODO so that the controller can close the gap in the tail commit. The thesis text remains complete and citable in APA form via the entries above; the TODO markers are inline reminders, not gaps in the academic record.

| Surface name in thesis | Proposed bibkey | Source |
| --- | --- | --- |
| Manning et al. 2008 (IR textbook) | `manning2008ir` | C. D. Manning, P. Raghavan, & H. Schütze. (2008). Introduction to information retrieval. Cambridge University Press. https://nlp.stanford.edu/IR-book/ |
| Bodenreider 2004 (UMLS) | `bodenreider2004umls` | O. Bodenreider. (2004). The Unified Medical Language System (UMLS). Nucleic Acids Research, 32(Database issue), D267–D270. |
| Guu et al. 2020 (REALM) | `guu2020realm` | K. Guu et al. (2020). REALM. ICML 2020. https://arxiv.org/abs/2002.10083 |
| Yan et al. 2024 (CRAG) | `yan2024crag` | S. Yan et al. (2024). Corrective retrieval augmented generation. https://arxiv.org/abs/2401.15884 |
| Wang et al. 2024 (FILCO) | `wang2024filco` | Z. Wang et al. (2024). Learning to filter context for retrieval-augmented generation. https://arxiv.org/abs/2311.08377 |
| Edge et al. 2024 (GraphRAG) | `edge2024graphrag` | D. Edge et al. (2024). From local to global: A graph RAG approach. https://arxiv.org/abs/2404.16130 |
| Es et al. 2023 (RAGAS) | `es2023ragas` | S. Es et al. (2023). RAGAS. https://arxiv.org/abs/2309.15217 |
| Inan et al. 2023 (Llama Guard) | `inan2023llamaguard` | H. Inan et al. (2023). https://arxiv.org/abs/2312.06674 |
| Ji et al. 2023 (hallucination survey) | `ji2023hallucination` | Z. Ji et al. (2023). ACM Computing Surveys, 55(12). |
| Wohlin et al. 2012 (software experimentation) | `wohlin2012experimentation` | C. Wohlin et al. (2012). Experimentation in software engineering. Springer. |
| Efron & Tibshirani 1993 (bootstrap) | `efron1993bootstrap` | B. Efron & R. J. Tibshirani. (1993). An introduction to the bootstrap. Chapman & Hall/CRC. |
| Martin 2017 (Clean Architecture) | `martin2017clean` | R. C. Martin. (2017). Clean architecture. Prentice Hall. |
| Fowler 2007 (Mocks Aren't Stubs) | `fowler2007mocks` | M. Fowler. (2007). martinfowler.com. https://martinfowler.com/articles/mocksArentStubs.html |
| Thakur et al. 2021 (BEIR) | `thakur2021beir` | N. Thakur et al. (2021). https://arxiv.org/abs/2104.08663 |
| Muennighoff et al. 2022 (MTEB) | `muennighoff2022mteb` | N. Muennighoff et al. (2022). https://arxiv.org/abs/2210.07316 |
| Asai et al. 2024 (Self-RAG) | `asai2024selfrag` | A. Asai et al. (2024). https://arxiv.org/abs/2310.11511 |
| Chandak et al. 2023 (precision medicine KG) | `chandak2023precisionkg` | P. Chandak et al. (2023). Scientific Data, 10, Article 67. |
| SNOMED International (2024) | `snomed_international` | SNOMED International. (2024). SNOMED CT. https://www.snomed.org/ |
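
Because the canonical bibliography is described as machine-checkable, closing the gap listed above could be verified with a small script. The following is a minimal sketch, not the project's actual tooling: the function names `bibkeys` and `missing_bibkeys` are hypothetical, the regular expression is a simplification that only recognises ordinary `@type{key,` entry headers, and the inline sample string stands in for the contents of docs/references.bib.

```python
import re

# Simplified matcher for a BibTeX entry header such as "@article{lewis2020rag,".
# Assumption: keys contain no commas or whitespace; this is not a full BibTeX parser.
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bibkeys(bib_text: str) -> set[str]:
    """Return every entry key found in the BibTeX source text."""
    return set(ENTRY_KEY.findall(bib_text))


def missing_bibkeys(bib_text: str, proposed: list[str]) -> list[str]:
    """Return the proposed keys that have no entry yet, in input order."""
    present = bibkeys(bib_text)
    return [key for key in proposed if key not in present]


if __name__ == "__main__":
    # Stand-in for the real file; in practice the text would be read from disk.
    sample = """
    @inproceedings{lewis2020rag,
      title = {Retrieval-augmented generation for knowledge-intensive NLP tasks},
      year = {2020}
    }
    """
    todo = ["lewis2020rag", "manning2008ir", "guu2020realm"]
    print(missing_bibkeys(sample, todo))  # ['manning2008ir', 'guu2020realm']
```

Run against the real file with the proposed bibkeys from the table, an empty result would indicate that every TODO citation has a canonical entry.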
