Core Concepts: How the Search Works
This is the starting point. It assumes you have never seen a retrieval-augmented search system before, and builds up — from the problem, to what the technology is, to how a single question travels through the system, to the pieces that keep the answers correct. Every concept links to a deeper page when you want detail; read straight through first.
1. The problem
Ziekenhuis Oost-Limburg (ZOL) has a large website — roughly a thousand brochures and hundreds of condition and department pages — and about 25,000 searches a month. The old search was keyword search: it matched the literal words you typed. If a patient typed "suikerziekte" (the everyday Dutch word) but the page said "Diabetes Mellitus", the search found nothing. People gave up and phoned the help desk, which was constantly overwhelmed.
We want something different: a visitor should be able to ask a question in plain language — in any of several languages — and get a clear, correct answer with links to the source pages. With one hard rule, because this is a hospital:
The system is a search and navigation tool. It must never give medical advice. It surfaces what the hospital's own pages say, with citations — it does not diagnose, recommend treatment, or speculate.
2. What is RAG?
The technology that makes this possible is Retrieval-Augmented Generation (RAG) (Lewis et al. 2020). The name is just two ideas bolted together:
- Retrieval — first, find the passages in the hospital's own content that are relevant to the question.
- Generation — then, hand those passages to a large language model (LLM) and ask it to write the answer using only that retrieved text, with citations.
Why not just ask the LLM directly? Because an LLM on its own can invent plausible-sounding text, and a hospital cannot afford that. By forcing it to answer only from retrieved, citable hospital content, we keep every answer grounded: traceable to a real source, and refusable when the content isn't there.
That is the whole idea. Everything else on this page is about doing each step well for a hospital, in Dutch, safely.
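The two halves can be sketched in a few lines. This is a minimal illustration rather than ZOL's implementation: the word-overlap retriever stands in for real search, `Passage`, `retrieve` and `build_prompt` are hypothetical names, the URLs are placeholders, and no model is actually called. The point is the shape: retrieve first, then either build a prompt restricted to the retrieved text or refuse.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Passage:
    url: str   # source page, reused as the citation target
    text: str  # passage text taken from the hospital's own content


def retrieve(question: str, corpus: list[Passage], top_k: int = 3) -> list[Passage]:
    """Toy retrieval: rank passages by how many question words they share."""
    words = set(question.lower().split())
    scored = [(len(words & set(p.text.lower().split())), p) for p in corpus]
    hits = [(score, p) for score, p in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in hits[:top_k]]


def build_prompt(question: str, passages: list[Passage]) -> str | None:
    """Return a grounded prompt, or None when there is nothing to ground on."""
    if not passages:
        return None  # refuse rather than let the model guess
    sources = "\n".join(
        f"[{i}] {p.text} ({p.url})" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the sources below and cite them as [n]. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Placeholder corpus; example.org URLs are illustrative only.
corpus = [
    Passage("https://example.org/diabetes",
            "Diabetes Mellitus wordt behandeld op de dienst Endocrinologie"),
    Passage("https://example.org/parking",
            "Parkeren kan op de bezoekersparking"),
]

question = "waar wordt diabetes behandeld"
prompt = build_prompt(question, retrieve(question, corpus))

unanswerable = "openingsuren cafetaria"
no_prompt = build_prompt(unanswerable, retrieve(unanswerable, corpus))
```

Here `prompt` contains only the diabetes passage with its citation marker, while `no_prompt` is `None` because nothing in the corpus matches, which is the refusal path.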
3. The journey of a single question
A real query passes through a short pipeline. Here it is end to end; each step is a stage you can read about in depth later:
- Understand — detect the language (the system serves Dutch, English, French and more) and classify the intent (is this a "which department?" question, a "when is Dr. X available?" question, a request for medical advice we must refuse?).
- Retrieve — search the index for candidate passages. ZOL uses hybrid search: meaning-based vector search and exact-keyword BM25 search, combined — so it catches both paraphrases and exact terms like drug names.
- Rerank — the candidates arrive roughly sorted; reranking reorders them so the most relevant and most appropriate passage is first. This matters because the model only reads the top few.
- Assemble context — gather the winning passages and check the grounding is strong enough; if not, the system would rather refuse than guess.
- Generate — the LLM writes the answer in the user's language, grounded in the assembled Dutch passages, with citation markers.
- Safety check — a final filter blocks anything that reads as medical advice and appends the disclaimer.
Retrieve and Rerank (steps ② and ③) are where most of the cleverness lives, and they are the subject of the next section.
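To make the "combined" part of hybrid search concrete, here is a small sketch of Reciprocal Rank Fusion (RRF), the fusion method this documentation names for the Retrieve step. Each passage scores the sum of 1 / (k + rank) over the lists it appears in. The passage IDs are made up, and `k = 60` is a commonly used default, not a value confirmed for ZOL.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of passage IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit of each list contributes 1 / (k + 1).
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers.
vector_hits = ["diabetes-brochure", "endocrinologie-page", "dieet-brochure"]
bm25_hits = ["metformine-brochure", "diabetes-brochure", "endocrinologie-page"]

fused = rrf_fuse([vector_hits, bm25_hits])
fused_ids = [passage_id for passage_id, _ in fused]
```

Passages found by both retrievers rise to the top: `diabetes-brochure` and `endocrinologie-page` outrank the passages that only one retriever returned. RRF uses ranks rather than raw scores, so the vector and BM25 scores never need to be put on a common scale.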
4. Why plain search isn't enough: three problems, three fixes
Plain meaning-based (vector) search gets you surprisingly far, but it fails in three specific, recurring ways on hospital content. ZOL layers three subsystems on top of retrieval, each closing one failure mode. They are modular augmentations (Gao et al. 2024) — they steer the retriever rather than replace it.
| Subsystem | What it adds | The failure it fixes | Deep dive |
|---|---|---|---|
| Taxonomy | A structured map of the hospital — doctors ↔ departments ↔ campuses ↔ conditions ↔ treatments | "Which department treats X?" / "Where does Dr. Y work?" — relational facts that are scattered across separate pages and that prose search can't assemble | Knowledge Graph · Taxonomy Query Enrichment |
| SNOMED CT | A universal medical dictionary that links patient words to clinical words | The vocabulary gap: a patient says "suikerziekte", the page says "Diabetes Mellitus" | SNOMED CT Integration |
| Value Framework | A reranker that checks the kind of content fits the kind of question | Cross-category contamination: a practical question answered from regulatory content because they share a keyword | Value Framework |
A simple way to remember the three roles:
- Taxonomy is the skeleton — the authoritative org-chart. It answers who / where / which department.
- SNOMED is the shared vocabulary — a language-neutral concept graph (~356k concepts) that gives every clinical idea a stable ID and a thesaurus of synonyms. It answers what does this patient word mean clinically.
- The Value Framework is the referee — once retrieval has produced candidates, it decides which category of fact is allowed to win for this kind of question. It answers is this the right kind of answer.
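The referee role can be illustrated with a toy re-weighting step. Everything specific here is an assumption made for the example: the two category names, the keyword sets, the boost and penalty factors, and the `Candidate` type. The sketch only shows the idea of classifying the question by keyword sets and then letting matching content win.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical keyword sets; a real deployment would maintain them per language.
CATEGORY_KEYWORDS = {
    "practical": {"waar", "wanneer", "parkeren", "where", "when", "parking"},
    "regulatory": {"reglement", "privacy", "klacht", "regulation", "complaint"},
}


@dataclass
class Candidate:
    passage_id: str
    category: str     # content category assigned to the passage
    relevance: float  # score from the relevance reranker


def classify_question(question: str) -> str | None:
    """Pick the category whose keyword set overlaps the question most."""
    words = set(question.lower().replace("?", " ").split())
    overlaps = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


def reweight(question: str, candidates: list[Candidate],
             boost: float = 1.2, penalty: float = 0.5) -> list[Candidate]:
    """Reorder candidates so content of the question's category is favoured."""
    category = classify_question(question)

    def adjusted(candidate: Candidate) -> float:
        if category is None:
            return candidate.relevance  # no category signal: keep relevance order
        factor = boost if candidate.category == category else penalty
        return candidate.relevance * factor

    return sorted(candidates, key=adjusted, reverse=True)


candidates = [
    Candidate("parkeerreglement", "regulatory", 0.80),
    Candidate("bereikbaarheid", "practical", 0.70),
]
ranked = reweight("Waar kan ik parkeren?", candidates)
```

The regulatory passage starts with the higher relevance score because it shares the parking keyword, but the practical question pushes `bereikbaarheid` to the top.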
The brochure prose (the unstructured text) is the flesh on that skeleton — it answers what / how / why. The three subsystems exist to make that prose reachable, correctly routed, and correctly ranked. Here is how they compose on one real query:
Traced step by step: the query "suikerziekte, welke dienst?" is resolved first. The taxonomy resolver maps suikerziekte → Diabetes Mellitus → Endocrinologie; if that deterministic step fails, the SNOMED fallback walks synonyms and clinical relationships. Retrieval then runs over both the brochure corpus and the structured taxonomy. The candidates are reranked for relevance and then re-weighted by the Value Framework for appropriateness, and the LLM answers from the result ("Endocrinologie, campus Sint-Jan") with a citation.
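The resolution step, a deterministic taxonomy lookup with a SNOMED fallback, might look roughly like the sketch below. The dictionaries are toy stand-ins keyed by plain strings rather than real taxonomy tables or SNOMED concept IDs, and `resolve_department` and `Resolution` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple

# Toy stand-ins for the tenant taxonomy (deterministic step).
TAXONOMY_ALIASES = {"suikerziekte": "diabetes mellitus"}
TAXONOMY_DEPARTMENTS = {"diabetes mellitus": "Endocrinologie"}

# Toy stand-ins for the SNOMED layer: synonyms plus parent relationships.
SNOMED_SYNONYMS = {"type 2 suikerziekte": "diabetes mellitus type 2"}
SNOMED_PARENTS = {"diabetes mellitus type 2": "diabetes mellitus"}


class Resolution(NamedTuple):
    department: str
    condition: str
    via: str  # which step produced the answer: "taxonomy" or "snomed"


def resolve_department(term: str) -> Resolution | None:
    key = term.strip().lower()

    # Step 1: deterministic taxonomy lookup.
    condition = TAXONOMY_ALIASES.get(key, key)
    department = TAXONOMY_DEPARTMENTS.get(condition)
    if department is not None:
        return Resolution(department, condition, "taxonomy")

    # Step 2: fallback. Map the synonym to a concept, then walk up its
    # parents until one of them is known to the taxonomy.
    concept = SNOMED_SYNONYMS.get(key, key)
    seen: set[str] = set()
    while concept is not None and concept not in seen:
        seen.add(concept)  # guards against cycles in the toy data
        department = TAXONOMY_DEPARTMENTS.get(concept)
        if department is not None:
            return Resolution(department, concept, "snomed")
        concept = SNOMED_PARENTS.get(concept)
    return None


direct = resolve_department("suikerziekte")
fallback = resolve_department("Type 2 suikerziekte")
unknown = resolve_department("hoofdpijn")
```

`direct` resolves through the taxonomy alone, `fallback` needs the synonym and parent walk, and `unknown` returns `None`, leaving the question to ordinary retrieval.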
5. The index everything searches: ingestion enrichment
All of the above runs at query time. Its quality depends on what was prepared at ingestion time — when pages are crawled and indexed. Before indexing, each passage is enriched with LLM-generated context, canonical questions, and a page summary, so the index is richer than the raw text. This is the natural first deep-dive after this page: Ingestion Enrichment.
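As a rough picture of what an enriched passage could carry, the sketch below models the three additions named above as fields. It assumes, without confirmation from the source, that they are concatenated with the raw text before indexing. `EnrichedPassage`, `enrich`, the prompt prefixes and `fake_llm` are hypothetical, and `fake_llm` returns canned strings so the example runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class EnrichedPassage:
    text: str                       # raw passage text
    context: str                    # LLM-generated context for the passage
    canonical_questions: list[str]  # questions the passage answers
    page_summary: str               # summary of the page it came from

    def index_text(self) -> str:
        """Text handed to the indexer: enrichment first, raw passage last."""
        parts = [self.page_summary, self.context, *self.canonical_questions, self.text]
        return "\n".join(part for part in parts if part)


def enrich(text: str, page_text: str, generate: Callable[[str], str]) -> EnrichedPassage:
    """Build an enriched passage using an injected text-generation function."""
    questions = generate(f"QUESTIONS\n{text}")
    return EnrichedPassage(
        text=text,
        context=generate(f"CONTEXT\n{page_text}\n---\n{text}"),
        canonical_questions=[q.strip() for q in questions.splitlines() if q.strip()],
        page_summary=generate(f"SUMMARY\n{page_text}"),
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned text per prompt type."""
    if prompt.startswith("QUESTIONS"):
        return "Welke dienst behandelt diabetes?\nWaar vind ik Endocrinologie?"
    if prompt.startswith("CONTEXT"):
        return "Passage uit de brochure over diabetes."
    return "Brochure over diabetes en de dienst Endocrinologie."


passage = enrich(
    "Diabetes Mellitus wordt opgevolgd door Endocrinologie.",
    "Volledige paginatekst.",
    fake_llm,
)
indexed = passage.index_text()
```

Because the canonical questions sit in the indexed text, a patient's phrasing can match the passage even when the raw brochure sentence uses different words.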
6. One system, many hospitals
ZOL is the first tenant, not the only one. Every subsystem obeys a hospital-agnostic invariant (Bezemer & Zaidman 2010), which makes onboarding a new hospital — or a new language — a configuration task, not a rewrite:
- The Value Framework classifies categories with linguistic keyword sets (nl/en/fr/it), not ZOL-specific labels.
- The Taxonomy is tenant_id-scoped in every table and built per-tenant from a config.
- SNOMED concept IDs are internationally universal; only the synonym layer is language-specific, so a new language reuses the entire structural graph.
See the multi-tenancy architecture for the cross-cutting picture. The structural pattern — a knowledge graph fused with vector search — follows HybridRAG (Sarmah et al. 2024).
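Tenant scoping can be shown with an in-memory SQLite table. The schema, the table name `condition_department`, the second tenant and its data are all invented for illustration; the only point carried over from the text is that every row and every query carries a `tenant_id`.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE condition_department (
        tenant_id  TEXT NOT NULL,
        condition  TEXT NOT NULL,
        department TEXT NOT NULL,
        PRIMARY KEY (tenant_id, condition)
    )
    """
)
# Two tenants hold the same condition but map it to different departments.
conn.executemany(
    "INSERT INTO condition_department VALUES (?, ?, ?)",
    [
        ("zol", "diabetes mellitus", "Endocrinologie"),
        ("other-hospital", "diabetes mellitus", "Interne geneeskunde"),
    ],
)


def department_for(db: sqlite3.Connection, tenant_id: str, condition: str) -> str | None:
    """Look up a department; the tenant filter is part of every query."""
    row = db.execute(
        "SELECT department FROM condition_department "
        "WHERE tenant_id = ? AND condition = ?",
        (tenant_id, condition),
    ).fetchone()
    return row[0] if row else None


zol_department = department_for(conn, "zol", "diabetes mellitus")
other_department = department_for(conn, "other-hospital", "diabetes mellitus")
missing = department_for(conn, "unknown-tenant", "diabetes mellitus")
```

Putting `tenant_id` in the primary key means the same condition can exist once per hospital, and a query without a matching tenant returns nothing rather than another hospital's data.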
7. Where to go next
A reading order that follows the flow of this page. Keep the Glossary open alongside — it holds the canonical one-line definition of every term used below (taxonomy, hub page, RRF, Value Framework, FINDING_SITE, …).
- Ingestion Enrichment — how the index is built and enriched before any query arrives.
- Hybrid Search — step ② of the journey: vector + BM25 retrieval fused with RRF.
- Reranking — step ③: how candidates are ordered, including the Value Framework.
- Value Framework — the category referee in depth (introduces the content-category vocabulary the other pages reuse).
- Knowledge Graph → Taxonomy Query Enrichment — the structured skeleton: data model, then how it's used at query time.
- SNOMED CT Integration — read last; the ontology both the taxonomy and the query-time resolver depend on.
References
- Lewis et al. 2020 — the original Retrieval-Augmented Generation architecture.
- Gao et al. 2024 — Modular RAG; frames the three subsystems as orchestrated augmentation modules.
- Sarmah et al. 2024 — HybridRAG; the knowledge-graph + vector pattern the Taxonomy composition instantiates.
- Bezemer & Zaidman 2010 — multi-tenant SaaS isolation; the hospital-agnostic invariant.