Query Enrichment Pipeline
Hospital website content is written in everyday Dutch ("te traag werkende schildklier"), but patients arrive with Latin medical terminology ("hypothyreoïdie"), colloquial terms ("rugpijn"), or clinical abbreviations ("AVM"). Retrieval from an unenriched query embeds the patient's surface form against a corpus that doesn't share their vocabulary, with predictable consequences for recall.
Query enrichment runs as _qs_enrich_query() in the RAG service — between intent classification and retrieval — and applies three enrichment layers in cascade. The cascade does not modify the user's question; it appends canonical terms in parentheses so that both the embedding and the BM25 tsvector see the bridging vocabulary.
This page covers the pre-retrieval term-bridging cascade (_qs_enrich_query): SNOMED synonyms, taxonomy TREATS routing, and Latin→Dutch translation appended to the search query. The broader structured-knowledge injection flow — entity resolution, ontology lookup, the conditional doctor-list injection (Stage 5c), and prompt augmentation — is documented in Taxonomy Query Enrichment.
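The append-in-parentheses behaviour described above can be illustrated with a minimal sketch. The helper name append_canonical_terms and its de-duplication rule are assumptions for illustration only; the actual logic lives inside _qs_enrich_query and is not reproduced on this page.

```python
def append_canonical_terms(query: str, terms: list[str]) -> str:
    """Return the query unchanged, followed by bridging terms in parentheses.

    Hypothetical helper: the original wording is never rewritten, so both the
    embedding and the BM25 path still see the user's own surface form.
    """
    seen = {query.casefold()}
    unique: list[str] = []
    for term in terms:
        key = term.casefold()
        if key not in seen:  # skip repeats and terms identical to the query
            seen.add(key)
            unique.append(term)
    if not unique:
        return query  # nothing to add: the query passes through untouched
    return f"{query} ({', '.join(unique)})"


print(append_canonical_terms("hypothyreoïdie", ["Hypothyroïdi"]))
# hypothyreoïdie (Hypothyroïdi)
```

Because the original text stays at the front of the string, the user's wording remains available to everything downstream of retrieval.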
Why enrichment exists
The feature was identified as a regression during the pilot evaluation on 2026-03-20: the SNOMED synonym expansion that existed in the Neo4j-based pipeline had been silently lost during the migration to the PostgreSQL taxonomy on 2026-03-07. Three golden evaluation questions (GQ-168, GQ-169, GQ-173) failed because medical terminology queries could not find their Dutch equivalents in the taxonomy. The cascade documented here was the structural response.
This is also the canonical example of the silent-failure discipline codified in CLAUDE.md (R1/R2/R3). The post-mortem not only closed the recall regression but also added size logs to every collection-returning function in the enrichment path, plus a contract test for the cross-component handoff between the intent classifier's ExtractedEntities and the enrichment service's resolved-term injection.
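As an illustration of what a size log on a collection-returning function can look like, here is a generic sketch. The decorator name log_result_size, the logger name, and the stand-in lookup function are all hypothetical; the R1/R2/R3 rules in CLAUDE.md are not reproduced on this page and may prescribe a different mechanism.

```python
import functools
import logging
from collections.abc import Sized

logger = logging.getLogger("enrichment")  # hypothetical logger name


def log_result_size(func):
    """Log how many items a collection-returning function produced."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        size = len(result) if isinstance(result, Sized) else None
        # An empty result is logged as loudly as a full one, so a lookup that
        # silently starts returning nothing shows up in the logs.
        logger.info("%s returned %s item(s)", func.__name__, size)
        return result

    return wrapper


@log_result_size
def lookup_synonyms(term: str) -> list[str]:
    # Stand-in for a real lookup; the data here is illustrative only.
    return ["Hypothyroïdi"] if term == "hypothyreoïdie" else []
```

The point of the pattern is that a synonym lookup returning zero rows for every query would be visible in the logs instead of surfacing weeks later as a recall regression.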
Trade-offs
| Decision | Chosen | Alternatives considered | Rejected because |
|---|---|---|---|
| Where to enrich | Append canonical terms in parentheses to the user query before embedding/BM25 | Replace user query with canonical term; append after embedding (re-embed twice); enrich only at re-rank | Replacement loses the user's surface form, which the LLM still wants for paraphrase fidelity in the answer. Re-embedding twice doubles the cost of the dense path on every query. Late enrichment (at re-rank) is too late — the recall failure is upstream of re-rank. Pre-retrieval append leaves the original token in place and adds the canonical neighbour as a free signal for both the dense and sparse paths. |
| Layer order | SNOMED → taxonomy TREATS → per-word fallback | Run all layers, take union | The layers are in quality order (SNOMED is most precise, per-word fallback is most permissive). First-match short-circuits to keep the appended canonical-term list focused. Union-of-all produced noisy enriched queries (5–6 canonical terms appended) that hurt BM25 precision. |
| Synonym source | SNOMED CT Belgian Edition (529 K active Dutch descriptions) | Curated alias map; Wiktionary; Wikipedia redirects | A curated map cannot keep pace with clinical vocabulary; Wiktionary lacks medical depth; Wikipedia redirects are too noisy. SNOMED CT is the international standard for clinical terminology with native Dutch coverage. See SNOMED CT Terminology for ingestion and resolution detail. |
| Hospital-agnosticism | Latin → Dutch translation lives in the LLM intent-classifier prompt, not in code | Hardcoded term mapping in code | A code-level mapping must be maintained per hospital; the LLM-prompt approach is hospital-agnostic — the model uses its medical knowledge natively. The prompt instruction is a single sentence. |
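
The layer-order decision in the table above (first match short-circuits, as opposed to taking the union of all layers) can be sketched as follows. The layer functions and their toy data, including the department name, are assumptions for illustration and do not reflect the real resolvers.

```python
from typing import Callable, Iterable

Layer = Callable[[str], list[str]]


def first_match(query: str, layers: Iterable[Layer]) -> list[str]:
    """Return the terms of the first layer that produces any (chosen design)."""
    for layer in layers:
        terms = layer(query)
        if terms:
            return terms  # short-circuit: later, more permissive layers are skipped
    return []


def union_of_all(query: str, layers: Iterable[Layer]) -> list[str]:
    """Rejected alternative: collect terms from every layer."""
    terms: list[str] = []
    for layer in layers:
        for term in layer(query):
            if term not in terms:
                terms.append(term)
    return terms


# Toy layers in quality order; the returned terms are made up for the demo.
def snomed_layer(query: str) -> list[str]:
    return ["Hypothyroïdi"] if query == "hypothyreoïdie" else []


def treats_layer(query: str) -> list[str]:
    return ["Endocrinologie"] if query == "hypothyreoïdie" else []


LAYERS = [snomed_layer, treats_layer]
print(first_match("hypothyreoïdie", LAYERS))   # ['Hypothyroïdi']
print(union_of_all("hypothyreoïdie", LAYERS))  # ['Hypothyroïdi', 'Endocrinologie']
```

The union variant grows with every layer that fires, which is the noise effect the table cites as the reason for rejecting it.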
Architecture
Layer 1 — SNOMED synonym expansion
Purpose: bridge Latin or scientific medical terms to the Dutch names used in the hospital taxonomy.
Algorithm:
- Look up the exact query in app.snomed_descriptions (529 K active Dutch medical terms; see SNOMED CT Terminology).
- For each match, find taxonomy entities sharing the same snomed_concept_id.
- If the entity name differs from the query token, inject it as a search enrichment.
Worked example:
- Query: "hypothyreoïdie"
- SNOMED match: concept 40930008 (hypothyreoïdie)
- Published taxonomy entity with concept 40930008: "Hypothyroïdi" (CONDITION, status published)
- Enriched query: "hypothyreoïdie (Hypothyroïdi)"
Fallback within Layer 1: if the exact match fails, the resolver tries a fuzzy match, implemented as a substring search (LIKE '%query%') against SNOMED descriptions. This catches compound terms: a query for discushernia matches the SNOMED description lumbale discushernia.
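The Layer 1 lookup, including the fallback, can be sketched against an in-memory SQLite database. This is a simplified stand-in for the PostgreSQL tables: the taxonomy_entities table name, its columns other than snomed_concept_id, the function names, the placeholder concept ID 1, and the entity name Rughernia are assumptions made up for the demo. Only snomed_descriptions, term_lower, snomed_concept_id, and concept 40930008 come from this page.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the SNOMED and taxonomy tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snomed_descriptions (term_lower TEXT, snomed_concept_id INTEGER);
        CREATE TABLE taxonomy_entities (name TEXT, snomed_concept_id INTEGER, status TEXT);
        INSERT INTO snomed_descriptions VALUES
            ('hypothyreoïdie', 40930008),
            ('lumbale discushernia', 1);
        INSERT INTO taxonomy_entities VALUES
            ('Hypothyroïdi', 40930008, 'published'),
            ('Rughernia', 1, 'published');
        """
    )
    return conn


def snomed_enrich(conn: sqlite3.Connection, query: str) -> list[str]:
    """Exact SNOMED lookup, then substring fallback, then taxonomy join."""
    needle = query.strip().lower()
    select = "SELECT DISTINCT snomed_concept_id FROM snomed_descriptions WHERE "
    rows = conn.execute(select + "term_lower = ?", (needle,)).fetchall()
    if not rows:
        # Fallback: substring match. Note that % or _ in user input would act
        # as wildcards here; a production version would escape them.
        rows = conn.execute(select + "term_lower LIKE ?", (f"%{needle}%",)).fetchall()
    concept_ids = [concept_id for (concept_id,) in rows]
    if not concept_ids:
        return []
    placeholders = ", ".join("?" for _ in concept_ids)
    names = conn.execute(
        "SELECT DISTINCT name FROM taxonomy_entities "
        f"WHERE status = 'published' AND snomed_concept_id IN ({placeholders}) "
        "ORDER BY name",
        concept_ids,
    ).fetchall()
    # Only inject names that differ from what the user already typed.
    return [name for (name,) in names if name.lower() != needle]


conn = build_demo_db()
print(snomed_enrich(conn, "hypothyreoïdie"))  # ['Hypothyroïdi']
print(snomed_enrich(conn, "discushernia"))    # ['Rughernia']
```

A query with no SNOMED match, or a concept with no published taxonomy entity, yields an empty list, so the cascade moves on to the next layer.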
Layer 2 — taxonomy TREATS / OFFERS expansion
Purpose: when the user asks about a condition, append the department names that treat that condition so retrieval lifts the relevant department-overview pages.
Algorithm:
- Pull the condition field from the intent classifier's ExtractedEntities.
- Query published_relationships where relationship_type IN ('TREATS', 'OFFERS', 'PERFORMS') and target_id = condition.id.
- Append the resolved department names to the search query.
Worked example:
- Query: "ik heb last van epilepsie"
- Intent classifier extracts: condition = "epilepsie"
- Taxonomy: Neurologie TREATS Epilepsie; Neurochirurgie TREATS Epilepsie
- Enriched query: "ik heb last van epilepsie (Neurologie, Neurochirurgie)"
This ensures department overview pages rank higher in retrieval — not just condition-explanation pages — so the LLM gets the routing answer the patient actually needs.
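The Layer 2 steps can be sketched the same way, again with in-memory SQLite standing in for PostgreSQL. The entities table, the id and source_id columns, and the function names are assumptions; published_relationships, relationship_type, target_id, and the three relationship types come from the algorithm above.

```python
import sqlite3

RELATIONSHIP_TYPES = ("TREATS", "OFFERS", "PERFORMS")


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory taxonomy holding the epilepsie example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE published_relationships (
            id INTEGER PRIMARY KEY,
            source_id INTEGER,
            relationship_type TEXT,
            target_id INTEGER
        );
        INSERT INTO entities VALUES
            (1, 'Epilepsie'), (2, 'Neurologie'), (3, 'Neurochirurgie');
        INSERT INTO published_relationships VALUES
            (1, 2, 'TREATS', 1),
            (2, 3, 'TREATS', 1);
        """
    )
    return conn


def departments_for_condition(conn: sqlite3.Connection, condition_id: int) -> list[str]:
    """Names of entities that TREAT / OFFER / PERFORM the given condition."""
    placeholders = ", ".join("?" for _ in RELATIONSHIP_TYPES)
    rows = conn.execute(
        "SELECT e.name FROM published_relationships AS r "
        "JOIN entities AS e ON e.id = r.source_id "
        f"WHERE r.relationship_type IN ({placeholders}) AND r.target_id = ? "
        "ORDER BY r.id",
        (*RELATIONSHIP_TYPES, condition_id),
    ).fetchall()
    # dict.fromkeys removes duplicates while keeping the first-seen order.
    return list(dict.fromkeys(name for (name,) in rows))


def enrich_with_departments(query: str, departments: list[str]) -> str:
    return f"{query} ({', '.join(departments)})" if departments else query


conn = build_demo_db()
departments = departments_for_condition(conn, condition_id=1)
print(enrich_with_departments("ik heb last van epilepsie", departments))
# ik heb last van epilepsie (Neurologie, Neurochirurgie)
```

When no condition was extracted, or the condition has no published relationships, the query is returned unchanged.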
Layer 3 — Latin-to-Dutch translation (LLM)
Purpose: for inputs Layers 1 and 2 cannot resolve, the intent classifier's reformulated_query carries Latin → Dutch translation.
The intent classification prompt includes the rule:
"When the user uses Latin or scientific medical terms, ALWAYS include the common Dutch equivalent in the reformulated_query. Hospital websites use Dutch, not Latin. Use your medical knowledge to translate."
This is hospital-agnostic: there are no hardcoded term mappings. The LLM uses its medical knowledge natively, which means no per-tenant maintenance.
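To make the "single sentence in the prompt" point concrete, the sketch below shows one way such a rule could be carried as a constant and assembled into a prompt. The constant name, the build_intent_prompt function, and the prompt layout are hypothetical; the real template lives in backend/app/prompts.py and is not shown here. Only the rule text itself is quoted from this page.

```python
# Rule text quoted from this page; everything else in this sketch is assumed.
LATIN_TO_DUTCH_RULE = (
    "When the user uses Latin or scientific medical terms, ALWAYS include the "
    "common Dutch equivalent in the reformulated_query. Hospital websites use "
    "Dutch, not Latin. Use your medical knowledge to translate."
)


def build_intent_prompt(base_instructions: str, user_question: str) -> str:
    """Assemble an intent-classification prompt that carries the rule.

    Hypothetical layout: no term mapping appears anywhere, so nothing in this
    function is specific to one hospital.
    """
    return (
        f"{base_instructions}\n\n"
        f"Rules:\n- {LATIN_TO_DUTCH_RULE}\n\n"
        f"User question: {user_question}"
    )


prompt = build_intent_prompt(
    "Classify the intent and return a reformulated_query.",
    "wat is hernia nuclei pulposi?",
)
```

The design consequence is the one stated above: onboarding another hospital requires no change to this layer, because the translation knowledge sits in the model and not in a lookup table.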
Relationship to Stage 5c (synthetic doctor-list injection)
When Layer 2 resolves a condition to a department, downstream Stage 5c may also fire if the intent is doctor_lookup AND the query contains a list-signal ("alle", "welke artsen", "wie werkt er"). In that case the resolved department from Layer 2 is the same hint Stage 5c uses to fetch the full doctor roster. See Taxonomy Query Enrichment and Query Pipeline §Stage 5c.
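The trigger condition for Stage 5c described above (doctor_lookup intent AND a list-signal in the query) can be sketched as a small predicate. The function name and the whole-word matching are assumptions; the actual Stage 5c matcher is documented elsewhere and may detect list-signals differently.

```python
import re

# List-signals quoted from this page; matching them as whole words is an
# assumption made here so that "alle" does not fire inside "allergie".
LIST_SIGNALS = ("alle", "welke artsen", "wie werkt er")
_SIGNAL_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(signal) for signal in LIST_SIGNALS) + r")\b",
    re.IGNORECASE,
)


def stage_5c_should_fire(intent: str, query: str) -> bool:
    """True when the intent is doctor_lookup and the query asks for a list."""
    return intent == "doctor_lookup" and _SIGNAL_PATTERN.search(query) is not None


print(stage_5c_should_fire("doctor_lookup", "Welke artsen werken op neurologie?"))  # True
print(stage_5c_should_fire("doctor_lookup", "arts voor allergie"))                  # False
```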
Empirical impact
Measured on the golden evaluation set across the pre-enrichment baseline (pilot v2) and the post-enrichment build (pilot v7):
| Metric | Before (pilot v2, n=268) | After (pilot v7, n=299) | Δ |
|---|---|---|---|
| SNOMED terminology category | 88.0 % (22/25) | 100.0 % (33/33) | +12.0 pp |
| Condition-department routing | 94.7 % (36/38) | 100.0 % (46/46) | +5.3 pp |
| Overall pass rate | 95.9 % (257/268) | 99.0 % (296/299) | +3.1 pp |
The denominator change (268 → 299) reflects the natural growth of the golden set as new failure cases were added between the two runs; the comparison is across different set sizes, not the same set. The category-level deltas (SNOMED, condition-department) measure the same per-question pass/fail and are the cleanest evidence of the cascade's effect.
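The percentages and deltas in the table can be re-derived from the raw pass counts. The short check below uses only the counts shown in the table; the function and variable names are illustrative.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100.0 * passed / total


# (passed, total) before and after, copied from the table above.
RESULTS = {
    "SNOMED terminology category": ((22, 25), (33, 33)),
    "Condition-department routing": ((36, 38), (46, 46)),
    "Overall pass rate": ((257, 268), (296, 299)),
}

for name, (before, after) in RESULTS.items():
    b, a = pass_rate(*before), pass_rate(*after)
    print(f"{name}: {b:.1f} % -> {a:.1f} % ({a - b:+.1f} pp)")
# SNOMED terminology category: 88.0 % -> 100.0 % (+12.0 pp)
# Condition-department routing: 94.7 % -> 100.0 % (+5.3 pp)
# Overall pass rate: 95.9 % -> 99.0 % (+3.1 pp)
```

The deltas are differences in percentage points between two runs with different denominators, which is why the counts are listed alongside the rates.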
These figures have not yet been re-measured on the corpus produced after the OpenAI embedding migration.
Performance
| Layer | Cost per query | Notes |
|---|---|---|
| Layer 1 — SNOMED exact lookup | ~5 ms | Single SQL query against indexed snomed_descriptions(term_lower). Fuzzy fallback adds ~5 ms when exact misses. |
| Layer 2 — taxonomy TREATS | ~3 ms | Single SQL query against indexed published_relationships. Skipped when no condition was extracted. |
| Layer 3 — LLM translation | 0 ms (additive) | Already happens inside intent classification; not a separate call. |
Total enrichment overhead: under 10 ms in the typical case where Layers 1 and 2 both hit (~5 ms + ~3 ms), about 5 ms when only Layer 1 fires on an exact match, and 0 ms when no enrichment applies.
Known limitations
- dyslipidemie: SNOMED concept 370992007 exists but no published taxonomy entity carries this concept (no ZOL page about dyslipidemie). This is a content gap, not a code issue. The resolver returns no enrichment; retrieval falls back to vector-only.
- hernia nuclei pulposi: the Belgian SNOMED edition maps this term to a different concept ID than discushernia. Layer 3 (LLM translation) handles this case because the model knows the two are synonyms, even though the SNOMED graph does not link them.
- Compound-word boundary: bloeddrukverlagend (blood-pressure-lowering) does not match Hypertensie through SNOMED alone. The compound is not in snomed_descriptions as a single term. Layer 3 (LLM translation) recovers most of these cases; an explicit Dutch-compound splitter has not been needed at pilot scale.
Implementation pointers
| Component | File | What's there |
|---|---|---|
| Layer 1 — SNOMED expansion | backend/app/services/rag_service.py:_qs_enrich_query (~line 2160) | Exact lookup + fuzzy fallback against snomed_descriptions |
| Layer 2 — TREATS expansion | same file (~line 2170) | SQL against published_relationships |
| Layer 3 — LLM translation | backend/app/services/intent_classification_service.py + backend/app/prompts.py | Latin → Dutch instruction in the intent prompt |
| Per-word fallback | backend/app/services/rag_service.py:_qs_enrich_query (~line 2180) | Iterate >5-char tokens against SNOMED descriptions |
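
The per-word fallback listed in the table (iterate tokens longer than 5 characters against SNOMED descriptions) can be sketched as below. The tokenisation rule, the function name, and the injected lookup callable are assumptions; in the real code the lookup would be the SNOMED query, and the toy dictionary here only stands in for it.

```python
import re
from typing import Callable


def per_word_fallback(
    query: str,
    lookup: Callable[[str], list[str]],
    min_len: int = 5,
) -> list[str]:
    """Look up each token longer than min_len characters, keeping unique hits."""
    terms: list[str] = []
    for token in re.findall(r"\w+", query.casefold()):
        if len(token) <= min_len:
            continue  # short words ("wat", "is", "van") are never looked up
        for term in lookup(token):
            if term not in terms:
                terms.append(term)
    return terms


# Toy lookup standing in for the SNOMED description query.
DEMO_INDEX = {"hypothyreoïdie": ["Hypothyroïdi"]}


def demo_lookup(token: str) -> list[str]:
    return DEMO_INDEX.get(token, [])


print(per_word_fallback("wat is hypothyreoïdie precies", demo_lookup))
# ['Hypothyroïdi']
```

The length threshold keeps short function words out of the lookup, which limits both the number of queries issued and the noise in the appended terms.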
References
- @chen2024bgem3 — historical context for the Layer 1 vector-search baseline; BGE-M3 is now the ColBERT-only model after ADR-0048.
- SNOMED CT Terminology — terminology ingestion, the 12-resolver chain, and the FINDING_SITE routing extension.
- Taxonomy Query Enrichment — the broader 12-resolver chain that runs alongside the cascade documented here.
- ADR-0019: Contextual Embeddings — how canonical questions and chunk context are embedded.
- Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.