Taxonomy Query Enrichment
Every non-blocked query passes through a taxonomy enrichment pipeline that injects structured hospital knowledge into the retrieval and generation stages. This page documents the query-time enrichment flow — from entity extraction through taxonomy resolution, ontology lookup, injection gating, the conditional doctor-list injection (Stage 5c), and prompt augmentation — explaining how structured knowledge-graph data complements vector search to produce grounded, entity-aware responses.
Two pages cover "enrichment", at different levels. The Query Enrichment Pipeline is the canonical reference for the _qs_enrich_query() cascade — the three layers (SNOMED synonym, TREATS/OFFERS, Latin→Dutch) that append bridging vocabulary to the query string. This page is the broader taxonomy resolver flow that the cascade sits inside: entity extraction, the 12-resolver chain, the injection gate, Stage 5c, and prompt augmentation. Step 3 below is where the two meet.
This is the query-time view of the Taxonomy. Resolvers R7–R10 fall back to SNOMED CT when the deterministic chain misses, and Stage 5b is the Value Framework reranker. See the Core Concepts flow for how all three subsystems compose on a single query.
For the ingestion-time taxonomy population see the Population Lifecycle and Seeding Pipeline. For the taxonomy data model see Knowledge Graph Overview. For the overall query pipeline context see Query Processing Pipeline.
Trade-offs
| Decision | Chosen | Alternatives considered | Rejected because |
|---|---|---|---|
| Resolver chain shape | 12 specialised resolvers, first-match wins | Single fuzzy resolver; LLM-only resolution; one resolver per entity type | A single fuzzy resolver collapses precision (cardilogie → cardiologie and radiologie), an LLM-only resolver pays $0.0001 per query for what is mostly cheap SQL, and one-resolver-per-type misses cross-type aliases (e.g., a single-word query that could be a department alias OR a condition alias). The 12-resolver chain runs the cheap precise resolvers first and falls through to the fuzzy fallback only as a last resort. |
| Injection gate | Four-rule gate: structural intent → inject; sparse vector → inject; low similarity → inject; default → suppress | Always inject; never inject; LLM-decided gate | Always-inject dilutes strong vector contexts with redundant taxonomy text and hurts faithfulness scores; never-inject costs the structural-question accuracy that the taxonomy is the authoritative source for. An LLM-decided gate adds a per-query LLM call to a path that doesn't otherwise need one. The four-rule gate is deterministic and fast, and its rules cover the cases the data shows matter. |
| Stage 5c (synthetic doctor-list injection) | Triple-gated: intent in doctor_lookup or dept_lookup AND list-signal phrase AND department hint | Always-inject the full doctor list; never inject; cross-encoder re-rank that promotes doctor pages | Always-inject blows the context budget for single-doctor queries; never-inject was the pre-fix shape and produced the dermatologist-list regression. A cross-encoder re-rank cannot discover doctors that the first-stage retrieval missed entirely. The triple gate fires only for the specific failure mode (list questions about a department's doctors) and is a no-op otherwise. |
Enrichment Pipeline Overview
Step 1: Entity Extraction
During intent classification, the Tier 2 LLM extracts structured medical entities alongside the intent. These entities drive all downstream taxonomy operations.
The ExtractedEntities structure contains:
| Field | Type | Example |
|---|---|---|
| condition | str \| None | "hartkloppingen" |
| department | str \| None | "Cardiologie" |
| treatment | str \| None | "chemotherapie" |
| examination | str \| None | "MRI" |
| doctor | str \| None | "Dr. Peeters" |
| campus | str \| None | "Sint-Jan" |
For colloquial or indirect queries (e.g., "Wie kan helpen met hartproblemen?"), the LLM extracts the implicit entity ("hartproblemen" as a condition) even though no medical term appears explicitly. See ADR-0030 for the design rationale.
Implementation: rag_service.py:399–506 (_classify_intent_and_rewrite)
Step 2: Taxonomy Resolution (12-Resolver Chain)
The extracted entities pass through HospitalTaxonomy.resolve_search_query(), which runs a chain of responsibility — twelve resolvers executed in priority order. The first resolver to match wins. This design enables precise resolution even for ambiguous, misspelled, or multi-language input.
Resolver Descriptions
| # | Resolver | Purpose | Example |
|---|---|---|---|
| 1 | Enrichment Trigger | Detect navigational enhancement phrases | "meer informatie over..." |
| 2 | Campus Exact | Match campus names and aliases | "Sint-Jan", "Genk" → Campus Sint-Jan |
| 3 | Dept Skipgram | Order-independent multi-word matching | "intensive zorgen" matches "Intensieve Zorgen" |
| 4 | Dept N-gram | Consecutive 2/3-word pair matching | "spoed opname" → Spoedgevallen |
| 5 | Dept Single Alias | Broadest single-word alias match | "cardio" → Cardiologie |
| 6 | Dept Alias Map | Exact department name lookup | "Orthopedie" → Orthopedie |
| 7 | Condition Exact | SNOMED → CONDITION_ALIASES → raw keywords | "suikerziekte" → Diabetes Mellitus |
| 8 | Dept from Condition | Condition-to-department routing | Diabetes Mellitus → Endocrinologie |
| 9 | Treatment Exact | SNOMED → TREATMENT_ALIASES resolution | "hartfilmpje" → ECG |
| 10 | Examination Exact | SNOMED → EXAMINATION_ALIASES resolution | "bloedonderzoek" → Labo |
| 11 | Specialty Exact | Specialty name lookup | "orthopedisch chirurg" |
| 12 | Fuzzy Fallback | Misspelling detection (cutoff=0.8) | "cardilogie" → Cardiologie |
SNOMED CT Integration
Resolver 7 (Condition Exact) can optionally fall back to SNOMED CT synonym expansion via resolve_search_query_with_snomed() when the deterministic alias map misses. The synonym-expansion mechanism itself (alias map → SNOMED Dutch descriptions → IS-A concept expansion) is documented canonically in Query Enrichment → Layer 1; the underlying ontology integration is in SNOMED CT Terminology.
Implementation: hospital_taxonomy.py:583–1059 (resolve_search_query, _resolve_search_query_inner, resolve_search_query_with_snomed)
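The first-match-wins behaviour of the chain can be sketched with two of the twelve resolvers. This is a minimal sketch under stated assumptions: the alias data is toy data, the function names are hypothetical, and the use of difflib for the fuzzy fallback is an assumption (only the 0.8 cutoff comes from the resolver table).

```python
import difflib
from typing import Callable, Optional

# Toy data for illustration; the real alias maps are not shown on this page.
DEPT_ALIASES = {"cardio": "Cardiologie", "orthopedie": "Orthopedie"}
DEPT_NAMES = ["Cardiologie", "Radiologie", "Orthopedie"]

Resolver = Callable[[str], Optional[dict]]


def dept_single_alias(query: str) -> Optional[dict]:
    """Cheap, precise resolver: exact single-word alias lookup."""
    for word in query.lower().split():
        if word in DEPT_ALIASES:
            return {"resolver": "dept_single_alias", "department": DEPT_ALIASES[word]}
    return None


def fuzzy_fallback(query: str) -> Optional[dict]:
    """Last-resort resolver: misspelling detection with cutoff 0.8 (difflib is an assumption)."""
    by_lower = {name.lower(): name for name in DEPT_NAMES}
    for word in query.lower().split():
        hits = difflib.get_close_matches(word, list(by_lower), n=1, cutoff=0.8)
        if hits:
            return {"resolver": "fuzzy_fallback", "department": by_lower[hits[0]]}
    return None


# Priority order matters: precise resolvers run before the fuzzy fallback.
CHAIN: list[Resolver] = [dept_single_alias, fuzzy_fallback]


def resolve(query: str) -> Optional[dict]:
    """Run resolvers in priority order; the first one that matches wins."""
    for resolver in CHAIN:
        result = resolver(query)
        if result is not None:
            return result
    return None


print(resolve("cardio afspraak"))
print(resolve("cardilogie"))
```

Because the fuzzy resolver sits last, a query that an alias resolves exactly never reaches the less precise matcher.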
Step 3: Query Enrichment
After taxonomy resolution, the resolved canonical terms are appended to the search query (e.g. "hartkloppingen" → "hartkloppingen (Palpitaties, Cardiologie)") so that both the embedding and the BM25 tsvector see the bridging vocabulary, improving recall against the canonical-Dutch corpus.
This is the same _qs_enrich_query() cascade documented canonically — the three enrichment layers (SNOMED synonym expansion, taxonomy TREATS/OFFERS routing, Latin-to-Dutch translation), worked examples, and empirical impact — on the Query Enrichment Pipeline page. This page covers only how the step sits within the taxonomy resolver flow.
Implementation: rag_service.py:2158–2181 (_qs_enrich_query)
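The appending behaviour described in this step can be sketched as follows. The function name append_canonical_terms and its duplicate-skipping rule are assumptions for illustration; the actual _qs_enrich_query() cascade is documented on the Query Enrichment Pipeline page and may differ.

```python
def append_canonical_terms(query: str, canonical_terms: list[str]) -> str:
    """Append resolved canonical terms in parentheses (hypothetical helper).

    Terms already present in the query, or repeated in the input, are skipped
    so the enriched string does not grow with redundant vocabulary.
    """
    query_lower = query.lower()
    extra: list[str] = []
    for term in canonical_terms:
        if not term:
            continue
        already_in_query = term.lower() in query_lower
        already_added = term.lower() in (t.lower() for t in extra)
        if not already_in_query and not already_added:
            extra.append(term)
    if not extra:
        return query
    return f"{query} ({', '.join(extra)})"


enriched = append_canonical_terms("hartkloppingen", ["Palpitaties", "Cardiologie"])
print(enriched)
```

The same enriched string would then feed both the embedding call and the BM25 query, which is what lets both lanes see the bridging vocabulary.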
Step 4: Sequential Retrieval
Three operations execute sequentially (asyncpg does not support concurrent queries on the same session):
4a. Vector Search (Enriched Query)
Standard pgvector cosine similarity search using the enriched query. Returns document chunks ranked by semantic similarity. See Hybrid Search for the full retrieval architecture.
4b. Taxonomy Search (Intent-Routed SQL)
The TaxonomyQueryService routes to intent-specific SQL handlers that traverse the taxonomy relationships:
| Intent | Handler | SQL Pattern |
|---|---|---|
| doctor_lookup | _handle_doctor_lookup | doctors → doctor_departments → departments → department_campuses |
| department_or_service_lookup | _handle_department_lookup | departments → department_campuses → doctors |
| condition_information | _handle_condition_info | conditions → dept_handles_condition → departments → doctors |
| treatment_or_exam_information | _handle_treatment_exam_info | treatments/examinations → dept_offers_treatment → departments |
| booking_or_contact | _handle_booking_contact | departments → department_campuses (with contact info) |
Each handler returns structured results like:
{
"type": "department_for_condition",
"department": "Cardiologie",
"condition": "Palpitaties",
"campuses": "ZOL Genk, campus Sint-Jan",
"doctors": "Dr. Peeters, Dr. Janssen",
"source": "taxonomy"
}
These are converted to natural language content strings and merged with vector results.
Implementation: taxonomy/query_service.py:48–435
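To make the conversion from a structured result to a content string concrete, here is a minimal sketch. The function name and the Dutch sentence templates are assumptions; this page does not show the wording the service actually produces.

```python
def taxonomy_result_to_text(result: dict) -> str:
    """Render one structured taxonomy result as a sentence-style string (illustrative)."""
    if result.get("type") == "department_for_condition":
        parts = [
            f'De aandoening "{result["condition"]}" wordt behandeld '
            f'door de dienst {result["department"]}.'
        ]
        if result.get("campuses"):
            parts.append(f'Campussen: {result["campuses"]}.')
        if result.get("doctors"):
            parts.append(f'Artsen: {result["doctors"]}.')
        return " ".join(parts)
    # Fallback for result types this sketch does not template: plain key/value pairs.
    return "; ".join(
        f"{key}: {value}"
        for key, value in result.items()
        if key not in {"type", "source"}
    )


example = {
    "type": "department_for_condition",
    "department": "Cardiologie",
    "condition": "Palpitaties",
    "campuses": "ZOL Genk, campus Sint-Jan",
    "doctors": "Dr. Peeters, Dr. Janssen",
    "source": "taxonomy",
}
text = taxonomy_result_to_text(example)
print(text)
```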
4c. Ontology Lookup
Running as part of the sequential retrieval chain, the ontology lookup performs three steps:
- Entity Linking (EntityLinker.link_multiple()): Maps extracted entity mentions to their taxonomy database IDs.
- Relationship Retrieval (OntologyQueryService.build_context()): Fetches relationships (PART_OF, TREATED_BY, HAS_FACILITY, etc.) for the linked entities.
- Context Formatting: Produces an OntologyContext object that renders as a prompt block.
The ontology block is prepended to the assembled context, giving the LLM explicit knowledge of entity relationships.
Implementation: rag_service.py:2207–2265 (_qs_ontology_lookup)
Step 5: Taxonomy Injection Gate
Not all queries benefit from taxonomy data. When vector search returns strong, relevant results, injecting taxonomy data can dilute the context with less relevant structured information. The injection gate applies four rules in order:
| Rule | Condition | Action | Rationale |
|---|---|---|---|
| 1. Structural intent | Intent is doctor_lookup, department_lookup, condition_info, treatment_info, or symptom_description | Inject | Taxonomy is the authoritative source for organizational data |
| 2. Sparse vector results | Vector returned fewer chunks than graph_injection_min_vector_results | Inject | Graph fills the retrieval gap |
| 3. Low similarity | Best vector similarity score below graph_injection_similarity_threshold | Inject | Rescue scenario — vector results are weak |
| 4. Default | Strong vector results with sufficient similarity | Suppress | Avoid diluting rich vector context |
When suppressed, taxonomy results from Step 4b are excluded from the context window. Only vector chunks proceed to the LLM.
Implementation: rag_service.py:2571–2662 (_should_inject_taxonomy_context, _build_context_from_chunks)
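The four rules can be read as a short ordered decision function. The sketch below assumes the intent labels from the rule table and the default thresholds from the Configuration section; the function name, the GateSettings container, and the returned reason strings are hypothetical.

```python
from dataclasses import dataclass

# Intent labels as listed in Rule 1 of the table above.
STRUCTURAL_INTENTS = frozenset(
    {"doctor_lookup", "department_lookup", "condition_info", "treatment_info", "symptom_description"}
)


@dataclass(frozen=True)
class GateSettings:
    """Hypothetical settings holder; defaults match the Configuration section."""

    graph_injection_min_vector_results: int = 3
    graph_injection_similarity_threshold: float = 0.35


def should_inject_taxonomy(
    intent: str,
    similarities: list[float],
    settings: GateSettings = GateSettings(),
) -> tuple[bool, str]:
    """Apply the four rules in order and return (decision, reason)."""
    if intent in STRUCTURAL_INTENTS:
        return True, "structural_intent"
    if len(similarities) < settings.graph_injection_min_vector_results:
        return True, "sparse_vector_results"
    if max(similarities, default=0.0) < settings.graph_injection_similarity_threshold:
        return True, "low_similarity"
    return False, "default_suppress"


print(should_inject_taxonomy("condition_info", [0.9] * 8))
print(should_inject_taxonomy("navigational", [0.8, 0.7, 0.6]))
```

Returning the reason alongside the decision is a design choice of this sketch; it makes the gate outcome easy to log and to assert on in tests.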
Stage 5c: Synthetic Department-Doctor-List Injection
Stage 5c is a post-retrieval, pre-context-assembly step that fires only when all three of the following hold:
- The classified intent is DOCTOR_LOOKUP or DEPARTMENT_OR_SERVICE_LOOKUP.
- The user query contains a list-signal phrase matched by _LIST_SIGNAL_RE (e.g., alle, welke artsen, wie werkt er, list all, tous les médecins).
- A department or specialty hint can be resolved either from the classifier's ExtractedEntities (department, service, or doctor) or from a regex sweep over the rewritten query.
When all three gates pass, the stage queries the taxonomy for all doctors associated with the resolved department, builds a synthetic chunk listing them, and inserts it into the retrieved-chunks set before context assembly. This guarantees the LLM has the full roster available so the system prompt's "list all members" exception rule can fire faithfully — the LLM cannot list doctors it never saw in the context.
The stage was introduced as a regression fix for the 2026-05-09 incident: a 6-turn voice conversation about dermatologists capped at the same two names because vector retrieval surfaced individual doctor brochure pages without the shared department roster, and re-ranking could only reorder what retrieval returned. The synthetic-chunk approach guarantees the roster is in the context regardless of which doctor brochures retrieval picked up.
When any of the three gates is unsatisfied (e.g., the query is "Wie is Dr. X?" — a single-doctor lookup, no list signal), Stage 5c is a no-op and adds zero latency. When it does fire, the cost is one indexed taxonomy query (~5 ms) plus the synthetic chunk's contribution to the assembled context (a single short paragraph; well within budget).
Interaction with post-answer enrichment
When Stage 5c fires, all department doctors are already visible to the LLM, so the post-answer taxonomy enrichment step (described below) finds zero "new" doctors and appends nothing. The two mechanisms are complementary: Stage 5c is the proactive path for queries the gate identifies as list questions; post-answer enrichment is the safety net for multi-part queries whose intent classification didn't trigger Stage 5c.
Implementation: rag_service.py:2134–2197 (_qs_maybe_inject_doctor_list); call site at rag_service.py:3537.
Stage Execution Order
For an examiner tracing a live query, the post-classification stages execute in this order:
| Stage | Purpose | Cost when active | Cost when no-op |
|---|---|---|---|
| 5a Entity extraction (via intent classification) | Extract structured entities from the query | included in intent LLM call | – |
| Step 2 Taxonomy resolution (12-resolver chain) | Resolve user terms to canonical entity IDs | ~10 ms typical; ~30 ms with fuzzy fallback | – |
| Step 3 Query enrichment | Append canonical terms to search query | ~1 ms | – |
| Step 4 Sequential retrieval (vector + BM25 + taxonomy) | Gather candidate chunks | ~800 ms | – |
| Stage 5b Value Framework affinity rerank | Multiply scores by intent × content_category matrix | ~2 ms | ~2 ms (always-on) |
| Stage 5c Synthetic doctor-list injection | Append synthetic chunk with full department roster | ~5 ms | 0 ms |
| Step 5 Taxonomy injection gate | Decide whether taxonomy results enter the context | < 1 ms | – |
| Step 6 Routing hint injection | Prepend "condition X falls under department Y" directive | < 1 ms | – |
| Step 7 System prompt augmentation | Append GRAPH_CONTEXT_INSTRUCTIONS | < 1 ms | – |
Step 6: Routing Hint Injection
When taxonomy resolves a condition→department mapping, a routing hint is prepended to the assembled context. This is a strong directive that ensures the LLM always mentions the correct department:
--- ORGANISATIE-INFORMATIE ---
De aandoening "Palpitaties" valt onder de dienst Cardiologie.
Je MOET deze dienst vermelden in je antwoord.
The routing hint is injected regardless of the injection gate decision — even if taxonomy results are suppressed, the organizational routing information is always present. This prevents the LLM from naming incorrect departments when the vector context alone is ambiguous.
Implementation: rag_service.py:2887–2903 (_qs_inject_routing_hint)
Step 7: System Prompt Augmentation
When taxonomy data is present in the context (either via injection gate or routing hint), the system prompt receives additional GRAPH_CONTEXT_INSTRUCTIONS:
The following structured information was retrieved from the hospital knowledge graph.
- ALWAYS include relevant department names and organizational information.
- When a condition is discussed, you MUST mention which department(s) handle it.
- For department routing (which department handles a condition), treat the graph data as AUTHORITATIVE.
- Graph-derived information with a [number] marker — use it to cite.
- Graph-derived information without a [number] marker is supplementary and should NOT be cited with numbers.
These instructions ensure the LLM prioritizes taxonomy-derived organizational data over potentially conflicting vector search results. The "AUTHORITATIVE" directive is critical — it means when the taxonomy says Cardiologie handles Palpitaties, the LLM will state this even if a vector chunk from an outdated brochure suggests otherwise.
Implementation: prompts.py:244–257, rag_service.py:4116
End-to-End Worked Example
User query: "Waar kan ik terecht met hartkloppingen?"
This example traces one query. For the same flow walked across five contrasting queries — taxonomy-TREATS department routing, a doctor-list (Stage 5c) question, a medical-dosing safety-gate decision, a cross-language rewrite, and a SNOMED synonym expansion — all with values captured live from the pilot, see A Query, End-to-End.
| Stage | Input | Output |
|---|---|---|
| Intent Classification + Rewriting | Raw query | One LLM call (ADR-0030) emits all three: intent=condition_information, entities={condition: "hartkloppingen"}, and rewritten_query="Welke afdeling van ZOL behandelt hartkloppingen?" (canonical Dutch — this becomes the search_query the rows below operate on). See Query Rewriting. |
| Taxonomy Resolution | entity "hartkloppingen" | condition="Palpitaties", department="Cardiologie" (Resolver 7→8) |
| Query Enrichment | "hartkloppingen" | "hartkloppingen (Palpitaties, Cardiologie)" |
| Vector Search | Enriched query | 8 chunks about palpitations, heart rhythm, Cardiologie brochure |
| Taxonomy Search | condition→dept SQL | {dept: "Cardiologie", campuses: "Sint-Jan, André Dumont", doctors: [...]} |
| Ontology Lookup | entity IDs | Palpitaties TREATED_BY Hartritme-onderzoek, Cardiologie HAS_FACILITY Hartcentrum |
| Injection Gate | Rule 1: structural intent | INJECT (condition_information) |
| Routing Hint | condition→dept | "De aandoening Palpitaties valt onder de dienst Cardiologie." |
| System Prompt | has_graph_context=true | + GRAPH_CONTEXT_INSTRUCTIONS appended |
| LLM Response | Full context | "Bij hartkloppingen kunt u terecht bij de dienst Cardiologie van ZOL..." |
Configuration
Two settings in config.py control the injection gate thresholds:
| Setting | Default | Description |
|---|---|---|
| graph_injection_min_vector_results | 3 | Minimum vector results before taxonomy is suppressed |
| graph_injection_similarity_threshold | 0.35 | Minimum vector similarity before taxonomy rescue |
Both are manageable via the admin Settings API at runtime.
Key Implementation Files
Line numbers are approximate and shift as the codebase evolves. Use them as starting-point hints rather than exact locations.
| Component | File | Lines (approx.) |
|---|---|---|
| Entity extraction (via intent) | rag_service.py | ~399–506 |
| Taxonomy resolution chain | hospital_taxonomy.py | ~583–1059 |
| Query enrichment | rag_service.py | ~2158–2181 |
| Taxonomy SQL queries | taxonomy/query_service.py | ~48–435 |
| Ontology lookup | rag_service.py | ~2207–2265 |
| Injection gate | rag_service.py | ~2571–2662 |
| Routing hint | rag_service.py | ~2887–2903 |
| Graph context instructions | prompts.py | ~244–257 |
| Prompt assembly | rag_service.py | ~4084–4198 |
Post-Answer Taxonomy Enrichment
In addition to the pre-generation enrichment (Steps 1–7 above), the pipeline performs a post-answer enrichment step after the LLM generates its response. This addresses a specific gap: multi-part queries (e.g., "Wat zijn de bezoekuren? Bij wie kan ik terecht bij kinderpsychiatrie?") may be classified with a non-structural intent like NAVIGATIONAL, causing the taxonomy injection gate to suppress graph data. The LLM then answers from vector search alone, potentially missing relationship data.
How It Works
The guard node G was added after the N*-afdeling incident: a DEPARTMENT entity named for a nursing-ward code (N*-afdeling) was matched on the bare word "afdeling" and then won the display tiebreak (min(by length)) over the real Endocrinologie. A department name now must carry a specialty token (an alphabetic word ≥ 4 chars that is not a generic suffix like afdeling/dienst) to enter the WORKS_IN lookup or the display; otherwise enrichment is skipped rather than printing a ward code. See Release Notes — May 29, 2026.
After _qs_finalize generates the response:
- Department scan: All published department names are checked against the response text (case-insensitive).
- Relationship lookup: For each matched department, WORKS_IN relationships are queried from published_relationships.
- Deduplication: Doctor names already mentioned in the response are filtered out.
- Append: Remaining doctors are appended as a clearly marked supplement:
---
*Aanvullende informatie uit de ziekenhuistaxonomie:*
Artsen verbonden aan Kinderpsychiatrie: Dr. Frauke Martens, Dr. Karen Gillaerts.
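The scan, lookup, deduplication, and append steps, together with the specialty-token guard, can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the WORKS_IN data is passed in as a plain dictionary instead of being queried from published_relationships, and matching is simple case-insensitive substring search.

```python
import re

GENERIC_SUFFIXES = {"afdeling", "dienst"}


def has_specialty_token(name: str) -> bool:
    """Guard: require an alphabetic word of 4+ characters that is not a generic suffix."""
    words = re.findall(r"[^\W\d_]+", name)
    return any(len(w) >= 4 and w.lower() not in GENERIC_SUFFIXES for w in words)


def enrich_answer(answer: str, works_in: dict[str, list[str]]) -> str:
    """Append doctors for departments named in the answer; never edit existing text."""
    lowered = answer.lower()
    lines: list[str] = []
    for department, doctors in works_in.items():
        if not has_specialty_token(department):
            continue  # skip ward codes such as "N*-afdeling"
        if department.lower() not in lowered:
            continue  # department scan (case-insensitive)
        new_doctors = [d for d in doctors if d.lower() not in lowered]  # deduplication
        if new_doctors:
            lines.append(f"Artsen verbonden aan {department}: {', '.join(new_doctors)}.")
    if not lines:
        return answer
    header = "---\n*Aanvullende informatie uit de ziekenhuistaxonomie:*"
    return f"{answer}\n\n{header}\n" + "\n".join(lines)


answer = "U kunt terecht bij Kinderpsychiatrie. Dr. Frauke Martens helpt u verder."
relationships = {
    "Kinderpsychiatrie": ["Dr. Frauke Martens", "Dr. Karen Gillaerts"],
    "N*-afdeling": ["Dr. X"],
}
enriched_answer = enrich_answer(answer, relationships)
print(enriched_answer)
```

Because the function only ever concatenates onto the original answer, a failure or an empty result leaves the generated text exactly as the LLM produced it.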
Design Principles
| Principle | Implementation |
|---|---|
| Zero regression | Purely additive — never modifies existing answer text |
| Verified data only | Uses published taxonomy (operator-approved, version-controlled) |
| Non-blocking | Wrapped in try/except; failures are logged and do not interrupt the response |
| Badge activation | Sets has_graph_context = True when enrichment fires, triggering the "Verified with hospital data" badge |
| Multi-part safe | Works regardless of intent classification, since it runs post-generation |
This pattern is particularly valuable for the ZOL use case where patients ask complex, multi-faceted questions that span navigational and structural concerns in a single query.
References
- Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
- Karpukhin, V., et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. EMNLP 2020. — Foundation for the dense bi-encoder retrieval lane that runs alongside the taxonomy resolution chain.
- Sarmah, B., et al. (2024). HybridRAG: Integrating Knowledge Graphs and Vector Retrieval. Formalises the hybrid KG + vector pattern this page documents.
- Soman, K., et al. (2024). OntologyRAG: Ontology-enhanced retrieval-augmented generation.
- SNOMED CT Terminology — terminology source for resolvers R7/R8/R9/R10.