Taxonomy Query Enrichment

Every non-blocked query passes through a taxonomy enrichment pipeline that injects structured hospital knowledge into the retrieval and generation stages. This page documents the query-time enrichment flow — from entity extraction through taxonomy resolution, ontology lookup, injection gating, the conditional doctor-list injection (Stage 5c), and prompt augmentation — explaining how structured knowledge-graph data complements vector search to produce grounded, entity-aware responses.

Scope vs. Query Enrichment Pipeline

Two pages cover "enrichment", at different levels. The Query Enrichment Pipeline is the canonical reference for the _qs_enrich_query() cascade — the three layers (SNOMED synonym, TREATS/OFFERS, Latin→Dutch) that append bridging vocabulary to the query string. This page is the broader taxonomy resolver flow that the cascade sits inside: entity extraction, the 12-resolver chain, the injection gate, Stage 5c, and prompt augmentation. Step 3 below is where the two meet.

Part of the Knowledge & Retrieval Steering triad

This is the query-time view of the Taxonomy. Resolvers R7–R10 fall back to SNOMED CT when the deterministic chain misses, and Stage 5b is the Value Framework reranker. See the Core Concepts flow for how all three subsystems compose on a single query.

For the ingestion-time taxonomy population see the Population Lifecycle and Seeding Pipeline. For the taxonomy data model see Knowledge Graph Overview. For the overall query pipeline context see Query Processing Pipeline.

Trade-offs

Decision	Chosen	Alternatives considered	Rejected because
Resolver chain shape	12 specialised resolvers, first-match wins	Single fuzzy resolver; LLM-only resolution; one resolver per entity type	A single fuzzy resolver collapses precision (cardilogie → cardiologie and radiologie), an LLM-only resolver pays $0.0001 per query for what is mostly cheap SQL, and one-resolver-per-type misses cross-type aliases (e.g., a single-word query that could be a department alias OR a condition alias). The 12-resolver chain runs the cheap precise resolvers first and falls through to the fuzzy fallback only as a last resort.
Injection gate	Four-rule gate: structural intent → inject; sparse vector → inject; low similarity → inject; default → suppress	Always inject; never inject; LLM-decided gate	Always-inject dilutes strong vector contexts with redundant taxonomy text and hurts faithfulness scores; never-inject costs the structural-question accuracy that the taxonomy is the authoritative source for. An LLM-decided gate adds a per-query LLM call to a path that doesn't otherwise need one. The four-rule gate is deterministic and fast, and its rules cover the cases the data shows matter.
Stage 5c (synthetic doctor-list injection)	Triple-gated: intent in `doctor_lookup` or `department_or_service_lookup` AND list-signal phrase AND department hint	Always-inject the full doctor list; never inject; cross-encoder re-rank that promotes doctor pages	Always-inject blows the context budget for single-doctor queries; never-inject was the pre-fix shape and produced the dermatologist-list regression. A cross-encoder re-rank cannot discover doctors that the first-stage retrieval missed entirely. The triple gate fires only for the specific failure mode (list questions about a department's doctors) and is a no-op otherwise.

Enrichment Pipeline Overview

Step 1: Entity Extraction

During intent classification, the Tier 2 LLM extracts structured medical entities alongside the intent. These entities drive all downstream taxonomy operations.

The ExtractedEntities structure contains:

Field	Type	Example
`condition`	str \| None	"hartkloppingen"
`department`	str \| None	"Cardiologie"
`treatment`	str \| None	"chemotherapie"
`examination`	str \| None	"MRI"
`doctor`	str \| None	"Dr. Peeters"
`campus`	str \| None	"Sint-Jan"
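The field table above can be pictured as a small dataclass. This is a minimal sketch, assuming a plain Python dataclass; the real ExtractedEntities type may be a Pydantic model or carry extra fields, and the `non_empty` helper is hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractedEntities:
    """Illustrative stand-in for the structure described above."""

    condition: Optional[str] = None
    department: Optional[str] = None
    treatment: Optional[str] = None
    examination: Optional[str] = None
    doctor: Optional[str] = None
    campus: Optional[str] = None

    def non_empty(self) -> dict[str, str]:
        # Hypothetical helper: keep only the fields the classifier filled in.
        return {k: v for k, v in asdict(self).items() if v is not None}


# A colloquial query yields an implicit condition and nothing else.
entities = ExtractedEntities(condition="hartproblemen")
print(entities.non_empty())
```

Downstream stages can then iterate over `non_empty()` instead of checking each field for `None`.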

For colloquial or indirect queries (e.g., "Wie kan helpen met hartproblemen?"), the LLM extracts the implicit entity ("hartproblemen" as a condition) even though no medical term appears explicitly. See ADR-0030 for the design rationale.

Implementation: rag_service.py:399–506 (_classify_intent_and_rewrite)

Step 2: Taxonomy Resolution (12-Resolver Chain)

The extracted entities pass through HospitalTaxonomy.resolve_search_query(), which runs a chain of responsibility — twelve resolvers executed in priority order. The first resolver to match wins. This design enables precise resolution even for ambiguous, misspelled, or multi-language input.

Resolver Descriptions

#	Resolver	Purpose	Example
1	Enrichment Trigger	Detect navigational enhancement phrases	"meer informatie over..."
2	Campus Exact	Match campus names and aliases	"Sint-Jan", "Genk" → Campus Sint-Jan
3	Dept Skipgram	Order-independent multi-word matching	"intensive zorgen" matches "Intensieve Zorgen"
4	Dept N-gram	Consecutive 2- or 3-word sequence matching	"spoed opname" → Spoedgevallen
5	Dept Single Alias	Broadest single-word alias match	"cardio" → Cardiologie
6	Dept Alias Map	Exact department name lookup	"Orthopedie" → Orthopedie
7	Condition Exact	SNOMED → CONDITION_ALIASES → raw keywords	"suikerziekte" → Diabetes Mellitus
8	Dept from Condition	Condition-to-department routing	Diabetes Mellitus → Endocrinologie
9	Treatment Exact	SNOMED → TREATMENT_ALIASES resolution	"hartfilmpje" → ECG
10	Examination Exact	SNOMED → EXAMINATION_ALIASES resolution	"bloedonderzoek" → Labo
11	Specialty Exact	Specialty name lookup	"orthopedisch chirurg"
12	Fuzzy Fallback	Misspelling detection (cutoff=0.8)	"cardilogie" → Cardiologie
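The first-match-wins behaviour of the chain can be sketched with three of the twelve resolvers. This is an illustration only: the alias dictionaries and function names are hypothetical, and the use of `difflib` for the fuzzy fallback is an assumption (the table states only the 0.8 cutoff, not the matching library).

```python
import difflib
from typing import Callable, Optional

# Hypothetical alias data; the real maps live in the taxonomy.
DEPT_SINGLE_ALIAS = {"cardio": "Cardiologie"}
DEPT_NAMES = {"cardiologie": "Cardiologie", "orthopedie": "Orthopedie"}

Resolver = Callable[[str], Optional[str]]


def dept_single_alias(term: str) -> Optional[str]:
    return DEPT_SINGLE_ALIAS.get(term.lower())


def dept_alias_map(term: str) -> Optional[str]:
    return DEPT_NAMES.get(term.lower())


def fuzzy_fallback(term: str) -> Optional[str]:
    # Last resort: the single closest department name at or above the cutoff.
    hits = difflib.get_close_matches(term.lower(), list(DEPT_NAMES), n=1, cutoff=0.8)
    return DEPT_NAMES[hits[0]] if hits else None


# Priority order matters: cheap exact resolvers first, fuzzy last.
CHAIN: list[tuple[str, Resolver]] = [
    ("dept_single_alias", dept_single_alias),
    ("dept_alias_map", dept_alias_map),
    ("fuzzy_fallback", fuzzy_fallback),
]


def resolve(term: str) -> Optional[tuple[str, str]]:
    """Return (resolver_name, canonical_name) from the first resolver that matches."""
    for name, resolver in CHAIN:
        result = resolver(term)
        if result is not None:
            return name, result
    return None


print(resolve("cardio"))
print(resolve("cardilogie"))
```

Returning the resolver name alongside the canonical value makes it easy to log which tier answered a given query.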

SNOMED CT Integration

Resolver 7 (Condition Exact) can optionally fall back to SNOMED CT synonym expansion via resolve_search_query_with_snomed() when the deterministic alias map misses. The synonym-expansion mechanism itself (alias map → SNOMED Dutch descriptions → IS-A concept expansion) is documented canonically in Query Enrichment → Layer 1; the underlying ontology integration is in SNOMED CT Terminology.

Implementation: hospital_taxonomy.py:583–1059 (resolve_search_query, _resolve_search_query_inner, resolve_search_query_with_snomed)

Step 3: Query Enrichment

After taxonomy resolution, the resolved canonical terms are appended to the search query (e.g. "hartkloppingen" → "hartkloppingen (Palpitaties, Cardiologie)") so that both the embedding and the BM25 tsvector see the bridging vocabulary, improving recall against the canonical-Dutch corpus.

This is the same _qs_enrich_query() cascade documented canonically — the three enrichment layers (SNOMED synonym expansion, taxonomy TREATS/OFFERS routing, Latin-to-Dutch translation), worked examples, and empirical impact — on the Query Enrichment Pipeline page. This page covers only how the step sits within the taxonomy resolver flow.
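The append step itself is small. The following sketch assumes the resolved canonical terms arrive as a list of strings; `enrich_query` is a hypothetical name, and skipping terms the query already contains is an assumption rather than documented behaviour.

```python
def enrich_query(query: str, canonical_terms: list[str]) -> str:
    """Append canonical terms that the query does not already contain."""
    lowered = query.lower()
    seen: set[str] = set()
    extras: list[str] = []
    for term in canonical_terms:
        key = term.lower()
        # Skip duplicates and terms already present in the query text.
        if key in lowered or key in seen:
            continue
        seen.add(key)
        extras.append(term)
    return f"{query} ({', '.join(extras)})" if extras else query


print(enrich_query("hartkloppingen", ["Palpitaties", "Cardiologie"]))
```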

Implementation: rag_service.py:2158–2181 (_qs_enrich_query)

Step 4: Sequential Retrieval

Three operations execute sequentially (asyncpg does not support concurrent queries on the same session):

4a. Vector Search (Enriched Query)

Standard pgvector cosine similarity search using the enriched query. Returns document chunks ranked by semantic similarity. See Hybrid Search for the full retrieval architecture.

4b. Taxonomy Search (Intent-Routed SQL)

The TaxonomyQueryService routes to intent-specific SQL handlers that traverse the taxonomy relationships:

Intent	Handler	SQL Pattern
`doctor_lookup`	`_handle_doctor_lookup`	`doctors` → `doctor_departments` → `departments` → `department_campuses`
`department_or_service_lookup`	`_handle_department_lookup`	`departments` → `department_campuses` → `doctors`
`condition_information`	`_handle_condition_info`	`conditions` → `dept_handles_condition` → `departments` → `doctors`
`treatment_or_exam_information`	`_handle_treatment_exam_info`	`treatments/examinations` → `dept_offers_treatment` → `departments`
`booking_or_contact`	`_handle_booking_contact`	`departments` → `department_campuses` (with contact info)

Each handler returns structured results like:

{
  "type": "department_for_condition",
  "department": "Cardiologie",
  "condition": "Palpitaties",
  "campuses": "ZOL Genk, campus Sint-Jan",
  "doctors": "Dr. Peeters, Dr. Janssen",
  "source": "taxonomy"
}

These are converted to natural language content strings and merged with vector results.
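As an illustration of that conversion, the sketch below renders the example result above into a Dutch sentence. The function name and the sentence templates are hypothetical; the actual wording produced by the service is not documented here.

```python
def render_taxonomy_result(result: dict[str, str]) -> str:
    """Turn one structured taxonomy row into text for the context window."""
    if result.get("type") == "department_for_condition":
        parts = [
            f"De aandoening {result['condition']} wordt behandeld "
            f"door de dienst {result['department']}."
        ]
        # Optional fields are only rendered when present.
        if result.get("campuses"):
            parts.append(f"Campussen: {result['campuses']}.")
        if result.get("doctors"):
            parts.append(f"Artsen: {result['doctors']}.")
        return " ".join(parts)
    # Unknown result types fall back to a generic key/value rendering.
    return "; ".join(
        f"{k}: {v}" for k, v in result.items() if k not in ("type", "source")
    )


row = {
    "type": "department_for_condition",
    "department": "Cardiologie",
    "condition": "Palpitaties",
    "campuses": "ZOL Genk, campus Sint-Jan",
    "doctors": "Dr. Peeters, Dr. Janssen",
    "source": "taxonomy",
}
print(render_taxonomy_result(row))
```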

Implementation: taxonomy/query_service.py:48–435

4c. Ontology Lookup

Running as part of the sequential retrieval chain, the ontology lookup performs three steps:

Entity Linking (EntityLinker.link_multiple()): Maps extracted entity mentions to their taxonomy database IDs
Relationship Retrieval (OntologyQueryService.build_context()): Fetches relationships (PART_OF, TREATED_BY, HAS_FACILITY, etc.) for the linked entities
Context Formatting: Produces an OntologyContext object that renders as a prompt block

The ontology block is prepended to the assembled context, giving the LLM explicit knowledge of entity relationships.

Implementation: rag_service.py:2207–2265 (_qs_ontology_lookup)

Step 5: Taxonomy Injection Gate

Not all queries benefit from taxonomy data. When vector search returns strong, relevant results, injecting taxonomy data can dilute the context with less relevant structured information. The injection gate applies four rules in order:

Rule	Condition	Action	Rationale
1. Structural intent	Intent is `doctor_lookup`, `department_or_service_lookup`, `condition_information`, `treatment_or_exam_information`, or `symptom_description`	Inject	Taxonomy is the authoritative source for organizational data
2. Sparse vector results	Vector returned fewer chunks than `graph_injection_min_vector_results`	Inject	Graph fills the retrieval gap
3. Low similarity	Best vector similarity score below `graph_injection_similarity_threshold`	Inject	Rescue scenario — vector results are weak
4. Default	Strong vector results with sufficient similarity	Suppress	Avoid diluting rich vector context

When suppressed, taxonomy results from Step 4b are excluded from the context window. Only vector chunks proceed to the LLM.
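The four rules can be read as an ordered series of early returns. The sketch below is a simplified stand-in, not the body of _should_inject_taxonomy_context: the function signature, the `GateSettings` class, and the reason strings are hypothetical, and the intent labels are borrowed from the handler table in Step 4b.

```python
from dataclasses import dataclass

# Assumed set of structural intents; the labels used by the real gate may differ.
STRUCTURAL_INTENTS = {
    "doctor_lookup",
    "department_or_service_lookup",
    "condition_information",
    "treatment_or_exam_information",
    "symptom_description",
}


@dataclass(frozen=True)
class GateSettings:
    graph_injection_min_vector_results: int = 3
    graph_injection_similarity_threshold: float = 0.35


def should_inject(
    intent: str,
    similarities: list[float],
    settings: GateSettings = GateSettings(),
) -> tuple[bool, str]:
    """Return (inject, reason), applying the four rules in order."""
    if intent in STRUCTURAL_INTENTS:
        return True, "structural_intent"
    if len(similarities) < settings.graph_injection_min_vector_results:
        return True, "sparse_vector_results"
    # default=0.0 keeps the check safe if the minimum is configured as 0.
    if max(similarities, default=0.0) < settings.graph_injection_similarity_threshold:
        return True, "low_similarity"
    return False, "default_suppress"


print(should_inject("navigational", [0.9, 0.8, 0.7]))
```

Returning the reason next to the decision keeps the gate auditable when tracing why a particular answer did or did not include taxonomy text.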

Implementation: rag_service.py:2571–2662 (_should_inject_taxonomy_context, _build_context_from_chunks)

Stage 5c: Synthetic Department-Doctor-List Injection

Stage 5c is a post-retrieval, pre-context-assembly step that fires only when all three of the following hold:

The classified intent is DOCTOR_LOOKUP or DEPARTMENT_OR_SERVICE_LOOKUP.
The user query contains a list-signal phrase matched by _LIST_SIGNAL_RE (e.g., alle, welke artsen, wie werkt er, list all, tous les médecins).
A department or specialty hint can be resolved either from the classifier's ExtractedEntities (department, service, or doctor) or from a regex sweep over the rewritten query.

When all three gates pass, the stage queries the taxonomy for all doctors associated with the resolved department, builds a synthetic chunk listing them, and inserts it into the retrieved-chunks set before context assembly. This guarantees the LLM has the full roster available so the system prompt's "list all members" exception rule can fire faithfully — the LLM cannot list doctors it never saw in the context.

The stage was introduced as a regression fix for the 2026-05-09 incident: a 6-turn voice conversation about dermatologists capped at the same two names because vector retrieval surfaced individual doctor brochure pages without the shared department roster, and re-ranking could only reorder what retrieval returned. The synthetic-chunk approach guarantees the roster is in the context regardless of which doctor brochures retrieval picked up.

When any of the three gates is unsatisfied (e.g., the query is "Wie is Dr. X?" — a single-doctor lookup, no list signal), Stage 5c is a no-op and adds zero latency. When it does fire, the cost is one indexed taxonomy query (~5 ms) plus the synthetic chunk's contribution to the assembled context (a single short paragraph; well within budget).
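The triple gate described above can be sketched as three guard clauses followed by the roster lookup. Everything named here is illustrative: the regex is rebuilt from the example phrases and is not the real _LIST_SIGNAL_RE, the `roster` dictionary stands in for the taxonomy query, and the chunk wording is hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern assembled from the example phrases in the text.
LIST_SIGNAL_RE = re.compile(
    r"\b(alle|welke artsen|wie werkt er|list all|tous les médecins)\b",
    re.IGNORECASE,
)
LIST_INTENTS = {"doctor_lookup", "department_or_service_lookup"}


def maybe_build_roster_chunk(
    intent: str,
    query: str,
    department: Optional[str],
    roster: dict[str, list[str]],
) -> Optional[str]:
    """Return a synthetic roster chunk, or None when any gate fails."""
    if intent not in LIST_INTENTS:          # gate 1: intent
        return None
    if not LIST_SIGNAL_RE.search(query):    # gate 2: list-signal phrase
        return None
    if not department:                      # gate 3: department hint
        return None
    doctors = roster.get(department, [])
    if not doctors:
        # Assumption: with no doctors on record there is nothing to inject.
        return None
    return f"Artsen van de dienst {department}: {', '.join(doctors)}."


roster = {"Dermatologie": ["Dr. A", "Dr. B", "Dr. C"]}  # placeholder names
print(maybe_build_roster_chunk(
    "doctor_lookup", "Welke artsen werken bij dermatologie?", "Dermatologie", roster
))
```

Because the cheap checks run first, a single-doctor query such as "Wie is Dr. X?" exits at the second gate without touching the taxonomy.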

Interaction with post-answer enrichment

When Stage 5c fires, all department doctors are already visible to the LLM, so the post-answer taxonomy enrichment step (described below) finds zero "new" doctors and appends nothing. The two mechanisms are complementary: Stage 5c is the proactive path for queries the gate identifies as list questions; post-answer enrichment is the safety net for multi-part queries whose intent classification didn't trigger Stage 5c.

Implementation: rag_service.py:2134–2197 (_qs_maybe_inject_doctor_list); call site at rag_service.py:3537.

Stage Execution Order

For an examiner tracing a live query, the post-classification stages execute in this order:

Stage	Purpose	Cost when active	Cost when no-op
5a Entity extraction (via intent classification)	Extract structured entities from the query	included in intent LLM call	–
Step 2 Taxonomy resolution (12-resolver chain)	Resolve user terms to canonical entity IDs	~10 ms typical; ~30 ms with fuzzy fallback	–
Step 3 Query enrichment	Append canonical terms to search query	~1 ms	–
Step 4 Sequential retrieval (vector + BM25 + taxonomy)	Gather candidate chunks	~800 ms	–
Stage 5b Value Framework affinity rerank	Multiply scores by `intent × content_category` matrix	~2 ms	~2 ms (always-on)
Stage 5c Synthetic doctor-list injection	Append synthetic chunk with full department roster	~5 ms	0 ms
Step 5 Taxonomy injection gate	Decide whether taxonomy results enter the context	< 1 ms	–
Step 6 Routing hint injection	Prepend "condition X falls under department Y" directive	< 1 ms	–
Step 7 System prompt augmentation	Append GRAPH_CONTEXT_INSTRUCTIONS	< 1 ms	–

Step 6: Routing Hint Injection

When taxonomy resolves a condition→department mapping, a routing hint is prepended to the assembled context. This is a strong directive that ensures the LLM always mentions the correct department:

--- ORGANISATIE-INFORMATIE ---
De aandoening "Palpitaties" valt onder de dienst Cardiologie.
Je MOET deze dienst vermelden in je antwoord.

The routing hint is injected regardless of the injection gate decision — even if taxonomy results are suppressed, the organizational routing information is always present. This prevents the LLM from naming incorrect departments when the vector context alone is ambiguous.
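A minimal sketch of how such a hint could be built and prepended is shown below. The function names are hypothetical and the template is copied from the example block above; the real _qs_inject_routing_hint may differ in signature and spacing.

```python
from typing import Optional


def build_routing_hint(condition: str, department: str) -> str:
    """Format the directive block for one condition-to-department mapping."""
    return (
        "--- ORGANISATIE-INFORMATIE ---\n"
        f'De aandoening "{condition}" valt onder de dienst {department}.\n'
        "Je MOET deze dienst vermelden in je antwoord.\n"
    )


def prepend_routing_hint(
    context: str, condition: Optional[str], department: Optional[str]
) -> str:
    # No mapping resolved: leave the assembled context untouched.
    if not condition or not department:
        return context
    # Deliberately independent of the injection-gate decision.
    return build_routing_hint(condition, department) + "\n" + context


print(prepend_routing_hint("[1] Brochure hartritmestoornissen ...", "Palpitaties", "Cardiologie"))
```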

Implementation: rag_service.py:2887–2903 (_qs_inject_routing_hint)

Step 7: System Prompt Augmentation

When taxonomy data is present in the context (either via injection gate or routing hint), the system prompt receives additional GRAPH_CONTEXT_INSTRUCTIONS:

The following structured information was retrieved from the hospital knowledge graph.

ALWAYS include relevant department names and organizational information.

When a condition is discussed, you MUST mention which department(s) handle it.

For department routing (which department handles a condition), treat the graph data as AUTHORITATIVE.

Graph-derived information with a [number] marker — use it to cite.

Graph-derived information without a [number] marker is supplementary and should NOT be cited with numbers.

These instructions ensure the LLM prioritizes taxonomy-derived organizational data over potentially conflicting vector search results. The "AUTHORITATIVE" directive is critical — it means when the taxonomy says Cardiologie handles Palpitaties, the LLM will state this even if a vector chunk from an outdated brochure suggests otherwise.

Implementation: prompts.py:244–257, rag_service.py:4116

End-to-End Worked Example

User query: "Waar kan ik terecht met hartkloppingen?"

See it across five queries

This example traces one query. For the same flow walked across five contrasting queries — taxonomy-TREATS department routing, a doctor-list (Stage 5c) question, a medical-dosing safety-gate decision, a cross-language rewrite, and a SNOMED synonym expansion — all with values captured live from the pilot, see A Query, End-to-End.

Stage	Input	Output
Intent Classification + Rewriting	Raw query	One LLM call (ADR-0030) emits all three: intent=`condition_information`, entities=`{condition: "hartkloppingen"}`, and `rewritten_query`=`"Welke afdeling van ZOL behandelt hartkloppingen?"` (canonical Dutch — this becomes the `search_query` the rows below operate on). See Query Rewriting.
Taxonomy Resolution	entity "hartkloppingen"	condition="Palpitaties", department="Cardiologie" (Resolver 7→8)
Query Enrichment	"hartkloppingen"	"hartkloppingen (Palpitaties, Cardiologie)"
Vector Search	Enriched query	8 chunks about palpitations, heart rhythm, Cardiologie brochure
Taxonomy Search	condition→dept SQL	`{dept: "Cardiologie", campuses: "Sint-Jan, André Dumont", doctors: [...]}`
Ontology Lookup	entity IDs	Palpitaties TREATED_BY Hartritme-onderzoek, Cardiologie HAS_FACILITY Hartcentrum
Injection Gate	Rule 1: structural intent	INJECT (condition_information)
Routing Hint	condition→dept	"De aandoening Palpitaties valt onder de dienst Cardiologie."
System Prompt	has_graph_context=true	+ GRAPH_CONTEXT_INSTRUCTIONS appended
LLM Response	Full context	"Bij hartkloppingen kunt u terecht bij de dienst Cardiologie van ZOL..."

Configuration

Two settings in config.py control the injection gate thresholds:

Setting	Default	Description
`graph_injection_min_vector_results`	3	Minimum vector results before taxonomy is suppressed
`graph_injection_similarity_threshold`	0.35	Minimum vector similarity before taxonomy rescue

Both are manageable via the admin Settings API at runtime.

Key Implementation Files

Line Number References

Line numbers are approximate and shift as the codebase evolves. Use them as starting-point hints rather than exact locations.

Component	File	Lines (approx.)
Entity extraction (via intent)	`rag_service.py`	~399–506
Taxonomy resolution chain	`hospital_taxonomy.py`	~583–1059
Query enrichment	`rag_service.py`	~2158–2181
Taxonomy SQL queries	`taxonomy/query_service.py`	~48–435
Ontology lookup	`rag_service.py`	~2207–2265
Injection gate	`rag_service.py`	~2571–2662
Routing hint	`rag_service.py`	~2887–2903
Graph context instructions	`prompts.py`	~244–257
Prompt assembly	`rag_service.py`	~4084–4198

Post-Answer Taxonomy Enrichment

In addition to the pre-generation enrichment (Steps 1–7 above), the pipeline performs a post-answer enrichment step after the LLM generates its response. This addresses a specific gap: multi-part queries (e.g., "Wat zijn de bezoekuren? Bij wie kan ik terecht bij kinderpsychiatrie?") may be classified with a non-structural intent like NAVIGATIONAL, causing the taxonomy injection gate to suppress graph data. The LLM then answers from vector search alone, potentially missing relationship data.

How It Works

Specialty-token guard (2026-05-29)

The specialty-token guard was added after the N*-afdeling incident: a DEPARTMENT entity named for a nursing-ward code (N*-afdeling) was matched on the bare word "afdeling" and then won the display tiebreak (min(by length)) over the real Endocrinologie. A department name must now carry a specialty token (an alphabetic word ≥ 4 chars that is not a generic suffix like afdeling/dienst) to enter the WORKS_IN lookup or the display; otherwise enrichment is skipped rather than printing a ward code. See Release Notes — May 29, 2026.
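The specialty-token rule can be expressed as a short predicate. This is a sketch under stated assumptions: `has_specialty_token` is a hypothetical name, and the generic-token set contains only the two suffixes named in the text (the real list may be longer).

```python
import re

# Assumed generic suffixes; the text names only these two.
GENERIC_TOKENS = {"afdeling", "dienst"}


def has_specialty_token(department_name: str) -> bool:
    """True if the name has an alphabetic word of 4+ chars that is not generic."""
    # [^\W\d_]+ matches runs of letters only, including accented ones.
    for token in re.findall(r"[^\W\d_]+", department_name):
        if len(token) >= 4 and token.lower() not in GENERIC_TOKENS:
            return True
    return False


print(has_specialty_token("N*-afdeling"))
print(has_specialty_token("Endocrinologie"))
```

A ward code such as N*-afdeling splits into "N" (too short) and "afdeling" (generic), so it fails the check and is kept out of both the lookup and the display.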

Shared DepartmentResolver (2026-06-09)

The "Department scan" step (_dept_matches_query in taxonomy_mixin.py) now supports routing through a shared DepartmentResolver cascade (app/services/department_resolver.py) when settings.department_resolver_enabled is True (default off pending the regression eval corpus going green). The resolver applies four structural tiers: normalize/fold → exact match → token-subset match → alias match. This replaces the per-surface substring heuristics with a single deterministic source of truth shared by chat, voice, and schedule surfaces. When the flag is off, the legacy substring/word-overlap path is used unchanged.
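The four tiers can be sketched as a cascade that stops at the first hit. This is not the code in app/services/department_resolver.py: the function names, the return shape, and the direction of the token-subset test (every token of the department name must appear in the query) are assumptions made for illustration.

```python
import re
import unicodedata
from typing import Optional


def fold_tokens(text: str) -> list[str]:
    """Tier 1: lowercase, strip accents, and split into word tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.findall(r"\w+", no_accents.lower())


def resolve_department(
    query: str, departments: list[str], aliases: dict[str, str]
) -> Optional[tuple[str, str]]:
    """Return (department, tier) from the first tier that matches, else None."""
    query_tokens = fold_tokens(query)
    folded = {" ".join(fold_tokens(name)): name for name in departments}
    # Tier 2: the whole folded query equals a folded department name.
    key = " ".join(query_tokens)
    if key in folded:
        return folded[key], "exact"
    # Tier 3: all tokens of a department name occur in the query.
    token_set = set(query_tokens)
    for folded_name, name in folded.items():
        if set(folded_name.split()) <= token_set:
            return name, "token_subset"
    # Tier 4: a known single-word alias occurs in the query.
    for alias, name in aliases.items():
        if " ".join(fold_tokens(alias)) in token_set:
            return name, "alias"
    return None


departments = ["Intensieve Zorgen", "Cardiologie"]
aliases = {"cardio": "Cardiologie"}  # hypothetical alias data
print(resolve_department("Wie werkt er op intensieve zorgen?", departments, aliases))
```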

After _qs_finalize generates the response:

Department scan: All published department names are checked against the response text (case-insensitive).
Relationship lookup: For each matched department, WORKS_IN relationships are queried from published_relationships.
Deduplication: Doctor names already mentioned in the response are filtered out.
Append: Remaining doctors are appended as a clearly marked supplement:

---
*Aanvullende informatie uit de ziekenhuistaxonomie:*
Artsen verbonden aan Kinderpsychiatrie: Dr. Frauke Martens, Dr. Karen Gillaerts.
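The four steps above can be sketched as one additive function. The name `append_taxonomy_supplement` is hypothetical, and the `works_in` dictionary stands in for the WORKS_IN query against published_relationships; matching here is a plain case-insensitive substring test, which is a simplification.

```python
def append_taxonomy_supplement(answer: str, works_in: dict[str, list[str]]) -> str:
    """Append doctors for departments named in the answer; never edit the answer."""
    answer_lower = answer.lower()
    lines: list[str] = []
    for department, doctors in works_in.items():
        # Department scan: only departments the answer actually mentions.
        if department.lower() not in answer_lower:
            continue
        # Deduplication: drop doctors the answer already names.
        new_doctors = [d for d in doctors if d.lower() not in answer_lower]
        if new_doctors:
            lines.append(
                f"Artsen verbonden aan {department}: {', '.join(new_doctors)}."
            )
    if not lines:
        return answer
    header = "---\n*Aanvullende informatie uit de ziekenhuistaxonomie:*"
    return answer + "\n\n" + header + "\n" + "\n".join(lines)


works_in = {
    "Kinderpsychiatrie": ["Dr. Frauke Martens", "Dr. Karen Gillaerts"],
    "Cardiologie": ["Dr. Peeters"],
}
answer = "U kunt terecht bij Kinderpsychiatrie. Dr. Frauke Martens werkt daar."
print(append_taxonomy_supplement(answer, works_in))
```

When the answer already lists every doctor of a mentioned department, the function returns the answer unchanged.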

Design Principles

Principle	Implementation
Zero regression	Purely additive — never modifies existing answer text
Verified data only	Uses published taxonomy (operator-approved, version-controlled)
Non-blocking	Wrapped in try/except; failures are logged and never block or alter the answer
Badge activation	Sets `has_graph_context = True` when enrichment fires, triggering the "Verified with hospital data" badge
Multi-part safe	Works regardless of intent classification, since it runs post-generation

This pattern is particularly valuable for the ZOL use case where patients ask complex, multi-faceted questions that span navigational and structural concerns in a single query.

References

Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
Karpukhin, V., et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. EMNLP 2020. — Foundation for the dense bi-encoder retrieval lane that runs alongside the taxonomy resolution chain.
Sarmah, B., et al. (2024). HybridRAG: Integrating Knowledge Graphs and Vector Retrieval. — Formalises the hybrid KG + vector pattern this page documents.
Soman, K., et al. (2024). OntologyRAG: Ontology-enhanced retrieval-augmented generation.
SNOMED CT Terminology — terminology source for resolvers R7/R8/R9/R10.
