This ADR was written when the system used Neo4j for entity storage. As of March 2026, Neo4j has been fully removed and replaced by PostgreSQL taxonomy tables (taxonomy_entities, taxonomy_relationships). The decision rationale documented here remains valid; the storage layer has changed.
ADR-0016: SNOMED CT Terminology Integration
Date: 2026-02-09 (proposed), 2026-02-21 (Phase 1 implemented), 2026-02-22 (Phase 2 designed) | Status: Accepted (Phase 1 implemented, Phase 2 approved design)
Context
The hand-maintained taxonomy (zol_taxonomy.py, see ADR-0015) contains ~50 medical terminology entries alongside 500+ entries of ZOL-specific institutional knowledge. Every newly ingested page surfaces terms that are not yet mapped, a brittleness that will not scale to the full corpus (1,443 URLs) or to additional hospital tenants.
The Scalability Gap
| What We Have | What SNOMED CT Offers |
|---|---|
| ~13 condition aliases | ~280,000 Dutch-validated concepts |
| ~4 treatment aliases | ~580,000 total term descriptions |
| ~12 examination normalizations | Built-in synonym system per concept |
| Manual iterative patching | Bi-annual updates from Belgian Terminology Centre |
Two Types of Taxonomy Data
Type A (medical terminology) is replaceable by SNOMED CT. Type B (institutional knowledge) is ZOL-specific and stays hand-maintained.
Decision
Adopt SNOMED CT Belgian Edition as the medical terminology backbone in three phases.
Why SNOMED CT?
- Free in Belgium: Belgium has been a SNOMED International member since 2013
- 280,000 concepts with validated Dutch descriptions (580K total terms)
- Belgian mandate: SNOMED CT required for primary diagnoses by 2027, full EHDS compliance by 2029
- Synonyms built-in: each concept has Preferred Term + Acceptable Synonyms per language
- Hierarchy: `Borstkanker IS_A Kanker IS_A Clinical Finding` enables "find all subtypes" queries
- Cross-references: official maps to ICD-10, LOINC, ATC
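The "find all subtypes" benefit of the hierarchy can be illustrated with a minimal sketch. It assumes a handful of made-up IS_A edges keyed by Dutch labels (real SNOMED CT uses numeric concept IDs), and the helper names `build_closure` and `subtypes_of` are hypothetical, not taken from the codebase.

```python
from collections import defaultdict

# Illustrative IS_A edges as (child, parent); labels stand in for concept IDs.
IS_A_EDGES = [
    ("Borstkanker", "Kanker"),
    ("Longkanker", "Kanker"),
    ("Kanker", "Clinical Finding"),
    ("Diabetes", "Clinical Finding"),
]


def build_closure(edges):
    """Return {concept: set of all its ancestors}. Assumes the edges form a DAG."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure = {}

    def ancestors(node):
        if node in closure:
            return closure[node]
        result = set()
        for parent in parents.get(node, ()):
            result.add(parent)
            result |= ancestors(parent)
        closure[node] = result
        return result

    for child, _ in edges:
        ancestors(child)
    return closure


def subtypes_of(concept, closure):
    """All concepts that have `concept` somewhere among their ancestors."""
    return sorted(c for c, ancs in closure.items() if concept in ancs)


closure = build_closure(IS_A_EDGES)
print(subtypes_of("Kanker", closure))  # ['Borstkanker', 'Longkanker']
```

Precomputing the closure trades storage for lookup speed: a subtype query becomes a set-membership check instead of a recursive walk.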
Phase 1: SNOMED CT Reference Tables + Query-Time Synonym Expansion (IMPLEMENTED)
Status: Complete (2026-02-21) | Infrastructure: PostgreSQL tables (7M rows)
Instead of enriching static taxonomy files, we imported the full SNOMED CT Belgian Edition into PostgreSQL:
- RF2 parser (`snomed_loader.py`): Parses SNOMED CT RF2 concept, description, and relationship files
- PostgreSQL tables: 356K concepts, 656K descriptions, 1.2M relationships, 4.7M transitive closure
- SnomedTerminologyService: BMQExpander pattern — synonym expansion via IS-A hierarchy
- Query pipeline: `resolve_search_query_with_snomed()` falls through to SNOMED when static taxonomy fails
- FINDING_SITE routing: Condition → body structure → department mapping (active in query pipeline via `SnomedGraphEnricher.resolve_department_from_term()`)
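The fall-through behaviour listed above might look roughly like the following sketch. This ADR does not show the real signature of `resolve_search_query_with_snomed()`, so `resolve_with_fallback` and `expand_synonyms` are hypothetical stand-ins. An in-memory SQLite database replaces PostgreSQL so the example runs on its own, and the table columns, concept ID, and terms are assumptions made for illustration.

```python
import sqlite3

# Hypothetical hand-maintained entry standing in for the static taxonomy.
STATIC_TAXONOMY = {"hartfalen": "Cardiologie"}


def make_demo_db():
    """Build a tiny stand-in for the descriptions table (column names assumed)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE snomed_descriptions (concept_id INTEGER, language TEXT, term TEXT);
        INSERT INTO snomed_descriptions VALUES
            (1001, 'nl', 'cataract'),
            (1001, 'nl', 'staar'),
            (1001, 'nl', 'grijze staar'),
            (1001, 'en', 'cataract');
        """
    )
    return db


def expand_synonyms(db, term):
    """Return every Dutch term that shares a concept with `term`."""
    rows = db.execute(
        """
        SELECT DISTINCT d2.term
        FROM snomed_descriptions d1
        JOIN snomed_descriptions d2 ON d2.concept_id = d1.concept_id
        WHERE lower(d1.term) = lower(?) AND d2.language = 'nl'
        ORDER BY d2.term
        """,
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


def resolve_with_fallback(db, query):
    """Try the static taxonomy first; fall through to SNOMED synonyms on a miss."""
    key = query.strip().lower()
    if key in STATIC_TAXONOMY:
        return {"source": "static", "terms": [key]}
    synonyms = expand_synonyms(db, key)
    if synonyms:
        return {"source": "snomed", "terms": synonyms}
    return {"source": "none", "terms": [key]}


db = make_demo_db()
print(resolve_with_fallback(db, "Staar"))
```

Keeping the static lookup first means curated entries always win, and SNOMED only contributes when the hand-maintained taxonomy has no answer.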
Evaluation (15 SNOMED-gap golden questions):
- Baseline: 6/15 (40.0%) → SNOMED ON: 7/15 (46.7%) stable, 9/15 (60%) best case
- Stable gain: cataract → Oftalmologie via synonym expansion
Phase 2: SNOMED Graph Enrichment at Seeding Time (APPROVED DESIGN)
Complexity: M (3-4 hours) | Infrastructure: None (uses existing PostgreSQL SNOMED tables)
SNOMED CT becomes Source 2 in the Three-Source Knowledge Architecture — a first-class knowledge source at graph seeding time, not just a query-time fallback. This phase addresses the root cause of poor SNOMED golden question performance (4/15 pass rate, 26.7%): conditions exist as graph nodes but lack HANDLES relationships to the correct departments.
What it adds to the seeding pipeline:
| Enrichment | Mechanism | Impact |
|---|---|---|
| Concept IDs on nodes | Match entity names against snomed_descriptions | Language-independent identity |
| Dutch synonyms as properties | Fetch all Dutch descriptions for matched concept | Replaces ~250 hand-maintained aliases |
| IS_A hierarchy | snomed_transitive_closure between existing nodes | Hierarchical queries (diabetes → subtypes) |
| FINDING_SITE → HANDLES | Condition → body structure → department | Auto-creates missing condition→department links |
| PROCEDURE_SITE → OFFERS | Treatment → body structure → department | Auto-creates missing treatment→department links |
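The FINDING_SITE → HANDLES row in the table above amounts to a two-hop lookup, sketched below under stated assumptions. Every ID, term, and mapping is invented for illustration, the body-structure-to-department map is assumed to be a small curated table, and `derive_handles` is a hypothetical name rather than a function from the seeding pipeline.

```python
# Every ID, term, and mapping below is invented for illustration.
TERM_TO_CONCEPT = {"cataract": 1001, "staar": 1001, "hartfalen": 2001}

# Condition concept ID -> body structure concept ID (FINDING_SITE attribute).
FINDING_SITE = {1001: 5001, 2001: 5002}

# Body structure concept ID -> department (assumed to be curated per tenant).
SITE_TO_DEPARTMENT = {5001: "Oftalmologie", 5002: "Cardiologie"}


def derive_handles(entity_name):
    """Return (department, "HANDLES", entity_name), or None if any hop is missing."""
    concept_id = TERM_TO_CONCEPT.get(entity_name.strip().lower())
    site_id = FINDING_SITE.get(concept_id)
    department = SITE_TO_DEPARTMENT.get(site_id)
    if department is None:
        return None
    return (department, "HANDLES", entity_name)


print(derive_handles("Staar"))                 # ('Oftalmologie', 'HANDLES', 'Staar')
print(derive_handles("Onbekende aandoening"))  # None
```

The PROCEDURE_SITE → OFFERS row would follow the same shape, with a treatment-to-site map in place of `FINDING_SITE`.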
Key design decisions:
- Closed entity set: Only the ~260 entities already in the graph are enriched. The 356K SNOMED concepts are NOT imported into Neo4j.
- Confidence scoring: SNOMED-derived relationships get confidence 0.7 (vs 1.0 for scraped/curated).
- Plausibility guards: Existing `_is_plausible_handles()` and negative maps filter SNOMED relationships equally.
- Graceful degradation: If `snomed_concepts` table is empty, enrichment is skipped.
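A minimal sketch of how the four design decisions above could combine in one filtering step. The `Edge` type, the `filter_candidates` function, and the negative-map entry are hypothetical. `is_plausible` is only a stand-in for `_is_plausible_handles()`, whose real logic is not shown in this ADR.

```python
from dataclasses import dataclass

SNOMED_CONFIDENCE = 0.7  # SNOMED-derived relationships, per this ADR

# Hypothetical negative map: (department, entity) pairs that must never be linked.
NEGATIVE_MAP = {("Oftalmologie", "Hartfalen")}


@dataclass(frozen=True)
class Edge:
    department: str
    relation: str
    entity: str
    confidence: float


def is_plausible(department, entity):
    """Stand-in for _is_plausible_handles(); here it only checks the negative map."""
    return (department, entity) not in NEGATIVE_MAP


def filter_candidates(graph_entities, candidates, concept_count):
    """Turn (department, entity) candidates into HANDLES edges."""
    if concept_count == 0:
        return []  # graceful degradation: SNOMED tables are empty, skip enrichment
    edges = []
    for department, entity in candidates:
        if entity not in graph_entities:
            continue  # closed entity set: never add nodes the graph lacks
        if not is_plausible(department, entity):
            continue  # same guard that applies to scraped and curated edges
        edges.append(Edge(department, "HANDLES", entity, SNOMED_CONFIDENCE))
    return edges


graph_entities = {"Staar", "Hartfalen"}
candidates = [
    ("Oftalmologie", "Staar"),
    ("Oftalmologie", "Hartfalen"),  # blocked by the negative map
    ("Neurologie", "Migraine"),     # not in the graph, so skipped
]
edges = filter_candidates(graph_entities, candidates, concept_count=356_000)
print(edges)
```

Because the confidence is stored on each edge, downstream ranking can prefer scraped or curated links (1.0) whenever they conflict with a SNOMED-derived one.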
Target: SNOMED golden question pass rate from 4/15 (26.7%) to 11–13/15 (73–87%).
Phase 3: MedCAT for NER (Future)
Only if Phases 1-2 leave coverage gaps:
- MedCAT + dutch-medical-concepts (UMC Utrecht): 750K+ Dutch terms
- Context-aware NER with SNOMED entity linking
- Replaces regex extraction entirely
Multi-Tenant Benefit
| Data | Source | Per-Tenant? |
|---|---|---|
| Medical terminology (conditions, treatments, synonyms) | SNOMED CT | Universal |
| Institutional knowledge (campuses, dept-condition maps) | {tenant}_taxonomy.py | Per-tenant |
New hospital = new taxonomy file for institutional knowledge + same SNOMED CT for medical terminology.
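The split between shared and per-tenant data can be sketched as follows. The `TenantTaxonomy` class, the `normalize` function, the alias mapping for "Hartcentrum", and the second tenant are all hypothetical, and the small dictionary stands in for the shared SNOMED CT lookup.

```python
from dataclasses import dataclass, field

# Stand-in for the shared SNOMED CT lookup; the terms are illustrative.
SHARED_MEDICAL_SYNONYMS = {"staar": "cataract", "grijze staar": "cataract"}


@dataclass
class TenantTaxonomy:
    """Per-tenant institutional knowledge (what a tenant taxonomy file would hold)."""
    tenant: str
    center_aliases: dict = field(default_factory=dict)


def normalize(term, taxonomy):
    """Check tenant-specific aliases first, then the shared medical terminology."""
    key = term.strip().lower()
    if key in taxonomy.center_aliases:
        return ("institutional", taxonomy.center_aliases[key])
    if key in SHARED_MEDICAL_SYNONYMS:
        return ("medical", SHARED_MEDICAL_SYNONYMS[key])
    return ("unmapped", key)


# The "Hartcentrum" mapping is an assumed example of institutional knowledge.
zol = TenantTaxonomy("zol", {"hartcentrum": "Cardiologie"})
other = TenantTaxonomy("other_hospital")

print(normalize("Hartcentrum", zol))    # ('institutional', 'Cardiologie')
print(normalize("Hartcentrum", other))  # ('unmapped', 'hartcentrum')
print(normalize("Staar", other))        # ('medical', 'cataract')
```

A new tenant starts with an empty institutional layer yet resolves medical terms immediately, which is the benefit the table describes.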
Consequences
Positive
- 580K Dutch terms replace ~50 manual aliases — eliminates the iterative patching problem
- Aligns with Belgian 2027 SNOMED CT mandate
- Multi-tenant ready: medical terminology shared across hospitals
- SNOMED concept IDs enable interoperability with other Belgian healthcare systems
- Hierarchical search expansion: "find all subtypes of kanker" in one ECL query
Negative
- MLDS registration required (free but 1-2 weeks for approval)
- SNOMED CT learning curve (concept model, RF2 format)
- ZOL-specific center names ("Hartcentrum", "Borstcentrum") not in SNOMED CT
- SNOMED-derived relationships may introduce false positives (mitigated by confidence scoring + plausibility guards)
Key Resources
| Resource | URL |
|---|---|
| MLDS Registration | https://mlds.ihtsdotools.org/ |
| SNOMED CT Browser | https://browser.ihtsdotools.org/ |
| Belgian Terminology Portal | https://apps.health.belgium.be/terminology-portal/ |
| Snowstorm Lite | https://github.com/IHTSDO/snowstorm-lite |
| dutch-medical-concepts | https://github.com/umcu/dutch-medical-concepts |
| MedCAT | https://github.com/CogStack/MedCAT |
References
- Hartendorp, R., et al. (2024). Biomedical entity linking for Dutch. In Proceedings of CL4Health Workshop, LREC-COLING 2024.
- Jimeno-Yepes, A., Berlanga, R., & Rebholz-Schuhmann, D. (2012). Ontology-based query expansion for biomedical information retrieval. BMC Bioinformatics, 13(S14). https://doi.org/10.1186/1471-2105-13-S14-S1
- Ruch, P., et al. (2006). Using SNOMED CT body structure hierarchy for concept-based information retrieval. Proceedings of AMIA Annual Symposium, 674–678.
- Soman, K., et al. (2024). OntologyRAG: Ontology-enhanced retrieval-augmented generation. arXiv preprint, arXiv:2412.09050.
- Tiro.Health. (2025). SNOMED CT: Complete guide for medical documentation in Belgium. https://tiro.health/
Related ADRs
- ADR-0015: Taxonomy-Driven Entity Normalization
- ADR-0014: LLM Entity Validation
- Medical Knowledge Architecture: Three-Source Knowledge Architecture design
- Seeding Pipeline: Phase 3 SNOMED enrichment details