Architectural Update (March 2026)

This ADR was written when the system used Neo4j for entity storage. As of March 2026, Neo4j has been fully removed and replaced by PostgreSQL taxonomy tables (taxonomy_entities, taxonomy_relationships). The decision rationale documented here remains valid; the storage layer has changed.

ADR-0016: SNOMED CT Terminology Integration

Date: 2026-02-09 (proposed), 2026-02-21 (Phase 1 implemented), 2026-02-22 (Phase 2 designed) | Status: Accepted (Phase 1 implemented, Phase 2 approved design)

Context

The hand-maintained taxonomy (zol_taxonomy.py, see ADR-0015) contains ~50 medical terminology entries alongside ~500+ entries of ZOL-specific institutional knowledge. Every new page ingested surfaces terms not yet mapped — a brittleness pattern that will not scale to the full corpus (1,443 URLs) or to additional hospital tenants.

The Scalability Gap

What We Have | What SNOMED CT Offers
~13 condition aliases | ~280,000 Dutch-validated concepts
~4 treatment aliases | ~580,000 total term descriptions
~12 examination normalizations | Built-in synonym system per concept
Manual iterative patching | Bi-annual updates from Belgian Terminology Centre

Two Types of Taxonomy Data

Type A (medical terminology) is replaceable by SNOMED CT. Type B (institutional knowledge) is ZOL-specific and stays hand-maintained.

Decision

Adopt SNOMED CT Belgian Edition as the medical terminology backbone in three phases.

Why SNOMED CT?

  • Free in Belgium: Belgium has been a SNOMED International member since 2013
  • 280,000 concepts with validated Dutch descriptions (580K total terms)
  • Belgian mandate: SNOMED CT required for primary diagnoses by 2027, full EHDS compliance by 2029
  • Synonyms built-in: each concept has Preferred Term + Acceptable Synonyms per language
  • Hierarchy: Borstkanker IS_A Kanker IS_A Clinical Finding enables "find all subtypes" queries
  • Cross-references: official maps to ICD-10, LOINC, ATC

Phase 1: SNOMED CT Reference Tables + Query-Time Synonym Expansion (IMPLEMENTED)

Status: Complete (2026-02-21) | Infrastructure: PostgreSQL tables (7M rows)

Instead of enriching static taxonomy files, we imported the full SNOMED CT Belgian Edition into PostgreSQL:

  • RF2 parser (snomed_loader.py): Parses SNOMED CT RF2 concept, description, and relationship files
  • PostgreSQL tables: 356K concepts, 656K descriptions, 1.2M relationships, 4.7M transitive closure
  • SnomedTerminologyService: BMQExpander pattern — synonym expansion via IS-A hierarchy
  • Query pipeline: resolve_search_query_with_snomed() falls through to SNOMED when static taxonomy fails
  • FINDING_SITE routing: Condition → body structure → department mapping (active in query pipeline via SnomedGraphEnricher.resolve_department_from_term())
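As a rough illustration of how relationship rows turn into a transitive closure table, the sketch below reads IS-A rows from an RF2-style relationship snapshot and expands them into (descendant, ancestor) pairs. It is not the code in snomed_loader.py. The function names, the sample rows and the placeholder concept IDs are assumptions made for this example; the column layout and the |Is a| type ID 116680003 follow the RF2 release format.

```python
import csv
import io
from collections import defaultdict

IS_A = "116680003"          # SNOMED CT type ID for |Is a|
FINDING_SITE = "363698007"  # SNOMED CT type ID for |Finding site|

HEADER = ["id", "effectiveTime", "active", "moduleId", "sourceId",
          "destinationId", "relationshipGroup", "typeId",
          "characteristicTypeId", "modifierId"]

# Placeholder rows: the concept IDs are readable stand-ins, not real SCTIDs.
ROWS = [
    ["1", "20250315", "1", "m", "borstkanker", "kanker", "0", IS_A, "c", "x"],
    ["2", "20250315", "1", "m", "kanker", "clinical_finding", "0", IS_A, "c", "x"],
    ["3", "20250315", "0", "m", "borstkanker", "clinical_finding", "0", IS_A, "c", "x"],
    ["4", "20250315", "1", "m", "borstkanker", "borst", "0", FINDING_SITE, "c", "x"],
]
RF2_SAMPLE = "\n".join("\t".join(row) for row in [HEADER, *ROWS]) + "\n"


def read_is_a_edges(rf2_text):
    """Return (child, parent) pairs for active IS-A rows only."""
    reader = csv.DictReader(io.StringIO(rf2_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [(r["sourceId"], r["destinationId"])
            for r in reader
            if r["active"] == "1" and r["typeId"] == IS_A]


def transitive_closure(edges):
    """Expand direct IS-A edges into every (descendant, ancestor) pair."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    closure = set()
    for start in list(parents):
        stack, seen = list(parents[start]), set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            closure.add((start, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return closure


def subtypes_of(closure, concept):
    """All descendants of a concept: the 'find all subtypes' lookup."""
    return {desc for desc, anc in closure if anc == concept}


edges = read_is_a_edges(RF2_SAMPLE)
closure = transitive_closure(edges)
```

Precomputing the closure trades storage for query speed: a subtype lookup becomes a single filtered scan (or an indexed SELECT in PostgreSQL) rather than a recursive walk, which is consistent with the closure table being several times larger than the relationship table.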

Evaluation (15 SNOMED-gap golden questions):

  • Baseline: 6/15 (40.0%) → with SNOMED enabled: 7/15 (46.7%) stable, 9/15 (60.0%) best case
  • Stable gain: cataract → Oftalmologie via synonym expansion
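One plausible reading of the fall-through behaviour, sketched under assumptions: the static taxonomy is tried first, and only on a miss are SNOMED synonyms of the query term tried against it. This is not the body of resolve_search_query_with_snomed(); the function name resolve_department, the alias data and the placeholder concept ID are all hypothetical and chosen only to mirror the cataract → Oftalmologie example.

```python
# Hand-maintained alias -> department map (illustrative, not the ZOL taxonomy).
STATIC_TAXONOMY = {"staar": "Oftalmologie"}

# Concept ID -> terms, standing in for a SNOMED description lookup.
# "C-CATARACT" is a placeholder, not a real SNOMED CT identifier.
SNOMED_TERMS = {"C-CATARACT": ["cataract", "staar"]}


def resolve_department(term):
    """Return (department, source), trying the static taxonomy first."""
    key = term.lower()
    if key in STATIC_TAXONOMY:
        return STATIC_TAXONOMY[key], "static"

    # Fall through: find concepts that carry this term, then retry
    # each of their synonyms against the static taxonomy.
    for terms in SNOMED_TERMS.values():
        lowered = [t.lower() for t in terms]
        if key not in lowered:
            continue
        for synonym in lowered:
            if synonym in STATIC_TAXONOMY:
                return STATIC_TAXONOMY[synonym], "snomed"

    return None, "unresolved"
```

Returning the source alongside the department makes it possible to see, per golden question, whether a pass came from the static taxonomy or from the SNOMED fallback.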

Phase 2: SNOMED Graph Enrichment at Seeding Time (APPROVED DESIGN)

Complexity: M (3-4 hours) | Infrastructure: None (uses existing PostgreSQL SNOMED tables)

SNOMED CT becomes Source 2 in the Three-Source Knowledge Architecture — a first-class knowledge source at graph seeding time, not just a query-time fallback. This phase addresses the root cause of poor SNOMED golden question performance (4/15 pass rate, 26.7%): conditions exist as graph nodes but lack HANDLES relationships to the correct departments.

What it adds to the seeding pipeline:

Enrichment | Mechanism | Impact
Concept IDs on nodes | Match entity names against snomed_descriptions | Language-independent identity
Dutch synonyms as properties | Fetch all Dutch descriptions for matched concept | Replaces ~250 hand-maintained aliases
IS_A hierarchy | snomed_transitive_closure between existing nodes | Hierarchical queries (diabetes → subtypes)
FINDING_SITE → HANDLES | Condition → body structure → department | Auto-creates missing condition→department links
PROCEDURE_SITE → OFFERS | Treatment → body structure → department | Auto-creates missing treatment→department links
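The first two enrichments can be sketched as a name lookup followed by a synonym fetch. The example below uses an in-memory SQLite database as a stand-in for the PostgreSQL tables. The table name snomed_descriptions comes from this ADR, but the column names (concept_id, term, language_code, active), the function enrich_entity, the sample rows and the placeholder concept ID are assumptions. Skipping ambiguous matches is a design choice of this sketch, not a documented rule.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the snomed_descriptions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snomed_descriptions "
        "(concept_id TEXT, term TEXT, language_code TEXT, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO snomed_descriptions VALUES (?, ?, ?, ?)",
        [
            # "C-DIABETES" is a placeholder, not a real SNOMED CT identifier.
            ("C-DIABETES", "Diabetes mellitus", "nl", 1),
            ("C-DIABETES", "suikerziekte", "nl", 1),
            ("C-DIABETES", "diabetes mellitus", "en", 1),
            ("C-DIABETES", "oude term", "nl", 0),
        ],
    )
    return conn


def enrich_entity(conn, name):
    """Attach a concept ID and Dutch synonyms to an entity name, if matched."""
    matches = conn.execute(
        "SELECT DISTINCT concept_id FROM snomed_descriptions "
        "WHERE active = 1 AND language_code = 'nl' AND lower(term) = lower(?)",
        (name,),
    ).fetchall()
    if len(matches) != 1:
        return None  # no match, or ambiguous: leave the node untouched

    concept_id = matches[0][0]
    synonyms = [
        row[0]
        for row in conn.execute(
            "SELECT term FROM snomed_descriptions "
            "WHERE concept_id = ? AND active = 1 AND language_code = 'nl' "
            "ORDER BY term",
            (concept_id,),
        )
    ]
    return {"name": name, "snomed_concept_id": concept_id, "synonyms_nl": synonyms}


conn = build_demo_db()
node = enrich_entity(conn, "Suikerziekte")
```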

Key design decisions:

  • Closed entity set: Only the ~260 entities already in the graph are enriched. The 356K SNOMED concepts are NOT imported into Neo4j.
  • Confidence scoring: SNOMED-derived relationships get confidence 0.7 (vs 1.0 for scraped/curated).
  • Plausibility guards: Existing _is_plausible_handles() and negative maps filter SNOMED relationships equally.
  • Graceful degradation: If snomed_concepts table is empty, enrichment is skipped.
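The design decisions above can be combined in a small sketch of the FINDING_SITE → HANDLES derivation. It assumes that HANDLES points from department to condition and that the plausibility guard can be treated as a callable taking (department, condition); the real signature of _is_plausible_handles() is not shown in this ADR. The Relationship class, derive_handles and all sample mappings are hypothetical, and one mapping is deliberately wrong to show the guard at work.

```python
from dataclasses import dataclass

SNOMED_CONFIDENCE = 0.7  # lower than the 1.0 used for scraped/curated links


@dataclass(frozen=True)
class Relationship:
    source: str
    rel_type: str
    target: str
    confidence: float
    origin: str


def derive_handles(conditions, finding_sites, site_departments,
                   is_plausible, snomed_concept_count):
    """Derive department HANDLES condition links via SNOMED finding sites."""
    if snomed_concept_count == 0:
        return []  # graceful degradation: SNOMED tables not loaded

    relationships = []
    for condition in conditions:  # closed set: only entities already in the graph
        for site in finding_sites.get(condition, []):
            department = site_departments.get(site)
            if department is None:
                continue
            if not is_plausible(department, condition):
                continue  # same guard as for non-SNOMED relationships
            relationships.append(
                Relationship(department, "HANDLES", condition,
                             SNOMED_CONFIDENCE, "snomed")
            )
    return relationships


# Illustrative data only; the second site mapping is intentionally implausible.
FINDING_SITES = {"cataract": ["lens"], "borstkanker": ["borst"]}
SITE_DEPARTMENTS = {"lens": "Oftalmologie", "borst": "Dermatologie"}
NEGATIVE_MAP = {("Dermatologie", "borstkanker")}


def is_plausible(department, condition):
    return (department, condition) not in NEGATIVE_MAP


derived = derive_handles(["cataract", "borstkanker"], FINDING_SITES,
                         SITE_DEPARTMENTS, is_plausible,
                         snomed_concept_count=356_000)
```

Carrying the origin and confidence on each relationship lets downstream ranking prefer curated links over SNOMED-derived ones when both exist.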

Target: SNOMED golden question pass rate from 4/15 (26.7%) to 11–13/15 (73–87%).

Phase 3: MedCAT for NER (Future)

Only if Phases 1-2 leave coverage gaps:

  • MedCAT + dutch-medical-concepts (UMC Utrecht): 750K+ Dutch terms
  • Context-aware NER with SNOMED entity linking
  • Replaces regex extraction entirely

Multi-Tenant Benefit

Data | Source | Per-Tenant?
Medical terminology (conditions, treatments, synonyms) | SNOMED CT | Universal
Institutional knowledge (campuses, dept-condition maps) | {tenant}_taxonomy.py | Per-tenant

New hospital = new taxonomy file for institutional knowledge + same SNOMED CT for medical terminology.
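A minimal sketch of that split, assuming the shared terminology can be reduced to a synonym-to-canonical-term lookup and each tenant taxonomy to a department-to-conditions map. The class names, the second tenant and all data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TenantTaxonomy:
    """Institutional knowledge: one instance per hospital."""
    tenant: str
    department_conditions: dict  # department name -> set of canonical conditions


class SharedTerminology:
    """Medical terminology: one instance shared by every tenant."""

    def __init__(self, synonyms):
        self._synonyms = {k.lower(): v for k, v in synonyms.items()}

    def canonical(self, term):
        key = term.lower()
        return self._synonyms.get(key, key)


def departments_for(term, terminology, taxonomy):
    """Normalize the term once, then apply the tenant-specific mapping."""
    canonical = terminology.canonical(term)
    return sorted(
        department
        for department, conditions in taxonomy.department_conditions.items()
        if canonical in conditions
    )


terminology = SharedTerminology({"staar": "cataract"})
zol = TenantTaxonomy("zol", {"Oftalmologie": {"cataract"}})
other = TenantTaxonomy("demo_hospital", {"Oogheelkunde": {"cataract"}})
```

The same query term resolves through the same terminology object for both tenants; only the department names differ.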

Consequences

Positive

  • 580K Dutch terms replace ~50 manual aliases — eliminates the iterative patching problem
  • Aligns with Belgian 2027 SNOMED CT mandate
  • Multi-tenant ready: medical terminology shared across hospitals
  • SNOMED concept IDs enable interoperability with other Belgian healthcare systems
  • Hierarchical search expansion: "find all subtypes of kanker" in one ECL query

Negative

  • MLDS registration required (free, but approval takes 1-2 weeks)
  • SNOMED CT learning curve (concept model, RF2 format)
  • ZOL-specific center names ("Hartcentrum", "Borstcentrum") not in SNOMED CT
  • SNOMED-derived relationships may introduce false positives (mitigated by confidence scoring + plausibility guards)

Key Resources

Resource | URL
MLDS Registration | https://mlds.ihtsdotools.org/
SNOMED CT Browser | https://browser.ihtsdotools.org/
Belgian Terminology Portal | https://apps.health.belgium.be/terminology-portal/
Snowstorm Lite | https://github.com/IHTSDO/snowstorm-lite
dutch-medical-concepts | https://github.com/umcu/dutch-medical-concepts
MedCAT | https://github.com/CogStack/MedCAT

References

  • Hartendorp, R., et al. (2024). Biomedical entity linking for Dutch. In Proceedings of CL4Health Workshop, LREC-COLING 2024.
  • Jimeno-Yepes, A., Berlanga, R., & Rebholz-Schuhmann, D. (2012). Ontology-based query expansion for biomedical information retrieval. BMC Bioinformatics, 13(S14). https://doi.org/10.1186/1471-2105-13-S14-S1
  • Ruch, P., et al. (2006). Using SNOMED CT body structure hierarchy for concept-based information retrieval. Proceedings of AMIA Annual Symposium, 674-678.
  • Soman, K., et al. (2024). OntologyRAG: Ontology-enhanced retrieval-augmented generation. arXiv preprint, arXiv:2412.09050.
  • Tiro.Health. (2025). SNOMED CT: Complete guide for medical documentation in Belgium. https://tiro.health/