Multi-Tenancy Roadmap

Current Status

The system has completed Phase 0: Platform Decoupling and Phases 1-4: Hospital-Agnostic Refactoring (config extraction, prompt parameterization, generic naming, DB-driven config cache). It runs in single-tenant pilot mode for ZOL. Full multi-tenant routing (subdomain resolution, per-tenant auth) is planned. See also the Multi-Tenancy Architecture page for the current implementation details.

1. Vision

The ZOL Intelligent Search system was initially built as a single-hospital solution. To enable deployment across multiple hospitals (SaaS model), the architecture must support tenant isolation — ensuring that each hospital's data, configuration, and user experience are completely separate.

The decoupling follows a phased approach: first parameterize all hospital-specific references (Phase 0, completed), then make the codebase fully hospital-agnostic (Phases 1-4, completed March 31), then add tenant routing and management (next phase, planned), and finally implement full multi-tenant operations (future).

2. Phase 0: Platform Decoupling (Completed)

Phase 0 converted all hardcoded ZOL-specific references into parameterized, configuration-driven code. This ensures the codebase can serve any hospital by changing configuration rather than code.

2.1 What Was Done

Component | Before | After
Taxonomy (zol_taxonomy.py) | 37 module-level constants loaded at import | HospitalTaxonomy class with get_taxonomy(hospital_id) factory
Prompt templates (prompts.py) | 70+ hardcoded "ZOL" references | PromptContext dataclass with hospital identity placeholders
Taxonomy tables | No tenant scoping | tenant_id on all entities and relationships; composite unique constraints
Redis keys | Flat key prefixes | {tenant_id}: prefix on all cache, rate-limit, and session keys
Site configuration | ZOL_CONFIG singleton | get_site_config(hospital_id) with per-hospital profiles
Hub page detection | Hardcoded page type patterns | Automatic hub/detail classification via LLM binary classifier
Document service | ZOL-specific title patterns | Patterns loaded from HospitalConfig
Query service | Hardcoded base URLs | URLs from hospital configuration
RAG service | ZOL identity in responses | Uses PromptContext.from_hospital_config()
Taxonomy registry | Global singleton | Per-hospital registry cache
Frozen taxonomy | Single global registry | get_frozen_taxonomy_registry(hospital_id)
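
To illustrate the prompt parameterization row, the following is a minimal sketch of how a PromptContext dataclass with identity placeholders might look. Only the names PromptContext and from_hospital_config() come from this page; the field names, the render() helper, and the template text are assumptions made for the example, not the actual implementation.

```python
from dataclasses import asdict, dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PromptContext:
    """Hospital identity values substituted into prompt templates.

    The field names are hypothetical; the real class may differ.
    """

    hospital_name: str
    short_name: str
    website: str
    phone: str

    @classmethod
    def from_hospital_config(cls, config: Mapping[str, Any]) -> "PromptContext":
        # Reads the `hospital` section of a parsed configuration mapping.
        hospital = config["hospital"]
        return cls(
            hospital_name=hospital["name"],
            short_name=hospital["short_name"],
            website=hospital["website"],
            phone=hospital["phone"],
        )

    def render(self, template: str) -> str:
        # Fills {hospital_name}, {short_name}, {website} and {phone}.
        return template.format(**asdict(self))


# Hypothetical template: no hospital name appears in the prompt text itself.
SYSTEM_PROMPT = (
    "You are the search assistant for {hospital_name} ({short_name}). "
    "For more information, refer users to {website}."
)

zol_config = {
    "hospital": {
        "name": "Ziekenhuis Oost-Limburg",
        "short_name": "ZOL",
        "website": "https://www.zol.be",
        "phone": "089/80 80 80",
    }
}
context = PromptContext.from_hospital_config(zol_config)
prompt = context.render(SYSTEM_PROMPT)
```

With this shape, serving another hospital means passing a different configuration mapping; the template stays unchanged.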

2.2 Configuration Architecture

All hospital-specific data lives in YAML configuration files:

# backend/app/services/graph/hospital_config/zol.yaml
hospital:
  name: Ziekenhuis Oost-Limburg
  short_name: ZOL
  website: https://www.zol.be
  phone: 089/80 80 80

campuses:
  - id: zol-campus-sint-jan
    canonical_name: ZOL Genk, campus Sint-Jan
    aliases: [sint-jan, sint jan, genk, campus sint-jan, zol genk]
    address: Synaps Park 1
    city: Genk
    postal_code: '3600'
    phone: 089/80 80 80
  # ... more campuses

# Departments and golden page URLs removed (2026-03-09).
# Departments are now auto-discovered by the extraction pipeline.
# Hub/detail classification replaces golden_page_patterns and golden_page_types.

domain_knowledge:
  dept_conditions:
    slaapcentrum: [slaapapneu, slaapstoornis, insomnie, ...]
    # Hospital-specific centers only — standard departments are in
    # medical_knowledge/department_conditions.py (universal).
  dept_treatments:
    cardiologie: [pacemaker, ablatie, bypass, cardioversie, ...]
  # ... more domain knowledge mappings

search_aliases:
  universal: {}
  hospital:
    borstkanker: Borstcentrum
    ivf: Fertiliteitscentrum
    slaapkliniek: Slaapcentrum
    # ... more hospital-specific aliases

specialty_department_map:
  cardiologie: [cardiologie, hartcentrum]
  orthopedie: [orthopedie, orthopedische chirurgie]
  # ... specialty-to-department mappings

Adding a new hospital requires only a new YAML file — no code changes. Departments and page classification are handled automatically by the extraction pipeline and LLM binary classifier.
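
As an illustration of how code can stay generic while the YAML carries the hospital-specific data, here is a small sketch that resolves a search term against the search_aliases section shown above. The function name resolve_alias and the precedence rule (hospital aliases are checked before universal ones) are assumptions for this example; the page does not state how the real lookup is ordered.

```python
from typing import Any, Mapping


def resolve_alias(term: str, search_aliases: Mapping[str, Any]) -> str:
    """Map a user search term to a department name, if an alias exists.

    Assumed rule: hospital-specific aliases win over universal ones.
    Terms without an alias are returned unchanged.
    """
    key = term.strip().lower()
    hospital = search_aliases.get("hospital") or {}
    universal = search_aliases.get("universal") or {}
    if key in hospital:
        return hospital[key]
    return universal.get(key, term)


# The parsed `search_aliases` section of zol.yaml, written out as a dict.
zol_aliases = {
    "universal": {},
    "hospital": {
        "borstkanker": "Borstcentrum",
        "ivf": "Fertiliteitscentrum",
        "slaapkliniek": "Slaapcentrum",
    },
}

department = resolve_alias("IVF", zol_aliases)
```

Nothing in the function refers to ZOL, so a second hospital only needs its own alias mapping.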

2.3 Backward Compatibility

A compatibility shim (zol_taxonomy.py) re-exports all symbols from the new hospital_taxonomy.py module, ensuring existing imports continue to work during the migration period. This shim can be removed once all imports are updated.

3. Current State: Single-Tenant Pilot

The pilot deployment serves one hospital (ZOL) with the default tenant ID. All components use hospital_id="zol" as the default parameter, making the system fully functional without explicit tenant routing.

3.1 What Works Today

  • Full RAG pipeline with hospital-parameterized prompts
  • Taxonomy tables with tenant-scoped entities and relationships
  • Tenant-isolated Redis caching and rate limiting
  • Hospital-specific safety messages and disclaimers in 8 languages
  • Configurable taxonomy with SNOMED-CT synonym enrichment
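
The tenant-isolated Redis keys can be sketched as a single helper that every cache, rate-limit, and session call goes through. The helper name tenant_key, the namespace layout after the prefix, and the allowed tenant ID characters are assumptions for illustration; only the {tenant_id}: prefix itself is taken from this page.

```python
import re

# Assumed tenant ID format: lowercase letters, digits and hyphens.
# Rejecting ":" keeps one tenant from forging another tenant's prefix.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]*")


def tenant_key(tenant_id: str, namespace: str, *parts: str) -> str:
    """Build a Redis key of the form {tenant_id}:{namespace}:{part}:..."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    return ":".join((tenant_id, namespace, *parts))


cache_key = tenant_key("zol", "cache", "query", "a1b2c3")
rate_key = tenant_key("zol", "ratelimit", "203.0.113.7")
```

Centralizing key construction makes it harder for a new call site to write an unprefixed key by accident.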

3.2 What Remains Single-Tenant

  • PostgreSQL tables (conversations, users, analytics) lack tenant_id columns
  • Frontend has no tenant routing or hospital selection
  • Authentication is not yet tenant-aware (Keycloak realms can provide per-tenant isolation in the tenant routing phase)
  • No tenant management API
  • No per-tenant billing or usage tracking

4. Hospital-Agnostic Refactoring (Completed March 31)

Before multi-tenant routing, the codebase was made fully hospital-agnostic in a 4-phase sprint:

Phase | What | Status
Phase 1 | Config extraction: site_crawl_configs table, admin API | Done
Phase 2 | Prompt parameterization: all LLM prompts use PromptContext from DB | Done
Phase 3 | Generic naming: ZOLCrawler → HospitalCrawler, ZOL branding removed | Done
Phase 4 | DB-driven config cache: SiteConfigCache replaces all in-code constants | Done

Result: 259 ZOL-specific references removed. A new hospital can be onboarded with DB configuration only. See Release Notes: March 28-31 for details.
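
The idea behind a DB-driven config cache can be sketched as follows. This is not the actual SiteConfigCache; the class name SiteConfigCacheSketch, the time-to-live policy, and the loader callable (standing in for a query on the site_crawl_configs table) are assumptions made to keep the example self-contained.

```python
import time
from typing import Any, Callable, Dict, Tuple

ConfigLoader = Callable[[str], Dict[str, Any]]


class SiteConfigCacheSketch:
    """Per-hospital configuration cache with a time-to-live.

    `loader` stands in for the database read; it is called only when
    an entry is missing or older than `ttl_seconds`.
    """

    def __init__(
        self,
        loader: ConfigLoader,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def get(self, hospital_id: str) -> Dict[str, Any]:
        now = self._clock()
        entry = self._entries.get(hospital_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no database round trip
        config = self._loader(hospital_id)
        self._entries[hospital_id] = (now, config)
        return config

    def invalidate(self, hospital_id: str) -> None:
        # Call after an admin API update so the next get() reloads.
        self._entries.pop(hospital_id, None)
```

Because entries are keyed by hospital_id, onboarding a hospital adds a row in the database rather than a constant in the code.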

5. Tenant Routing (Planned)

The next phase adds the infrastructure to serve multiple hospitals from a single deployment.

5.1 Tenant Resolution

Tenant resolution options (ordered by complexity):

  1. Subdomain-based: {hospital}.search.example.com — cleanest for end users
  2. Path-based: search.example.com/{hospital}/ — simpler infrastructure
  3. Header-based: X-Tenant-Id header — for API consumers
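
The three options above can be combined in one resolver, sketched here as a plain function. The precedence (header first, then subdomain, then path), the KNOWN_TENANTS allow-list, and the function name resolve_tenant are assumptions for illustration; search.example.com is the placeholder domain used in the list.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

BASE_DOMAIN = "search.example.com"  # placeholder domain from the list above
KNOWN_TENANTS = {"zol"}  # assumed allow-list; would come from tenant config


def resolve_tenant(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the tenant ID for a request, or None if none can be resolved."""
    candidates = []

    # Header-based: X-Tenant-Id (header names are case-insensitive).
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-tenant-id" in lowered:
        candidates.append(lowered["x-tenant-id"])

    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Subdomain-based: {hospital}.search.example.com
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        candidates.append(host[: -len(suffix)])

    # Path-based: search.example.com/{hospital}/
    if host == BASE_DOMAIN:
        first_segment = parts.path.strip("/").split("/", 1)[0]
        if first_segment:
            candidates.append(first_segment)

    # Accept only tenants that are actually configured.
    for candidate in candidates:
        tenant = candidate.strip().lower()
        if tenant in KNOWN_TENANTS:
            return tenant
    return None
```

Checking candidates against an allow-list means an unknown subdomain or header value resolves to no tenant instead of creating an implicit one.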

5.2 Database Tenancy

Approach | Isolation | Complexity | Recommended?
Shared schema, tenant_id column | Row-level | Low | Yes (for pilot scale)
Separate schemas per tenant | Schema-level | Medium | Future option
Separate databases per tenant | Full | High | Not needed

The shared-schema approach adds a tenant_id column to all PostgreSQL tables and enforces row-level filtering through a middleware or repository pattern. This is sufficient for the expected scale (5-10 hospitals).
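
A minimal sketch of the repository pattern for row-level filtering follows. It uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL so the example runs on its own; the table layout, the class name ConversationRepository, and the tenant ID "hospital-b" are hypothetical.

```python
import sqlite3
from typing import List


def make_connection() -> sqlite3.Connection:
    # SQLite stands in for PostgreSQL; the schema is illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE conversations ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id TEXT NOT NULL,"
        " title TEXT NOT NULL)"
    )
    return conn


class ConversationRepository:
    """All reads and writes are scoped to one tenant_id.

    Callers never pass tenant_id per query, so a query cannot
    forget the filter.
    """

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, title: str) -> int:
        cursor = self._conn.execute(
            "INSERT INTO conversations (tenant_id, title) VALUES (?, ?)",
            (self._tenant_id, title),
        )
        return cursor.lastrowid

    def list_titles(self) -> List[str]:
        rows = self._conn.execute(
            "SELECT title FROM conversations WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]


conn = make_connection()
zol_repo = ConversationRepository(conn, "zol")
other_repo = ConversationRepository(conn, "hospital-b")
zol_repo.add("Afspraak cardiologie")
other_repo.add("Bezoekuren")
```

Both repositories share one table, yet each sees only its own rows. The same two-tenant setup doubles as a basic cross-tenant leakage test.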

5.3 Estimated Scope

Task | Effort | Dependencies
Add tenant_id to PostgreSQL tables (migration) | Medium | None
Tenant resolver middleware | Small | None
Per-tenant authentication | Medium | Tenant resolver
Frontend tenant context | Small | Tenant resolver
Tenant management API | Medium | Database migration
Per-tenant YAML configuration | Small | Phase 0 (done)

6. Full Multi-Tenant Operations (Future)

This phase adds operational capabilities for managing multiple hospitals in production.

6.1 Features

  • Tenant onboarding workflow: Automated setup of a new hospital (YAML config, database seeding, taxonomy initialization, content crawl)
  • Per-tenant analytics dashboard: Hospital administrators see only their own data
  • Content isolation verification: Automated tests confirming no cross-tenant data leakage
  • Per-tenant feature flags: Enable/disable pipeline components per hospital
  • Usage metering and billing: Track LLM API costs, storage, and query volumes per tenant
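
Per-tenant feature flags could be modeled as platform defaults plus a small set of overrides per hospital. The flag names below are hypothetical (loosely based on pipeline components mentioned on this page), and the function name effective_flags is an assumption for the sketch.

```python
from types import MappingProxyType
from typing import Dict, Mapping

# Hypothetical platform-wide defaults, read-only to prevent accidental edits.
DEFAULT_FLAGS: Mapping[str, bool] = MappingProxyType(
    {
        "snomed_enrichment": True,
        "hub_classifier": True,
    }
)


def effective_flags(tenant_overrides: Mapping[str, bool]) -> Dict[str, bool]:
    """Merge a tenant's overrides onto the defaults.

    Unknown flag names are rejected so that a typo in one tenant's
    configuration fails loudly instead of being silently ignored.
    """
    unknown = set(tenant_overrides) - set(DEFAULT_FLAGS)
    if unknown:
        raise KeyError(f"unknown feature flags: {sorted(unknown)}")
    return {**DEFAULT_FLAGS, **tenant_overrides}


zol_flags = effective_flags({"snomed_enrichment": False})
```

Storing only the overrides keeps each tenant's configuration short and lets new defaults reach every hospital at once.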

6.2 Content Pipeline

Each hospital requires its own content pipeline.

7. Storage Isolation Summary

Storage Layer | Current | Tenant Routing (Planned) | Full Multi-Tenant Operations (Future)
PostgreSQL | Single tenant | Row-level tenant_id | Row-level (sufficient)
Taxonomy tables | tenant_id on all rows | Same | Same
Redis | {tenant_id}: key prefix | Same | Same
MinIO | {tenant_id}/{doc_id} paths | Same | Same
pgvector | Single collection | Per-tenant collection or filter | Per-tenant collection

8. Risk Considerations

Risk | Mitigation
Cross-tenant data leakage | Automated integration tests verifying isolation at every storage layer
Configuration drift | YAML validation schema, CI/CD checks for required fields
Performance at scale | Per-tenant caching, connection pooling, lazy taxonomy loading
Compliance variation | Per-tenant DPIA and data retention settings (stored in tenant config)
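
The configuration drift mitigation can be sketched as a required-field check that runs in CI against each parsed hospital configuration. Which fields are required is an assumption made for this example, as is the function name validate_hospital_config; the input is the YAML file after parsing into a mapping.

```python
from collections.abc import Mapping
from typing import Any, List

# Assumed required fields; the real validation schema may differ.
REQUIRED_HOSPITAL_FIELDS = ("name", "short_name", "website")
REQUIRED_CAMPUS_FIELDS = ("id", "canonical_name")


def validate_hospital_config(config: Mapping) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    errors: List[str] = []

    hospital: Any = config.get("hospital")
    if not isinstance(hospital, Mapping):
        errors.append("hospital: section is missing")
    else:
        for field_name in REQUIRED_HOSPITAL_FIELDS:
            if not hospital.get(field_name):
                errors.append(f"hospital.{field_name}: required field is missing")

    for index, campus in enumerate(config.get("campuses") or []):
        for field_name in REQUIRED_CAMPUS_FIELDS:
            if not campus.get(field_name):
                errors.append(
                    f"campuses[{index}].{field_name}: required field is missing"
                )

    return errors
```

Returning all problems at once, rather than raising on the first one, gives a more useful CI report when a new hospital file is incomplete.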

Document version: 2.0 | Date: 2026-03-31 | Author: SOFT4U BV