Skip to main content

Storage Architecture

The ZOL Intelligent Search system employs polyglot persistence across PostgreSQL (with the pgvector extension), Redis, and MinIO — each optimised for a distinct access pattern. No single database technology excels at all of the workloads the system must support: vector similarity search at low latency, ACID-protected relational entity joins, ephemeral session and rate-limit state, and large blob storage for raw document content.

Storage-Level Trade-offs

DecisionChosenAlternatives consideredRejected because
Vector storePostgreSQL + pgvector (@pgvector_docs) — single ACID database for embeddings, taxonomy, and tenant metadataPinecone / Weaviate / Qdrant (managed); FAISS in-process (@johnson2017faiss)Managed vector DBs add a second operational system, separate access-control plane, and synchronisation surface against tenant + taxonomy data; FAISS is non-persistent and lacks ACID guarantees. pgvector keeps everything in one ACID database with one backup story.
ANN index typeHNSW (Hierarchical Navigable Small Worlds)IVFFlat (pgvector's other index type); exact nearest neighbourIVFFlat needs periodic reindexing as the corpus grows, which is operationally costly with continuous nightly ingest. Exact NN scales linearly per query and would saturate at corpus growth. HNSW handles insertions incrementally without reindex; the trade-off is higher memory at index build time, accepted at our 10 K-chunk scale. The ANN-vs-exact trade-off is the same one Faiss @johnson2017faiss characterises at billion-scale; pgvector's HNSW operator inherits that trade-off curve at PostgreSQL scale.
Knowledge graphPostgreSQL taxonomy tablesNeo4j Graph Data Science (@neo4j_gds_manual)Removed in 2026-05-02 (ADR-0053). The GDS algorithms (PageRank, community detection) were never load-bearing; the queries we actually used were SQL joins over typed entity tables. Removing Neo4j eliminated an operational service without losing any retrieval capability.
Cache backendPostgreSQL (app.semantic_query_cache)Redis with Redis Stack vector search; in-process LRUTier 2 of the cache requires HNSW similarity search on 1,536-dim embeddings, which Redis OSS does not provide. Redis Stack (with the search module) is not in our current Docker Compose stack. In-process LRU loses entries on every redeploy. PostgreSQL already has pgvector and an HNSW operator; an extra table is the simplest delta.
Object storeMinIO (on-premise S3-compatible)AWS S3 / Azure Blob / Cloudflare R2Hospital data residency requirements forbid sending document content to any cloud blob store. MinIO runs in our Docker Compose stack and presents an S3-compatible API, so the application code is portable to a future cloud move.

Storage Overview

Vector Store: PostgreSQL + pgvector

Why pgvector?

The vector store is the backbone of semantic search, storing the dense sentence embeddings that enable similarity-based retrieval (Reimers & Gurevych, 2019). By embedding pgvector into PostgreSQL rather than using a standalone vector database (Pinecone, Weaviate, Qdrant), the system gains several advantages:

  • Single database for both relational metadata and vector embeddings
  • ACID transactions protecting data integrity during ingestion
  • Mature ecosystem -- backup, monitoring, and tooling from decades of PostgreSQL investment
  • No vendor lock-in -- pgvector is an open-source extension

Schema Design

Column naming

The Python model maps the chunk_metadata attribute to the database column named metadata. They refer to the same JSONB column — chunk_metadata is the Python name, metadata is the SQL column name.

Page Summaries (Contextual Retrieval)

The chunk_metadata JSONB column stores page summaries generated during ingestion by the LLM Entity Validator (ADR-0014). These are 2-3 sentence Dutch descriptions of the source document, used for contextual retrieval — a technique that Anthropic's research shows reduces retrieval failure rates by 49% when combined with hybrid search.

At query time, the context assembly pipeline prepends the page summary to the first chunk from each document, giving the LLM critical document-level context that individual chunks lose during chunking. No database migration was required — the JSONB column accepts arbitrary keys.

HNSW Index

The system uses a Hierarchical Navigable Small World (HNSW) index for approximate nearest neighbor search. HNSW was selected over IVFFlat (the other pgvector index type) for a specific architectural reason:

CriterionHNSWIVFFlat
Insert performanceNo reindex neededRequires periodic reindexing
Query speedSlightly fasterFast after reindex
Memory usageHigherLower
Dynamic dataExcellentPoor without reindex

For ZOL, where content is continuously ingested (new brochures published, website updated), HNSW's ability to handle insertions without reindexing is a decisive advantage. IVFFlat would require scheduled reindexing jobs to maintain search quality, adding operational complexity.

Index Configuration

The HNSW index is configured with m=16 (connections per layer) and ef_construction=200 (construction quality). These values balance search accuracy against index build time. The current ZOL production corpus is ~10,400 chunks (verify via \d+ app.document_chunks on the pilot DB; the academic basis for HNSW's ANN trade-off is summarised in @johnson2017faiss — Faiss billion-scale similarity search).

Vector Dimensions

Each embedding is a 1,536-dimensional vector produced by OpenAI text-embedding-3-large (truncated from the model's native 3,072 dimensions to fit pgvector's HNSW 2,000-dim limit). See ADR-0048, @openai2024embeddings for the model announcement, and @pgvector_docs for the indexing layer.

Migration history:

  • nomic-embed-text (768-dim, Ollama) — initial deployment
  • BGE-M3 (1,024-dim, Ollama) — Feb 2026 (ADR-0033)
  • text-embedding-3-large (1,536-dim truncated, OpenAI) — current; Apr 2026 (ADR-0048)

The current model provides:

  • Stronger Dutch retrieval quality (MTEB-NL ~64.6 vs BGE-M3's 60.0)
  • Removal of the Ollama operational dependency for embeddings
  • Predictable latency under load via the OpenAI API
  • Cost of ~$0.13 per 1M tokens (75% prompt-cache discount; ~$0.20/month at 25,000 monthly queries) — accepted in ADR-0048 as the trade-off for the quality and operational gains

Entity Taxonomy: PostgreSQL

Relational Model

Entity relationships are stored in PostgreSQL taxonomy tables, replacing the earlier Neo4j knowledge graph. This consolidation keeps all structured data in a single database, simplifying operations and enabling transactional consistency with the vector store.

Table Structure

The taxonomy is modelled across multiple PostgreSQL tables, each scoped by tenant_id:

TableDescriptionExample Row
hospitalsTop-level organizationsZOL
campusesPhysical locationsSint-Jan, André Dumont
departmentsHospital departments with aliasesCardiologie (aliases: hartafdeling, hart- en vaatziekten)
conditionsMedical conditions with SNOMED codesHartfalen
treatmentsMedical proceduresKnieprothese, Chemotherapie
examinationsDiagnostic proceduresMRI, Bloedonderzoek
doctorsMedical professionalsDr. Van den Berg

Relationship Junction Tables

Relationships are stored in junction tables with foreign keys:

Junction TableFromToExtra Columns
department_campusesDepartmentCampus-
doctor_departmentsDoctorDepartmentrole, schedule, status, contacts
department_conditionsDepartmentCondition-
department_treatmentsDepartmentTreatment-
department_examinationsDepartmentExamination-
treatment_conditionsTreatmentCondition-
examination_conditionsExaminationCondition-

All junction tables carry tenant_id for multi-tenant isolation and use composite unique constraints to prevent duplicate relationships.

Tiered Query Strategy

The taxonomy employs a two-tier query strategy to maximize both precision and recall:

  • Tier 1 -- Typed Entities: SQL queries against typed entity tables, routed by LLM-extracted entities. Used for precise entity lookups ("Who is Dr. Van den Berg?") and semantic queries resolved through taxonomy aliases ("hartproblemen" -> Hartfalen -> Cardiologie). Highest confidence, fastest execution.
  • Tier 2 -- Vector Fallback: Falls back to pgvector semantic search when the taxonomy has no relevant entities. Ensures the system always returns results.
Migration from Neo4j

Neo4j was fully removed in 2026-05-02 (ADR-0053). All entity relationships now live in PostgreSQL taxonomy tables (taxonomy_entities, taxonomy_relationships). This eliminates a separate infrastructure dependency while preserving the same entity model and query semantics. The HospitalTaxonomy class provides the same API surface, with SQL queries replacing Cypher.

Operational & Analytics Tables

The taxonomy and document tables above are the system's primary semantic surface. A second group of tables — added between 2026-04 and 2026-05-09 — captures the operational state of the running system and feeds the Operations dashboard. All inherit the standard tenant_id FK isolation pattern (see Multi-Tenancy).

TableMigrationPurposeKey columnsSource
app.ingest_runs061Audit log for nightly auto-ingest. One row per scheduler invocation.mode, started_at, completed_at, urls_discovered, urls_indexed, urls_failed, failure_class, notesbackend/app/models/database.py:875
app.tenant_business_hours065Per-tenant operating-hours configuration for voice-channel "is the hospital open right now?" answers.tenant_id, weekday, open_time, close_time, closedbackend/app/models/database.py:1651
app.tenant_cost_models065Per-tenant LLM cost model (per-token rates by model tier) used by CostTracker.tenant_id, model_tier, prompt_token_rate, completion_token_rate, effective_frombackend/app/models/database.py:1678
app.daily_tenant_metrics065Daily aggregation feeding the Value Dashboard (queries served, cache hit rate, deflection ratio, $ saved).tenant_id, date, query_count, cache_hit_rate, deflection_count, dollars_savedbackend/app/models/database.py:1707
app.category_mismatch_telemetry066Per-turn telemetry for the Value Framework affinity rerank — what fraction of top-K chunks were OFF-category vs the intent's preferred set. Drives the CategoryMismatchTrend chart in the Operations / costs tab.tenant_id, intent_class, primary_category, mismatch_rate, chunks_total, chunks_off_category, query_previewmigration 066
app.diagnostic_feedback067Operator-rated verdicts (agree / partial / disagree) on v2 diagnostic-investigation outputs. Drives the DiagnosticAccuracyTrend chart. UNIQUE (investigation_id, operator_user_id) so each operator has one rating per investigation.investigation_id, tenant_id, operator_user_id, verdict, notesmigration 067

These tables are read by the /api/v1/admin/feedback/telemetry-stats, /api/v1/admin/ops/category-mismatch, and /api/v1/admin/ops/diagnostic-accuracy endpoints; see Feedback Dashboard Metrics for the rendering layer.

Semantic Query Cache: PostgreSQL + pgvector (ADR-0031)

The query cache is a two-tier system that stores previous query results in PostgreSQL, using the same pgvector infrastructure as the vector store. It operates on LLM-reformulated queries (not raw user input), which maximizes hit rates because the intent classifier normalizes language, spelling, and word order before caching.

Two-Tier Lookup

TierMechanismLatencyWhat It Catches
Tier 1SHA-256 hash of reformulated query text~1msIdentical reformulations — including cross-language queries that normalize to the same Dutch form
Tier 2Cosine similarity via HNSW index on 1,536-dim text-embedding-3-large embeddings~30msSemantically equivalent queries with different wording (e.g., "Welke artsen werken op orthopedie?" vs "Welke dokters zijn er op de afdeling orthopedie?")

Why Reformulated Queries?

The intent classifier already normalizes every user query to well-formed Dutch clinical language. This collapses many distinct inputs into identical or near-identical strings:

User InputLanguageReformulated Query
"rugpijn"Dutch"Welke afdelingen bij ZOL behandelen rugpijn?"
"back pain"English"Welke afdelingen bij ZOL behandelen rugpijn?"
"sırt ağrısı"Turkish"Welke afdelingen bij ZOL behandelen rugpijn?"

All three produce the same reformulated query → Tier 1 hash hit after the first request, regardless of language.

Schema

Indexes

IndexTypePurpose
uq_tenant_query_hashUNIQUE (tenant_id, query_hash)Tier 1 exact-match lookup + upsert conflict target
idx_cache_embeddingHNSW (vector_cosine_ops)Tier 2 approximate nearest neighbor search
idx_cache_expiresB-tree (expires_at)Efficient expired-row cleanup sweeps

Configuration

SettingDefaultDescription
query_cache_enabledtrueEnable/disable the entire cache
query_cache_ttl_seconds3600 (1 hour)Time-to-live for cache entries
query_cache_similarity_threshold0.97Minimum cosine similarity for Tier 2 hits (range: 0.80-1.00)

All settings are configurable at runtime via the admin Settings UI and persist per-tenant.

Why PostgreSQL Instead of Redis?

CriterionPostgreSQL + pgvectorRedis
Vector similarity searchNative HNSW supportRequires Redis Stack (not deployed)
PersistenceSurvives restartsVolatile without AOF/RDB
Multi-tenant queriesSQL WHERE clausesManual key namespacing
Already deployedYes (vector store)Yes (sessions only)
ACID guaranteesFull transactionsEventual consistency

The decisive factor is Tier 2: approximate nearest neighbor search on embeddings requires pgvector's HNSW index. Since PostgreSQL is already the primary data store, adding a table is simpler than deploying Redis Stack.

Cost Optimization

A cache hit avoids all LLM API calls (intent classification, query rewriting, response generation, quality evaluation), which collectively cost approximately $0.003 per query. With 25,000 monthly queries and an estimated 40-60% cache hit rate, caching saves approximately $30-45/month in API costs, plus reducing latency from ~5.5s to < 50ms.

Session & Rate Limiting: Redis

Redis continues to serve three ephemeral caching roles:

RoleTTLPurpose
Session storage30 minutesMaintain conversation context across requests
Rate limit counters1 minuteSliding window rate limiting
JWT token blacklist24 hoursInvalidate tokens on logout

Object Storage: MinIO

MinIO stores the canonical source content -- the Markdown extracted from documents and the raw HTML crawled from the website. This serves two purposes:

  1. Reprocessing: When the chunking strategy or embedding model changes, content can be re-processed from MinIO without re-crawling the website or re-extracting PDFs.
  2. Audit trail: The raw content provides a verifiable reference for what the system ingested, supporting content governance requirements.

MinIO was chosen over cloud object storage (S3, Azure Blob) to keep all data within the on-premise infrastructure, aligning with the hospital's data residency requirements.