Storage Architecture

The ZOL Intelligent Search system employs polyglot persistence across PostgreSQL (with the pgvector extension), Redis, and MinIO — each optimised for a distinct access pattern. No single database technology excels at all of the workloads the system must support: vector similarity search at low latency, ACID-protected relational entity joins, ephemeral session and rate-limit state, and large blob storage for raw document content.

Storage-Level Trade-offs

Decision	Chosen	Alternatives considered	Rejected because
Vector store	PostgreSQL + pgvector (@pgvector_docs) — single ACID database for embeddings, taxonomy, and tenant metadata	Pinecone / Weaviate / Qdrant (managed); FAISS in-process (@johnson2017faiss)	Managed vector DBs add a second operational system, separate access-control plane, and synchronisation surface against tenant + taxonomy data; FAISS is non-persistent and lacks ACID guarantees. pgvector keeps everything in one ACID database with one backup story.
ANN index type	HNSW (Hierarchical Navigable Small Worlds)	IVFFlat (pgvector's other index type); exact nearest neighbour	IVFFlat needs periodic reindexing as the corpus grows, which is operationally costly with continuous nightly ingest. Exact NN scales linearly per query and would saturate at corpus growth. HNSW handles insertions incrementally without reindex; the trade-off is higher memory at index build time, accepted at our 10 K-chunk scale. The ANN-vs-exact trade-off is the same one Faiss @johnson2017faiss characterises at billion-scale; pgvector's HNSW operator inherits that trade-off curve at PostgreSQL scale.
Knowledge graph	PostgreSQL taxonomy tables	Neo4j Graph Data Science (@neo4j_gds_manual)	Removed in 2026-05-02 (ADR-0053). The GDS algorithms (PageRank, community detection) were never load-bearing; the queries we actually used were SQL joins over typed entity tables. Removing Neo4j eliminated an operational service without losing any retrieval capability.
Cache backend	PostgreSQL (`app.semantic_query_cache`)	Redis with Redis Stack vector search; in-process LRU	Tier 2 of the cache requires HNSW similarity search on 1,536-dim embeddings, which Redis OSS does not provide. Redis Stack (with the search module) is not in our current Docker Compose stack. In-process LRU loses entries on every redeploy. PostgreSQL already has pgvector and an HNSW operator; an extra table is the simplest delta.
Object store	MinIO (on-premise S3-compatible)	AWS S3 / Azure Blob / Cloudflare R2	Hospital data residency requirements forbid sending document content to any cloud blob store. MinIO runs in our Docker Compose stack and presents an S3-compatible API, so the application code is portable to a future cloud move.

Storage Overview

Vector Store: PostgreSQL + pgvector

Why pgvector?

The vector store is the backbone of semantic search, storing the dense sentence embeddings that enable similarity-based retrieval (Reimers & Gurevych, 2019). By embedding pgvector into PostgreSQL rather than using a standalone vector database (Pinecone, Weaviate, Qdrant), the system gains several advantages:

Single database for both relational metadata and vector embeddings
ACID transactions protecting data integrity during ingestion
Mature ecosystem -- backup, monitoring, and tooling from decades of PostgreSQL investment
No vendor lock-in -- pgvector is an open-source extension

Schema Design

Column naming

The Python model maps the chunk_metadata attribute to the database column named metadata. They refer to the same JSONB column — chunk_metadata is the Python name, metadata is the SQL column name.

Page Summaries (Contextual Retrieval)

The chunk_metadata JSONB column stores page summaries generated during ingestion by the LLM Entity Validator (ADR-0014). These are 2-3 sentence Dutch descriptions of the source document, used for contextual retrieval — a technique that Anthropic's research shows reduces retrieval failure rates by 49% when combined with hybrid search.

At query time, the context assembly pipeline prepends the page summary to the first chunk from each document, giving the LLM critical document-level context that individual chunks lose during chunking. No database migration was required — the JSONB column accepts arbitrary keys.

HNSW Index

The system uses a Hierarchical Navigable Small World (HNSW) index for approximate nearest neighbor search. HNSW was selected over IVFFlat (the other pgvector index type) for a specific architectural reason:

Criterion	HNSW	IVFFlat
Insert performance	No reindex needed	Requires periodic reindexing
Query speed	Slightly faster	Fast after reindex
Memory usage	Higher	Lower
Dynamic data	Excellent	Poor without reindex

For ZOL, where content is continuously ingested (new brochures published, website updated), HNSW's ability to handle insertions without reindexing is a decisive advantage. IVFFlat would require scheduled reindexing jobs to maintain search quality, adding operational complexity.

Index Configuration

The HNSW index is configured with m=16 (connections per layer) and ef_construction=200 (construction quality). These values balance search accuracy against index build time. The current ZOL production corpus is ~10,400 chunks (verify via \d+ app.document_chunks on the pilot DB; the academic basis for HNSW's ANN trade-off is summarised in @johnson2017faiss — Faiss billion-scale similarity search).

Vector Dimensions

Each embedding is a 1,536-dimensional vector produced by OpenAI text-embedding-3-large (truncated from the model's native 3,072 dimensions to fit pgvector's HNSW 2,000-dim limit). See ADR-0048, @openai2024embeddings for the model announcement, and @pgvector_docs for the indexing layer.

Migration history:

nomic-embed-text (768-dim, Ollama) — initial deployment
BGE-M3 (1,024-dim, Ollama) — Feb 2026 (ADR-0033)
text-embedding-3-large (1,536-dim truncated, OpenAI) — current; Apr 2026 (ADR-0048)

The current model provides:

Stronger Dutch retrieval quality (MTEB-NL ~64.6 vs BGE-M3's 60.0)
Removal of the Ollama operational dependency for embeddings
Predictable latency under load via the OpenAI API
Cost of ~$0.13 per 1M tokens (75% prompt-cache discount; ~$0.20/month at 25,000 monthly queries) — accepted in ADR-0048 as the trade-off for the quality and operational gains

Entity Taxonomy: PostgreSQL

Relational Model

Entity relationships are stored in PostgreSQL taxonomy tables, replacing the earlier Neo4j knowledge graph. This consolidation keeps all structured data in a single database, simplifying operations and enabling transactional consistency with the vector store.

Table Structure

The taxonomy is modelled across multiple PostgreSQL tables, each scoped by tenant_id:

Table	Description	Example Row
`hospitals`	Top-level organizations	ZOL
`campuses`	Physical locations	Sint-Jan, André Dumont
`departments`	Hospital departments with aliases	Cardiologie (aliases: hartafdeling, hart- en vaatziekten)
`conditions`	Medical conditions with SNOMED codes	Hartfalen
`treatments`	Medical procedures	Knieprothese, Chemotherapie
`examinations`	Diagnostic procedures	MRI, Bloedonderzoek
`doctors`	Medical professionals	Dr. Van den Berg

Relationship Junction Tables

Relationships are stored in junction tables with foreign keys:

Junction Table	From	To	Extra Columns
`department_campuses`	Department	Campus	-
`doctor_departments`	Doctor	Department	role, schedule, status, contacts
`department_conditions`	Department	Condition	-
`department_treatments`	Department	Treatment	-
`department_examinations`	Department	Examination	-
`treatment_conditions`	Treatment	Condition	-
`examination_conditions`	Examination	Condition	-

All junction tables carry tenant_id for multi-tenant isolation and use composite unique constraints to prevent duplicate relationships.

Tiered Query Strategy

The taxonomy employs a two-tier query strategy to maximize both precision and recall:

Tier 1 -- Typed Entities: SQL queries against typed entity tables, routed by LLM-extracted entities. Used for precise entity lookups ("Who is Dr. Van den Berg?") and semantic queries resolved through taxonomy aliases ("hartproblemen" -> Hartfalen -> Cardiologie). Highest confidence, fastest execution.
Tier 2 -- Vector Fallback: Falls back to pgvector semantic search when the taxonomy has no relevant entities. Ensures the system always returns results.

Migration from Neo4j

Neo4j was fully removed in 2026-05-02 (ADR-0053). All entity relationships now live in PostgreSQL taxonomy tables (taxonomy_entities, taxonomy_relationships). This eliminates a separate infrastructure dependency while preserving the same entity model and query semantics. The HospitalTaxonomy class provides the same API surface, with SQL queries replacing Cypher.

Operational & Analytics Tables

The taxonomy and document tables above are the system's primary semantic surface. A second group of tables — added between 2026-04 and 2026-05-09 — captures the operational state of the running system and feeds the Operations dashboard. All inherit the standard tenant_id FK isolation pattern (see Multi-Tenancy).

Table	Migration	Purpose	Key columns	Source
`app.ingest_runs`	061	Audit log for nightly auto-ingest. One row per scheduler invocation.	`mode`, `started_at`, `completed_at`, `urls_discovered`, `urls_indexed`, `urls_failed`, `failure_class`, `notes`	`backend/app/models/database.py:1006`
`app.tenant_business_hours`	065	Per-tenant operating-hours configuration for voice-channel "is the hospital open right now?" answers.	`tenant_id`, `weekday`, `open_time`, `close_time`, `closed`	`backend/app/models/database.py:1651`
`app.tenant_cost_models`	065	Per-tenant LLM cost model (per-token rates by model tier) used by `CostTracker`.	`tenant_id`, `model_tier`, `prompt_token_rate`, `completion_token_rate`, `effective_from`	`backend/app/models/database.py:1678`
`app.daily_tenant_metrics`	065	Daily aggregation feeding the Value Dashboard (queries served, cache hit rate, deflection ratio, $ saved).	`tenant_id`, `date`, `query_count`, `cache_hit_rate`, `deflection_count`, `dollars_saved`	`backend/app/models/database.py:1707`
`app.category_mismatch_telemetry`	066	Per-turn telemetry for the Value Framework affinity rerank — what fraction of top-K chunks were OFF-category vs the intent's preferred set. Drives the `CategoryMismatchTrend` chart in the Operations / costs tab.	`tenant_id`, `intent_class`, `primary_category`, `mismatch_rate`, `chunks_total`, `chunks_off_category`, `query_preview`	migration 066
`app.diagnostic_feedback`	067	Operator-rated verdicts (`agree` / `partial` / `disagree`) on v2 diagnostic-investigation outputs. Drives the `DiagnosticAccuracyTrend` chart. UNIQUE `(investigation_id, operator_user_id)` so each operator has one rating per investigation.	`investigation_id`, `tenant_id`, `operator_user_id`, `verdict`, `notes`	migration 067

These tables are read by the /api/v1/admin/feedback/telemetry-stats, /api/v1/admin/ops/category-mismatch, and /api/v1/admin/ops/diagnostic-accuracy endpoints; see Feedback Dashboard Metrics for the rendering layer.

Semantic Query Cache: PostgreSQL + pgvector (ADR-0031)

The query cache is a two-tier system that stores previous query results in PostgreSQL, using the same pgvector infrastructure as the vector store. It operates on LLM-reformulated queries (not raw user input), which maximizes hit rates because the intent classifier normalizes language, spelling, and word order before caching.

Two-Tier Lookup

Tier	Mechanism	Latency	What It Catches
Tier 1	SHA-256 hash of reformulated query text	~1ms	Identical reformulations — including cross-language queries that normalize to the same Dutch form
Tier 2	Cosine similarity via HNSW index on 1,536-dim `text-embedding-3-large` embeddings	~30ms	Semantically equivalent queries with different wording (e.g., "Welke artsen werken op orthopedie?" vs "Welke dokters zijn er op de afdeling orthopedie?")

Why Reformulated Queries?

The intent classifier already normalizes every user query to well-formed Dutch clinical language. This collapses many distinct inputs into identical or near-identical strings:

User Input	Language	Reformulated Query
"rugpijn"	Dutch	"Welke afdelingen bij ZOL behandelen rugpijn?"
"back pain"	English	"Welke afdelingen bij ZOL behandelen rugpijn?"
"sırt ağrısı"	Turkish	"Welke afdelingen bij ZOL behandelen rugpijn?"

All three produce the same reformulated query → Tier 1 hash hit after the first request, regardless of language.

Schema

Indexes

Index	Type	Purpose
`uq_tenant_query_hash`	UNIQUE (tenant_id, query_hash)	Tier 1 exact-match lookup + upsert conflict target
`idx_cache_embedding`	HNSW (vector_cosine_ops)	Tier 2 approximate nearest neighbor search
`idx_cache_expires`	B-tree (expires_at)	Efficient expired-row cleanup sweeps

Configuration

Setting	Default	Description
`query_cache_enabled`	`true`	Enable/disable the entire cache
`query_cache_ttl_seconds`	`3600` (1 hour)	Time-to-live for cache entries
`query_cache_similarity_threshold`	`0.97`	Minimum cosine similarity for Tier 2 hits (range: 0.80-1.00)

All settings are configurable at runtime via the admin Settings UI and persist per-tenant.

Why PostgreSQL Instead of Redis?

Criterion	PostgreSQL + pgvector	Redis
Vector similarity search	Native HNSW support	Requires Redis Stack (not deployed)
Persistence	Survives restarts	Volatile without AOF/RDB
Multi-tenant queries	SQL WHERE clauses	Manual key namespacing
Already deployed	Yes (vector store)	Yes (sessions only)
ACID guarantees	Full transactions	Eventual consistency

The decisive factor is Tier 2: approximate nearest neighbor search on embeddings requires pgvector's HNSW index. Since PostgreSQL is already the primary data store, adding a table is simpler than deploying Redis Stack.

Cost Optimization

A cache hit avoids all LLM API calls (intent classification, query rewriting, response generation, quality evaluation), which collectively cost approximately $0.003 per query. With 25,000 monthly queries and an estimated 40-60% cache hit rate, caching saves approximately $30-45/month in API costs, plus reducing latency from ~5.5s to < 50ms.

Session & Rate Limiting: Redis

Redis continues to serve three ephemeral caching roles:

Role	TTL	Purpose
Session storage	30 minutes	Maintain conversation context across requests
Rate limit counters	1 minute	Sliding window rate limiting
JWT token blacklist	24 hours	Invalidate tokens on logout

Object Storage: MinIO

MinIO stores the canonical source content -- the Markdown extracted from documents and the raw HTML crawled from the website. This serves two purposes:

Reprocessing: When the chunking strategy or embedding model changes, content can be re-processed from MinIO without re-crawling the website or re-extracting PDFs.
Audit trail: The raw content provides a verifiable reference for what the system ingested, supporting content governance requirements.

MinIO was chosen over cloud object storage (S3, Azure Blob) to keep all data within the on-premise infrastructure, aligning with the hospital's data residency requirements.

Storage-Level Trade-offs​

Storage Overview​

Vector Store: PostgreSQL + pgvector​

Why pgvector?​

Schema Design​

Page Summaries (Contextual Retrieval)​

HNSW Index​

Vector Dimensions​

Entity Taxonomy: PostgreSQL​

Relational Model​

Table Structure​

Relationship Junction Tables​

Tiered Query Strategy​

Operational & Analytics Tables​

Semantic Query Cache: PostgreSQL + pgvector (ADR-0031)​

Two-Tier Lookup​

Why Reformulated Queries?​

Schema​

Indexes​

Configuration​

Why PostgreSQL Instead of Redis?​

Session & Rate Limiting: Redis​

Object Storage: MinIO​