Storage Architecture
The ZOL Intelligent Search system employs polyglot persistence across PostgreSQL (with the pgvector extension), Redis, and MinIO — each optimised for a distinct access pattern. No single database technology excels at all of the workloads the system must support: vector similarity search at low latency, ACID-protected relational entity joins, ephemeral session and rate-limit state, and large blob storage for raw document content.
Storage-Level Trade-offs
| Decision | Chosen | Alternatives considered | Rejected because |
|---|---|---|---|
| Vector store | PostgreSQL + pgvector (@pgvector_docs) — single ACID database for embeddings, taxonomy, and tenant metadata | Pinecone / Weaviate / Qdrant (managed); FAISS in-process (@johnson2017faiss) | Managed vector DBs add a second operational system, separate access-control plane, and synchronisation surface against tenant + taxonomy data; FAISS is non-persistent and lacks ACID guarantees. pgvector keeps everything in one ACID database with one backup story. |
| ANN index type | HNSW (Hierarchical Navigable Small Worlds) | IVFFlat (pgvector's other index type); exact nearest neighbour | IVFFlat needs periodic reindexing as the corpus grows, which is operationally costly with continuous nightly ingest. Exact NN scales linearly per query and would saturate at corpus growth. HNSW handles insertions incrementally without reindex; the trade-off is higher memory at index build time, accepted at our 10 K-chunk scale. The ANN-vs-exact trade-off is the same one Faiss @johnson2017faiss characterises at billion-scale; pgvector's HNSW operator inherits that trade-off curve at PostgreSQL scale. |
| Knowledge graph | PostgreSQL taxonomy tables | Neo4j Graph Data Science (@neo4j_gds_manual) | Removed in 2026-05-02 (ADR-0053). The GDS algorithms (PageRank, community detection) were never load-bearing; the queries we actually used were SQL joins over typed entity tables. Removing Neo4j eliminated an operational service without losing any retrieval capability. |
| Cache backend | PostgreSQL (app.semantic_query_cache) | Redis with Redis Stack vector search; in-process LRU | Tier 2 of the cache requires HNSW similarity search on 1,536-dim embeddings, which Redis OSS does not provide. Redis Stack (with the search module) is not in our current Docker Compose stack. In-process LRU loses entries on every redeploy. PostgreSQL already has pgvector and an HNSW operator; an extra table is the simplest delta. |
| Object store | MinIO (on-premise S3-compatible) | AWS S3 / Azure Blob / Cloudflare R2 | Hospital data residency requirements forbid sending document content to any cloud blob store. MinIO runs in our Docker Compose stack and presents an S3-compatible API, so the application code is portable to a future cloud move. |
Storage Overview
Vector Store: PostgreSQL + pgvector
Why pgvector?
The vector store is the backbone of semantic search, storing the dense sentence embeddings that enable similarity-based retrieval (Reimers & Gurevych, 2019). By embedding pgvector into PostgreSQL rather than using a standalone vector database (Pinecone, Weaviate, Qdrant), the system gains several advantages:
- Single database for both relational metadata and vector embeddings
- ACID transactions protecting data integrity during ingestion
- Mature ecosystem -- backup, monitoring, and tooling from decades of PostgreSQL investment
- No vendor lock-in -- pgvector is an open-source extension
Schema Design
The Python model maps the chunk_metadata attribute to the database column named metadata. They refer to the same JSONB column — chunk_metadata is the Python name, metadata is the SQL column name.
Page Summaries (Contextual Retrieval)
The chunk_metadata JSONB column stores page summaries generated during ingestion by the LLM Entity Validator (ADR-0014). These are 2-3 sentence Dutch descriptions of the source document, used for contextual retrieval — a technique that Anthropic's research shows reduces retrieval failure rates by 49% when combined with hybrid search.
At query time, the context assembly pipeline prepends the page summary to the first chunk from each document, giving the LLM critical document-level context that individual chunks lose during chunking. No database migration was required — the JSONB column accepts arbitrary keys.
HNSW Index
The system uses a Hierarchical Navigable Small World (HNSW) index for approximate nearest neighbor search. HNSW was selected over IVFFlat (the other pgvector index type) for a specific architectural reason:
| Criterion | HNSW | IVFFlat |
|---|---|---|
| Insert performance | No reindex needed | Requires periodic reindexing |
| Query speed | Slightly faster | Fast after reindex |
| Memory usage | Higher | Lower |
| Dynamic data | Excellent | Poor without reindex |
For ZOL, where content is continuously ingested (new brochures published, website updated), HNSW's ability to handle insertions without reindexing is a decisive advantage. IVFFlat would require scheduled reindexing jobs to maintain search quality, adding operational complexity.
The HNSW index is configured with m=16 (connections per layer) and ef_construction=200 (construction quality). These values balance search accuracy against index build time. The current ZOL production corpus is ~10,400 chunks (verify via \d+ app.document_chunks on the pilot DB; the academic basis for HNSW's ANN trade-off is summarised in @johnson2017faiss — Faiss billion-scale similarity search).
Vector Dimensions
Each embedding is a 1,536-dimensional vector produced by OpenAI text-embedding-3-large (truncated from the model's native 3,072 dimensions to fit pgvector's HNSW 2,000-dim limit). See ADR-0048, @openai2024embeddings for the model announcement, and @pgvector_docs for the indexing layer.
Migration history:
- nomic-embed-text (768-dim, Ollama) — initial deployment
- BGE-M3 (1,024-dim, Ollama) — Feb 2026 (ADR-0033)
text-embedding-3-large(1,536-dim truncated, OpenAI) — current; Apr 2026 (ADR-0048)
The current model provides:
- Stronger Dutch retrieval quality (MTEB-NL ~64.6 vs BGE-M3's 60.0)
- Removal of the Ollama operational dependency for embeddings
- Predictable latency under load via the OpenAI API
- Cost of ~$0.13 per 1M tokens (75% prompt-cache discount; ~$0.20/month at 25,000 monthly queries) — accepted in ADR-0048 as the trade-off for the quality and operational gains
Entity Taxonomy: PostgreSQL
Relational Model
Entity relationships are stored in PostgreSQL taxonomy tables, replacing the earlier Neo4j knowledge graph. This consolidation keeps all structured data in a single database, simplifying operations and enabling transactional consistency with the vector store.
Table Structure
The taxonomy is modelled across multiple PostgreSQL tables, each scoped by tenant_id:
| Table | Description | Example Row |
|---|---|---|
hospitals | Top-level organizations | ZOL |
campuses | Physical locations | Sint-Jan, André Dumont |
departments | Hospital departments with aliases | Cardiologie (aliases: hartafdeling, hart- en vaatziekten) |
conditions | Medical conditions with SNOMED codes | Hartfalen |
treatments | Medical procedures | Knieprothese, Chemotherapie |
examinations | Diagnostic procedures | MRI, Bloedonderzoek |
doctors | Medical professionals | Dr. Van den Berg |
Relationship Junction Tables
Relationships are stored in junction tables with foreign keys:
| Junction Table | From | To | Extra Columns |
|---|---|---|---|
department_campuses | Department | Campus | - |
doctor_departments | Doctor | Department | role, schedule, status, contacts |
department_conditions | Department | Condition | - |
department_treatments | Department | Treatment | - |
department_examinations | Department | Examination | - |
treatment_conditions | Treatment | Condition | - |
examination_conditions | Examination | Condition | - |
All junction tables carry tenant_id for multi-tenant isolation and use composite unique constraints to prevent duplicate relationships.
Tiered Query Strategy
The taxonomy employs a two-tier query strategy to maximize both precision and recall:
- Tier 1 -- Typed Entities: SQL queries against typed entity tables, routed by LLM-extracted entities. Used for precise entity lookups ("Who is Dr. Van den Berg?") and semantic queries resolved through taxonomy aliases ("hartproblemen" -> Hartfalen -> Cardiologie). Highest confidence, fastest execution.
- Tier 2 -- Vector Fallback: Falls back to pgvector semantic search when the taxonomy has no relevant entities. Ensures the system always returns results.
Neo4j was fully removed in 2026-05-02 (ADR-0053). All entity relationships now live in PostgreSQL taxonomy tables (taxonomy_entities, taxonomy_relationships). This eliminates a separate infrastructure dependency while preserving the same entity model and query semantics. The HospitalTaxonomy class provides the same API surface, with SQL queries replacing Cypher.
Operational & Analytics Tables
The taxonomy and document tables above are the system's primary semantic surface. A second group of tables — added between 2026-04 and 2026-05-09 — captures the operational state of the running system and feeds the Operations dashboard. All inherit the standard tenant_id FK isolation pattern (see Multi-Tenancy).
| Table | Migration | Purpose | Key columns | Source |
|---|---|---|---|---|
app.ingest_runs | 061 | Audit log for nightly auto-ingest. One row per scheduler invocation. | mode, started_at, completed_at, urls_discovered, urls_indexed, urls_failed, failure_class, notes | backend/app/models/database.py:875 |
app.tenant_business_hours | 065 | Per-tenant operating-hours configuration for voice-channel "is the hospital open right now?" answers. | tenant_id, weekday, open_time, close_time, closed | backend/app/models/database.py:1651 |
app.tenant_cost_models | 065 | Per-tenant LLM cost model (per-token rates by model tier) used by CostTracker. | tenant_id, model_tier, prompt_token_rate, completion_token_rate, effective_from | backend/app/models/database.py:1678 |
app.daily_tenant_metrics | 065 | Daily aggregation feeding the Value Dashboard (queries served, cache hit rate, deflection ratio, $ saved). | tenant_id, date, query_count, cache_hit_rate, deflection_count, dollars_saved | backend/app/models/database.py:1707 |
app.category_mismatch_telemetry | 066 | Per-turn telemetry for the Value Framework affinity rerank — what fraction of top-K chunks were OFF-category vs the intent's preferred set. Drives the CategoryMismatchTrend chart in the Operations / costs tab. | tenant_id, intent_class, primary_category, mismatch_rate, chunks_total, chunks_off_category, query_preview | migration 066 |
app.diagnostic_feedback | 067 | Operator-rated verdicts (agree / partial / disagree) on v2 diagnostic-investigation outputs. Drives the DiagnosticAccuracyTrend chart. UNIQUE (investigation_id, operator_user_id) so each operator has one rating per investigation. | investigation_id, tenant_id, operator_user_id, verdict, notes | migration 067 |
These tables are read by the /api/v1/admin/feedback/telemetry-stats, /api/v1/admin/ops/category-mismatch, and /api/v1/admin/ops/diagnostic-accuracy endpoints; see Feedback Dashboard Metrics for the rendering layer.
Semantic Query Cache: PostgreSQL + pgvector (ADR-0031)
The query cache is a two-tier system that stores previous query results in PostgreSQL, using the same pgvector infrastructure as the vector store. It operates on LLM-reformulated queries (not raw user input), which maximizes hit rates because the intent classifier normalizes language, spelling, and word order before caching.
Two-Tier Lookup
| Tier | Mechanism | Latency | What It Catches |
|---|---|---|---|
| Tier 1 | SHA-256 hash of reformulated query text | ~1ms | Identical reformulations — including cross-language queries that normalize to the same Dutch form |
| Tier 2 | Cosine similarity via HNSW index on 1,536-dim text-embedding-3-large embeddings | ~30ms | Semantically equivalent queries with different wording (e.g., "Welke artsen werken op orthopedie?" vs "Welke dokters zijn er op de afdeling orthopedie?") |
Why Reformulated Queries?
The intent classifier already normalizes every user query to well-formed Dutch clinical language. This collapses many distinct inputs into identical or near-identical strings:
| User Input | Language | Reformulated Query |
|---|---|---|
| "rugpijn" | Dutch | "Welke afdelingen bij ZOL behandelen rugpijn?" |
| "back pain" | English | "Welke afdelingen bij ZOL behandelen rugpijn?" |
| "sırt ağrısı" | Turkish | "Welke afdelingen bij ZOL behandelen rugpijn?" |
All three produce the same reformulated query → Tier 1 hash hit after the first request, regardless of language.
Schema
Indexes
| Index | Type | Purpose |
|---|---|---|
uq_tenant_query_hash | UNIQUE (tenant_id, query_hash) | Tier 1 exact-match lookup + upsert conflict target |
idx_cache_embedding | HNSW (vector_cosine_ops) | Tier 2 approximate nearest neighbor search |
idx_cache_expires | B-tree (expires_at) | Efficient expired-row cleanup sweeps |
Configuration
| Setting | Default | Description |
|---|---|---|
query_cache_enabled | true | Enable/disable the entire cache |
query_cache_ttl_seconds | 3600 (1 hour) | Time-to-live for cache entries |
query_cache_similarity_threshold | 0.97 | Minimum cosine similarity for Tier 2 hits (range: 0.80-1.00) |
All settings are configurable at runtime via the admin Settings UI and persist per-tenant.
Why PostgreSQL Instead of Redis?
| Criterion | PostgreSQL + pgvector | Redis |
|---|---|---|
| Vector similarity search | Native HNSW support | Requires Redis Stack (not deployed) |
| Persistence | Survives restarts | Volatile without AOF/RDB |
| Multi-tenant queries | SQL WHERE clauses | Manual key namespacing |
| Already deployed | Yes (vector store) | Yes (sessions only) |
| ACID guarantees | Full transactions | Eventual consistency |
The decisive factor is Tier 2: approximate nearest neighbor search on embeddings requires pgvector's HNSW index. Since PostgreSQL is already the primary data store, adding a table is simpler than deploying Redis Stack.
A cache hit avoids all LLM API calls (intent classification, query rewriting, response generation, quality evaluation), which collectively cost approximately $0.003 per query. With 25,000 monthly queries and an estimated 40-60% cache hit rate, caching saves approximately $30-45/month in API costs, plus reducing latency from ~5.5s to < 50ms.
Session & Rate Limiting: Redis
Redis continues to serve three ephemeral caching roles:
| Role | TTL | Purpose |
|---|---|---|
| Session storage | 30 minutes | Maintain conversation context across requests |
| Rate limit counters | 1 minute | Sliding window rate limiting |
| JWT token blacklist | 24 hours | Invalidate tokens on logout |
Object Storage: MinIO
MinIO stores the canonical source content -- the Markdown extracted from documents and the raw HTML crawled from the website. This serves two purposes:
- Reprocessing: When the chunking strategy or embedding model changes, content can be re-processed from MinIO without re-crawling the website or re-extracting PDFs.
- Audit trail: The raw content provides a verifiable reference for what the system ingested, supporting content governance requirements.
MinIO was chosen over cloud object storage (S3, Azure Blob) to keep all data within the on-premise infrastructure, aligning with the hospital's data residency requirements.