Architecture Decision Records
Architecture Decision Records (ADRs) document the significant technical decisions made during the design and development of the ZOL Intelligent Search system. Following the lightweight ADR format proposed by Nygard (2011), each record captures the context, decision, and consequences of a specific architectural choice.
These records serve as a living architectural memory, enabling future team members to understand not just what was decided, but why.
Decision Summary
| ADR | Title | Status | Key Decision |
|---|---|---|---|
| 0001 | Configurable Text Chunking | Accepted | Tiktoken + markdown-aware chunking with 350/450/70 parameters |
| 0002 | No Mocking Policy | Accepted | Testcontainers for real integration tests |
| 0003 | Fuzzy Matching Strategy | Accepted | SequenceMatcher (Ratcliff/Obershelp) for entity resolution in Dutch medical terms |
| 0004 | Analytics and Audit Improvements | Accepted | Prometheus metrics + structured audit logging |
| 0005 | Nomic-Embed-Text for Dutch RAG | Superseded by ADR-0033 | Local Ollama embeddings replacing OpenAI, 768 dimensions |
| 0007 | Metadata Filtering and Boosting | Accepted | Intent-to-category filtering with metadata-driven re-ranking boosts |
| 0008 | User Feedback and Think Harder | Accepted | Thumbs up/down + escalated search with reranking |
| 0009 | Pipeline Progress Indicator | Accepted | WebSocket-based real-time stage reporting |
| 0011 | Inline S4U Modules | Accepted | Monorepo replacing 5 external packages |
| 0012 | RAG Pipeline Enrichment | Accepted | BM25 hybrid search, context assembly, canonical questions |
| 0013 | Reasoning Model Token Budget | Accepted | Tier 2 (standard) requires standard max_tokens (no reasoning overhead) |
| 0014 | LLM Entity Validation & Contextual Retrieval | Accepted | Tier 2 validates graph entities; generates page summaries for contextual retrieval |
| 0015 | Taxonomy-Driven Entity Normalization | Accepted | Single source of truth (zol_taxonomy.py) for all domain knowledge, 4-tier LLM routing, cost tracking |
| 0016 | SNOMED CT Terminology Integration | Accepted | SNOMED CT Belgian Edition (280K Dutch concepts) replaces hand-maintained medical aliases |
| 0017 | Context Retrieval Architecture | Accepted (amended 2026-05-09) | 8-stage multi-signal hybrid RAG pipeline; embedding model and graph backend reversed by ADR-0048 and ADR-0053 (master) — see amendment block at the top of the master ADR |
| 0018 | AI URL Category Assessment | Accepted | Two-dimensional AI assessment (category + value) using Tier 3 model for crawled URL classification |
| 0019 | Contextual Embeddings | Accepted | Anthropic-style chunk context prepended before embedding and BM25 indexing |
| 0020 | Reciprocal Rank Fusion | Accepted | RRF (k=60) replaces weighted linear score combination for hybrid search |
| 0021 | Self-RAG | Deferred | Multi-pass generation with self-reflection — deferred due to latency cost |
| 0022 | Dynamic Retrieval | Deferred | Mid-generation retrieval (FLARE/DRAGIN) — deferred for short-form answers |
| 0023 | Prompt Caching | Deferred | Explicit prompt caching — deferred, OpenAI automatic caching already active |
| 0024 | RAG Full Mode | Accepted | Always-on Jina reranking, Tier 3 model, 20 candidates for demo quality |
| 0025 | Novation UI Integration | Accepted | Document-style Q&A thread replacing chat bubbles for ZOL website consistency |
| 0026 | RAG Pipeline Quality & Speed | Accepted | Condition-aware doctor queries, DEPT_CONDITION_MAP fallback, Tier 2 skip, LRU cache |
| 0027 | Multilingual Prompts & LLM Best Practices | Accepted | Cross-lingual RAG: English system prompts, 8-language responses, citation dedup |
| 0028 | Golden-Page Taxonomy Gating | Accepted | Three-phase pipeline: scrape → seed → gate. Source page type determines entity creation permissions |
| 0029 | Remove Graphiti — Direct Neo4j | Accepted (Neo4j fully removed March 2026, replaced by PostgreSQL taxonomy) | Replace Graphiti library with direct neo4j.AsyncDriver, remove OpenAI embedding dependency |
| 0030 | LLM Entity Extraction | Accepted | Structured entity extraction from intent LLM replaces dictionary-gated graph routing |
| 0031 | Two-Tier Semantic Query Cache | Accepted | PostgreSQL + pgvector cache with SHA-256 hash (Tier 1) and embedding similarity (Tier 2) on reformulated queries |
| 0032 | Query Decomposition | Accepted | LLM-powered multi-hop query decomposition into parallel sub-question retrieval |
| 0033 | BGE-M3 Embedding Migration | Superseded by ADR-0048 | Migrated from nomic-embed-text (768-dim) to BGE-M3 (1024-dim); itself superseded 2026-04-30 by OpenAI text-embedding-3-large (1536-dim, hosted) |
| 0034 | Pipeline Latency Optimization | Accepted | Jina Reranker v2 API replacing local BGE cross-encoder, 20 candidates default |
| 0035 | Think Harder - Dedicated Escalation Configuration | Accepted | Dedicated escalation model and parameters for Think Harder pipeline |
| 0036 | Adversarial Input Hardening | Accepted | Perplexity-based anomaly detection, LLM-as-judge, burst rate limiting, streaming retraction |
| 0037 | Lingua Language Detection Validation | Accepted | Lingua-based language detection for query language validation |
| 0038 | Corrective RAG / CRAG Quality Gate | Accepted | CRAG quality gate for retrieval result validation |
| 0048 | OpenAI Embeddings Migration | Accepted | Migrate from BGE-M3 (Ollama, 1024-dim) to OpenAI text-embedding-3-large (hosted, 1536-dim) — voice-channel latency + serialization tax forced the move |
| 0049 | Thin Voice Architecture | Superseded by ADR-0051 | Collapse multi-stage voice pipeline (3–4 sequential LLM calls) to a deterministic regex pre-filter → FAQ → RAG path |
| 0050 (master) | Twilio + LiveKit SIP Integration | Accepted | Twilio Elastic SIP into self-hosted LiveKit SIP for PSTN voice termination |
| 0051 | Agentic-only Voice Orchestrator | Accepted | Make the agentic VoiceLLMOrchestrator (GPT-4.1 + tool use) the only voice path; delete the deterministic thin orchestrator |
| 0052 | Voice Language Locked at First Utterance | Accepted | Lock the call's language at the first transcribed turn; mid-call switching is unsupported (Deepgram single-mode silence on cross-language speech) |
| 0053 | Neo4j Removal + pgvector Consolidation | Accepted (documented retroactively) | Neo4j fully removed in March 2026, replaced by app.taxonomy_* tables in PostgreSQL — completes the second half of ADR-0029 |
| 0054 | Intent Classification Cache | Accepted | (tenant_id, normalized_query, language) → IntentClassificationResult cache with memory + Redis backends + admin kill switch; ~2 300 ms saved per cache hit |
| 0055 | FAQ-Corpus Drift Prevention | Accepted (Phase 1 executed) | Purge 10 hand-curated ZOL FAQs; trust the nightly-refreshed corpus; preserve only safety/policy entries. Phase 2 (nightly drift audit) + Phase 3 (demand-driven promotion) planned. |
| 0056 | Chat Answer-Shape Typology | Accepted | Six universal shape patterns (POINT-FACT, STEP-BY-STEP, ATTRIBUTE-LIST, MULTI-ENTITY, COMPARISON, DECISION-TREE) replace ad-hoc per-defect rules; chat-only injection |
| 0057 | Tenant-Scoped Prompt Addendums + Doctor-Profile Boost | Accepted | _TENANT_CHAT_ADDENDUMS[slug] registry isolates tenant-specific prompt rules; tenant-agnostic 1.50× boost for documents titled Dr. <Name> |
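As an illustration of the ADR-0003 row above, the following is a minimal sketch of fuzzy entity resolution with Python's standard-library `difflib.SequenceMatcher`, which implements a Ratcliff/Obershelp-style similarity ratio. The function name `best_match`, the 0.8 threshold, and the sample department names are assumptions made for this example, not values taken from the ADR.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def best_match(term: str, candidates: Sequence[str], threshold: float = 0.8) -> Optional[str]:
    """Return the candidate most similar to `term`, or None if nothing clears the threshold."""
    normalized = term.casefold().strip()
    best: Optional[str] = None
    best_score = 0.0
    for candidate in candidates:
        # ratio() returns 2*M/T, where M is the number of matched characters
        # and T is the combined length of both strings.
        score = SequenceMatcher(None, normalized, candidate.casefold()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical canonical names; the real list would come from the taxonomy.
departments = ["cardiologie", "neurologie", "oncologie"]
typo_result = best_match("cardiologi", departments)   # close misspelling
unrelated_result = best_match("tandarts", departments)  # no similar entry
```

A threshold keeps unrelated terms from being forced onto the nearest canonical name; the right value depends on the vocabulary and would need tuning against real queries.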
ADR Conventions
Each ADR follows a consistent structure:
- Context: The situation and forces that motivated the decision
- Decision: The specific choice made
- Consequences: The positive and negative outcomes of the decision
- Status: Proposed, Accepted, Deferred, Deprecated, or Superseded
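The structure above can be modelled as a small record type. This is a sketch only: the class and field names are hypothetical, and `DEFERRED` is included because the summary table uses that status. The `context` and `consequences` values in the usage lines are placeholders rather than text from the actual record.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEFERRED = "Deferred"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


@dataclass(frozen=True)
class ADR:
    number: int
    title: str
    status: Status
    context: str        # situation and forces that motivated the decision
    decision: str       # the specific choice made
    consequences: str   # positive and negative outcomes
    superseded_by: Optional[int] = None  # number of the replacing ADR, if any

    @property
    def label(self) -> str:
        """Zero-padded identifier in the form used throughout this index."""
        return f"ADR-{self.number:04d}"


# Decision text is copied from the summary table; the other fields are placeholders.
rrf = ADR(
    number=20,
    title="Reciprocal Rank Fusion",
    status=Status.ACCEPTED,
    context="(see the full record)",
    decision="RRF (k=60) replaces weighted linear score combination for hybrid search",
    consequences="(see the full record)",
)
```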
Numbering Gaps
ADR numbers 0006 and 0010 are intentionally absent. These represent decisions that were proposed but not accepted during the design review process. Their numbers are reserved to maintain chronological consistency.
Decision Timeline
Thematic Groups
Data Processing
- ADR-0001 (Chunking): How documents are split for embedding
- ADR-0005 (Embeddings): Original embedding model selection (superseded by ADR-0033)
- ADR-0033 (BGE-M3 Migration): Migration to BGE-M3 (1024-dim) for superior Dutch retrieval (superseded by ADR-0048)
- ADR-0007 (Metadata Boosting): How retrieval results are re-ranked using metadata signals
- ADR-0012 (RAG Enrichment): BM25 hybrid search, context assembly, canonical questions
- ADR-0014 (LLM Validation): Post-extraction entity validation and contextual retrieval page summaries
- ADR-0015 (Taxonomy): Single source of truth for all domain knowledge and normalization
- ADR-0016 (SNOMED CT): SNOMED CT Belgian Edition for scalable medical terminology
- ADR-0017 (Context Retrieval): Complete 8-stage retrieval pipeline documentation
- ADR-0018 (URL Assessment): AI-powered category and value assessment for crawled URLs
- ADR-0019 (Contextual Embeddings): Anthropic-style chunk context for enriched embedding and BM25 indexing
- ADR-0020 (RRF): Reciprocal Rank Fusion replacing weighted linear score combination
- ADR-0026 (Pipeline Quality & Speed): Condition-aware doctor queries, DEPT_CONDITION_MAP fallback, Tier 2 skip
- ADR-0028 (Golden Pages): Three-phase pipeline: scrape → seed → gate entity creation
- ADR-0030 (LLM Extraction): Structured entity extraction replaces dictionary-gated graph routing
- ADR-0032 (Query Decomposition): LLM-powered multi-hop query decomposition into parallel sub-question retrieval
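To make the ADR-0020 entry concrete, here is a minimal sketch of Reciprocal Rank Fusion with k=60: each document scores the sum of 1/(k + rank) over every ranked list in which it appears. The function name, the document IDs, and the alphabetical tie-break are assumptions for illustration; the project's actual implementation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank matters, so raw scores from different retrievers
            # never need to be normalized against each other.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a BM25 search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Because RRF ignores score magnitudes, it avoids the weight tuning that a linear combination of vector and BM25 scores requires.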
Quality and Testing
- ADR-0002 (No Mocking): How the system is tested
- ADR-0004 (Analytics): How quality is monitored
User Experience
- ADR-0008 (Feedback): How users signal quality issues
- ADR-0009 (Progress): How pipeline progress is communicated
- ADR-0024 (Full Mode): Always-on quality settings for demo presentation
- ADR-0025 (Novation UI): Document-style Q&A thread matching ZOL website design
- ADR-0027 (Multilingual): Cross-lingual RAG with 8-language support
Infrastructure
- ADR-0003 (Fuzzy Matching): How entity variations are resolved
- ADR-0011 (Module Inlining): How the codebase is organized
- ADR-0013 (Token Budget): How reasoning model token limits are managed
- ADR-0023 (Prompt Caching): Deferred — OpenAI automatic caching sufficient at current scale
- ADR-0029 (Remove Graphiti): Direct Neo4j driver replacing Graphiti library (Neo4j fully removed March 2026, replaced by PostgreSQL taxonomy)
- ADR-0031 (Semantic Cache): Two-tier PostgreSQL + pgvector cache on reformulated queries
- ADR-0034 (Latency Optimization): Jina Reranker v2, reduced candidates, latency improvements
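The two-tier lookup described for ADR-0031 can be sketched as follows: an exact SHA-256 match is tried first, then a nearest-neighbour search by embedding similarity. This is an in-memory stand-in, not the PostgreSQL + pgvector implementation; the class name, the 0.92 similarity threshold, and the normalization step are assumptions for illustration.

```python
import hashlib
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    def __init__(self, threshold: float = 0.92) -> None:
        self._exact: Dict[str, str] = {}                  # cache tier 1: hash -> answer
        self._entries: List[Tuple[List[float], str]] = []  # cache tier 2: (embedding, answer)
        self._threshold = threshold

    @staticmethod
    def _key(query: str) -> str:
        # Assumed normalization: trim and lowercase before hashing.
        return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()

    def put(self, query: str, embedding: Sequence[float], answer: str) -> None:
        self._exact[self._key(query)] = answer
        self._entries.append((list(embedding), answer))

    def get(self, query: str, embedding: Sequence[float]) -> Optional[str]:
        # Tier 1: cheap exact match on the hashed query.
        hit = self._exact.get(self._key(query))
        if hit is not None:
            return hit
        # Tier 2: most similar stored embedding, accepted only above the threshold.
        best = max(self._entries, key=lambda e: _cosine(e[0], embedding), default=None)
        if best is not None and _cosine(best[0], embedding) >= self._threshold:
            return best[1]
        return None


cache = TwoTierCache()
cache.put("Wat zijn de bezoekuren?", [1.0, 0.0], "cached answer")
exact_hit = cache.get("wat zijn de bezoekuren?  ", [0.0, 1.0])  # tier 1 hit
similar_hit = cache.get("bezoektijden?", [0.99, 0.1])           # tier 2 hit
miss = cache.get("parkeren?", [0.0, 1.0])                       # no hit
```

In a pgvector-backed version, the linear scan in `get` would typically be replaced by an indexed nearest-neighbour query.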
Safety and Security
- ADR-0036 (Adversarial Hardening): Perplexity-based anomaly detection, burst rate limiting, streaming retraction
Voice Channel
- ADR-0048 (OpenAI Embeddings Migration): voice-channel latency forced the move from on-prem Ollama bge-m3 to OpenAI text-embedding-3-large
- ADR-0049 (Thin Voice Architecture, superseded): collapsed an 8-stage voice pipeline to regex pre-filter → FAQ → RAG; subsequently superseded by ADR-0051
- ADR-0050 (master record): Twilio Elastic SIP into self-hosted LiveKit SIP for PSTN voice termination
- ADR-0051 (Agentic-only Voice Orchestrator): GPT-4.1 with tool use becomes the only voice path; deterministic thin pipeline deleted
- ADR-0052 (Voice Language Locking): voice channel locks language at first utterance; mid-call switching unsupported (Deepgram single-mode silence)
Knowledge Representation
- ADR-0053 (master record): Neo4j fully removed; replaced by app.taxonomy_* tables in PostgreSQL, completing the second half of ADR-0029
Deferred Investigations
- ADR-0021 (Self-RAG): Multi-pass generation with self-reflection — deferred due to latency
- ADR-0022 (Dynamic Retrieval): Mid-generation retrieval (FLARE/DRAGIN) — deferred for short-form answers