Architecture Decision Records

Architecture Decision Records (ADRs) document the significant technical decisions made during the design and development of the ZOL Intelligent Search system. Following the lightweight ADR format proposed by Nygard (2011), each record captures the context, decision, and consequences of a specific architectural choice.

These records serve as a living architectural memory, so future team members can understand not just what was decided, but why.

Decision Summary

ADR	Title	Status	Key Decision
0001	Configurable Text Chunking	Accepted	Tiktoken + markdown-aware chunking with 350/450/70 parameters
0002	No Mocking Policy	Accepted	Testcontainers for real integration tests
0003	Fuzzy Matching Strategy	Accepted	SequenceMatcher (Ratcliff/Obershelp) for entity resolution in Dutch medical terms
0004	Analytics and Audit Improvements	Accepted	Prometheus metrics + structured audit logging
0005	Nomic-Embed-Text for Dutch RAG	Superseded by ADR-0033	Local Ollama embeddings replacing OpenAI, 768 dimensions
0007	Metadata Filtering and Boosting	Accepted	Intent-to-category filtering with metadata-driven re-ranking boosts
0008	User Feedback and Think Harder	Accepted	Thumbs up/down + escalated search with reranking
0009	Pipeline Progress Indicator	Accepted	WebSocket-based real-time stage reporting
0011	Inline S4U Modules	Accepted	Monorepo replacing 5 external packages
0012	RAG Pipeline Enrichment	Accepted	BM25 hybrid search, context assembly, canonical questions
0013	Reasoning Model Token Budget	Accepted	Tier 2 (standard) requires standard max_tokens (no reasoning overhead)
0014	LLM Entity Validation & Contextual Retrieval	Accepted	Tier 2 validates graph entities; generates page summaries for contextual retrieval
0015	Taxonomy-Driven Entity Normalization	Accepted	Single source of truth (`zol_taxonomy.py`) for all domain knowledge, 4-tier LLM routing, cost tracking
0016	SNOMED CT Terminology Integration	Accepted	SNOMED CT Belgian Edition (280K Dutch concepts) replaces hand-maintained medical aliases
0017	Context Retrieval Architecture	Accepted (amended 2026-05-09)	8-stage multi-signal hybrid RAG pipeline; embedding model and graph backend reversed by ADR-0048 and ADR-0053 (master) — see amendment block at the top of the master ADR
0018	AI URL Category Assessment	Accepted	Two-dimensional AI assessment (category + value) using Tier 3 model for crawled URL classification
0019	Contextual Embeddings	Accepted	Anthropic-style chunk context prepended before embedding and BM25 indexing
0020	Reciprocal Rank Fusion	Accepted	RRF (k=60) replaces weighted linear score combination for hybrid search
0021	Self-RAG	Deferred	Multi-pass generation with self-reflection — deferred due to latency cost
0022	Dynamic Retrieval	Deferred	Mid-generation retrieval (FLARE/DRAGIN) — deferred for short-form answers
0023	Prompt Caching	Deferred	Explicit prompt caching — deferred, OpenAI automatic caching already active
0024	RAG Full Mode	Accepted	Always-on Jina reranking, Tier 3 model, 20 candidates for demo quality
0025	Novation UI Integration	Accepted	Document-style Q&A thread replacing chat bubbles for ZOL website consistency
0026	RAG Pipeline Quality & Speed	Accepted	Condition-aware doctor queries, DEPT_CONDITION_MAP fallback, Tier 2 skip, LRU cache
0027	Multilingual Prompts & LLM Best Practices	Accepted	Cross-lingual RAG: English system prompts, 8-language responses, citation dedup
0028	Golden-Page Taxonomy Gating	Accepted	Three-phase pipeline: scrape → seed → gate. Source page type determines entity creation permissions
0029	Remove Graphiti — Direct Neo4j	Accepted (Neo4j fully removed March 2026, replaced by PostgreSQL taxonomy)	Replace Graphiti library with direct `neo4j.AsyncDriver`, remove OpenAI embedding dependency
0030	LLM Entity Extraction	Accepted	Structured entity extraction from intent LLM replaces dictionary-gated graph routing
0031	Two-Tier Semantic Query Cache	Accepted	PostgreSQL + pgvector cache with SHA-256 hash (Tier 1) and embedding similarity (Tier 2) on reformulated queries
0032	Query Decomposition	Accepted	LLM-powered multi-hop query decomposition into parallel sub-question retrieval
0033	BGE-M3 Embedding Migration	Superseded by ADR-0048	Migrated from nomic-embed-text (768-dim) to BGE-M3 (1024-dim); itself superseded 2026-04-30 by OpenAI `text-embedding-3-large` (1536-dim, hosted)
0034	Pipeline Latency Optimization	Accepted	Jina Reranker v2 API replacing local BGE cross-encoder, 20 candidates default
0035	Think Harder - Dedicated Escalation Configuration	Accepted	Dedicated escalation model and parameters for Think Harder pipeline
0036	Adversarial Input Hardening	Accepted	Perplexity-based anomaly detection, LLM-as-judge, burst rate limiting, streaming retraction
0037	Lingua Language Detection Validation	Accepted	Lingua-based language detection for query language validation
0038	Corrective RAG / CRAG Quality Gate	Accepted	CRAG quality gate for retrieval result validation
0048	OpenAI Embeddings Migration	Accepted	Migrate from BGE-M3 (Ollama, 1024-dim) to OpenAI `text-embedding-3-large` (hosted, 1536-dim) — voice-channel latency + serialization tax forced the move
0049	Thin Voice Architecture	Superseded by ADR-0051	Collapse multi-stage voice pipeline (3–4 sequential LLM calls) to a deterministic regex pre-filter → FAQ → RAG path
0050 (master)	Twilio + LiveKit SIP Integration	Accepted	Twilio Elastic SIP into self-hosted LiveKit SIP for PSTN voice termination
0051	Agentic-only Voice Orchestrator	Accepted	Make the agentic `VoiceLLMOrchestrator` (GPT-4.1 + tool use) the only voice path; delete the deterministic thin orchestrator
0052	Voice Language Locked at First Utterance	Accepted	Lock the call's language at the first transcribed turn; mid-call switching is unsupported (Deepgram single-mode silence on cross-language speech)
0053	Neo4j Removal + pgvector Consolidation	Accepted (documented retroactively)	Neo4j fully removed in March 2026, replaced by `app.taxonomy_*` tables in PostgreSQL — completes the second half of ADR-0029
0054	Intent Classification Cache	Accepted	`(tenant_id, normalized_query, language)` → `IntentClassificationResult` cache with memory + Redis backends + admin kill switch; ~2 300 ms saved per cache hit
0055	FAQ-Corpus Drift Prevention	Accepted (Phase 1 executed)	Purge 10 hand-curated ZOL FAQs; trust the nightly-refreshed corpus; preserve only safety/policy entries. Phase 2 (nightly drift audit) + Phase 3 (demand-driven promotion) planned.
0056	Chat Answer-Shape Typology	Accepted	Six universal shape patterns (POINT-FACT, STEP-BY-STEP, ATTRIBUTE-LIST, MULTI-ENTITY, COMPARISON, DECISION-TREE) replace ad-hoc per-defect rules; chat-only injection
0057	Tenant-Scoped Prompt Addendums + Doctor-Profile Boost	Accepted	`_TENANT_CHAT_ADDENDUMS[slug]` registry isolates tenant-specific prompt rules; tenant-agnostic 1.50× boost for documents titled `Dr. <Name>`
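
ADR-0003 in the table above names `SequenceMatcher` (Ratcliff/Obershelp) for entity resolution. The following is a minimal sketch of that technique using Python's standard `difflib`, not the project's actual implementation. The function names, the lowercasing step, the `0.8` threshold, and the sample department terms are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Ratcliff/Obershelp similarity in [0, 1]; case-insensitive by assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return the closest candidate, or None if nothing reaches the threshold.

    The threshold value is hypothetical; the ADR summary does not state one.
    """
    if not candidates:
        return None
    score, name = max((similarity(query, c), c) for c in candidates)
    return name if score >= threshold else None


# Hypothetical Dutch department names, used only to exercise the function.
terms = ["cardiologie", "neurologie", "radiologie"]
match = best_match("cardiologi", terms)
no_match = best_match("xyz", terms)
```

A threshold trades recall for precision: a lower value tolerates more spelling variation but risks merging distinct terms that share a long suffix, as many Dutch specialty names do.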

ADR Conventions

Each ADR follows a consistent structure:

Context: The situation and forces that motivated the decision
Decision: The specific choice made
Consequences: The positive and negative outcomes of the decision
Status: Proposed, Accepted, Deferred, Deprecated, or Superseded
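
The structure above can be pictured as a small record type. This is only an illustrative sketch: the class and field names are assumptions, and the status set includes `Deferred` because that value appears in the Decision Summary.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEFERRED = "Deferred"      # used by several rows in the Decision Summary
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


@dataclass(frozen=True)
class ADR:
    """One decision record; field names are hypothetical."""
    number: int
    title: str
    status: Status
    context: str
    decision: str
    consequences: str

    @property
    def identifier(self) -> str:
        # Zero-padded to four digits, matching the numbering used in this index.
        return f"ADR-{self.number:04d}"


record = ADR(
    number=20,
    title="Reciprocal Rank Fusion",
    status=Status.ACCEPTED,
    context="(see the full record)",
    decision="RRF (k=60) replaces weighted linear score combination for hybrid search",
    consequences="(see the full record)",
)
```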

Numbering Gaps

ADR numbers 0006 and 0010 are intentionally absent. Both were proposed but not accepted during design review, and their numbers stay reserved so that the sequence remains chronological. Numbers 0039 through 0047 likewise have no entry in the Decision Summary.

Decision Timeline

Thematic Groups

Data Processing

ADR-0001 (Chunking): How documents are split for embedding
ADR-0005 (Embeddings): Original embedding model selection (superseded by ADR-0033)
ADR-0033 (BGE-M3 Migration): Migration to BGE-M3 (1024-dim) for superior Dutch retrieval (superseded by ADR-0048)
ADR-0007 (Metadata Boosting): How retrieval results are re-ranked using metadata signals
ADR-0012 (RAG Enrichment): BM25 hybrid search, context assembly, canonical questions
ADR-0014 (LLM Validation): Post-extraction entity validation and contextual retrieval page summaries
ADR-0015 (Taxonomy): Single source of truth for all domain knowledge and normalization
ADR-0016 (SNOMED CT): SNOMED CT Belgian Edition for scalable medical terminology
ADR-0017 (Context Retrieval): Complete 8-stage retrieval pipeline documentation
ADR-0018 (URL Assessment): AI-powered category and value assessment for crawled URLs
ADR-0019 (Contextual Embeddings): Anthropic-style chunk context for enriched embedding and BM25 indexing
ADR-0020 (RRF): Reciprocal Rank Fusion replacing weighted linear score combination
ADR-0026 (Pipeline Quality & Speed): Condition-aware doctor queries, DEPT_CONDITION_MAP fallback, Tier 2 skip
ADR-0028 (Golden Pages): Three-phase pipeline: scrape → seed → gate entity creation
ADR-0030 (LLM Extraction): Structured entity extraction replaces dictionary-gated graph routing
ADR-0032 (Query Decomposition): LLM-powered multi-hop query decomposition into parallel sub-question retrieval
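
ADR-0020 in the list above replaces weighted score combination with Reciprocal Rank Fusion at k=60. In its standard form, RRF scores each document as the sum of 1 / (k + rank) over every ranking in which it appears, so only rank positions matter and the raw vector and BM25 scores never need to be put on a common scale. The sketch below shows that standard formulation. The function name, the tie-breaking rule, and the sample document IDs are assumptions, not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each ranking contributes 1 / (k + rank) per document, with ranks starting at 1.
    Ties are broken alphabetically by ID (an arbitrary choice for determinism).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector search and a BM25 search.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents found by both retrievers ("b" and "c") outrank "a", even though "a" was the top vector hit, because agreement across rankings adds more than a single first place.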

Quality and Testing

ADR-0002 (No Mocking): How the system is tested
ADR-0004 (Analytics): How quality is monitored

User Experience

ADR-0008 (Feedback): How users signal quality issues
ADR-0009 (Progress): How pipeline progress is communicated
ADR-0024 (Full Mode): Always-on quality settings for demo presentation
ADR-0025 (Novation UI): Document-style Q&A thread matching ZOL website design
ADR-0027 (Multilingual): Cross-lingual RAG with 8-language support

Infrastructure

ADR-0003 (Fuzzy Matching): How entity variations are resolved
ADR-0011 (Module Inlining): How the codebase is organized
ADR-0013 (Token Budget): How reasoning model token limits are managed
ADR-0023 (Prompt Caching): Deferred — OpenAI automatic caching sufficient at current scale
ADR-0029 (Remove Graphiti): Direct Neo4j driver replacing Graphiti library (Neo4j fully removed March 2026, replaced by PostgreSQL taxonomy)
ADR-0031 (Semantic Cache): Two-tier PostgreSQL + pgvector cache on reformulated queries
ADR-0034 (Latency Optimization): Jina Reranker v2, reduced candidates, latency improvements
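
The two-tier lookup described for ADR-0031 can be sketched as follows: Tier 1 is an exact match on a SHA-256 hash of the reformulated query, and Tier 2 falls back to embedding similarity. This is an in-memory stand-in for the PostgreSQL + pgvector cache, written only to show the control flow. The class name, the normalization step, the `0.95` cosine threshold, and the toy two-dimensional embeddings are all assumptions.

```python
import hashlib
import math


def _cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    """In-memory illustration; the real cache is described as PostgreSQL + pgvector."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold   # hypothetical similarity cut-off
        self._exact = {}             # SHA-256 hex digest -> answer
        self._semantic = []          # list of (embedding, answer)

    @staticmethod
    def _key(query: str) -> str:
        # Trimming and lowercasing before hashing is an assumption.
        return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()

    def put(self, query: str, embedding, answer: str) -> None:
        self._exact[self._key(query)] = answer
        self._semantic.append((list(embedding), answer))

    def get(self, query: str, embedding):
        # Tier 1: exact hash match, no similarity computation needed.
        hit = self._exact.get(self._key(query))
        if hit is not None:
            return hit, "tier1"
        # Tier 2: nearest cached embedding, accepted only above the threshold.
        best = max(self._semantic, key=lambda e: _cosine(e[0], embedding), default=None)
        if best is not None and _cosine(best[0], embedding) >= self.threshold:
            return best[1], "tier2"
        return None, "miss"


cache = TwoTierCache()
cache.put("Visiting hours?", [1.0, 0.0], "cached answer")
```

Checking the hash first keeps the common case of repeated identical queries cheap; the similarity search runs only when the exact lookup misses.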

Safety and Security

ADR-0036 (Adversarial Hardening): Perplexity-based anomaly detection, burst rate limiting, streaming retraction

Voice Channel

ADR-0048 (OpenAI Embeddings Migration): voice-channel latency forced the move from on-prem Ollama bge-m3 to OpenAI text-embedding-3-large
ADR-0049 (Thin Voice Architecture, superseded): collapsed an 8-stage voice pipeline to regex pre-filter → FAQ → RAG; subsequently superseded by ADR-0051
ADR-0050 (master record): Twilio Elastic SIP into self-hosted LiveKit SIP for PSTN voice termination
ADR-0051 (Agentic-only Voice Orchestrator): GPT-4.1 with tool use becomes the only voice path; deterministic thin pipeline deleted
ADR-0052 (Voice Language Locking): voice channel locks language at first utterance; mid-call switching unsupported (Deepgram single-mode silence)

Knowledge Representation

ADR-0053 (master record): Neo4j fully removed; replaced by app.taxonomy_* tables in PostgreSQL — completes the second half of ADR-0029

Deferred Investigations

ADR-0021 (Self-RAG): Multi-pass generation with self-reflection — deferred due to latency
ADR-0022 (Dynamic Retrieval): Mid-generation retrieval (FLARE/DRAGIN) — deferred for short-form answers
