Architecture Decision Records

Architecture Decision Records (ADRs) document the significant technical decisions made during the design and development of the ZOL Intelligent Search system. Following the lightweight ADR format proposed by Nygard (2011), each record captures the context, decision, and consequences of a specific architectural choice.

These records serve as a living architectural memory, enabling future team members to understand not just what was decided, but why.

Decision Summary

| ADR | Title | Status | Key Decision |
| --- | --- | --- | --- |
| 0001 | Configurable Text Chunking | Accepted | Tiktoken + markdown-aware chunking with 350/450/70 parameters |
| 0002 | No Mocking Policy | Accepted | Testcontainers for real integration tests |
| 0003 | Fuzzy Matching Strategy | Accepted | SequenceMatcher (Ratcliff/Obershelp) for entity resolution in Dutch medical terms |
| 0004 | Analytics and Audit Improvements | Accepted | Prometheus metrics + structured audit logging |
| 0005 | Nomic-Embed-Text for Dutch RAG | Superseded by ADR-0033 | Local Ollama embeddings replacing OpenAI, 768 dimensions |
| 0007 | Metadata Filtering and Boosting | Accepted | Intent-to-category filtering with metadata-driven re-ranking boosts |
| 0008 | User Feedback and Think Harder | Accepted | Thumbs up/down + escalated search with reranking |
| 0009 | Pipeline Progress Indicator | Accepted | WebSocket-based real-time stage reporting |
| 0011 | Inline S4U Modules | Accepted | Monorepo replacing 5 external packages |
| 0012 | RAG Pipeline Enrichment | Accepted | BM25 hybrid search, context assembly, canonical questions |
| 0013 | Reasoning Model Token Budget | Accepted | Tier 2 (standard) requires standard max_tokens (no reasoning overhead) |
| 0014 | LLM Entity Validation & Contextual Retrieval | Accepted | Tier 2 validates graph entities; generates page summaries for contextual retrieval |
| 0015 | Taxonomy-Driven Entity Normalization | Accepted | Single source of truth (zol_taxonomy.py) for all domain knowledge, 4-tier LLM routing, cost tracking |
| 0016 | SNOMED CT Terminology Integration | Accepted | SNOMED CT Belgian Edition (280K Dutch concepts) replaces hand-maintained medical aliases |
| 0017 | Context Retrieval Architecture | Accepted (amended 2026-05-09) | 8-stage multi-signal hybrid RAG pipeline; embedding model and graph backend reversed by ADR-0048 and ADR-0053 (master); see amendment block at the top of the master ADR |
| 0018 | AI URL Category Assessment | Accepted | Two-dimensional AI assessment (category + value) using Tier 3 model for crawled URL classification |
| 0019 | Contextual Embeddings | Accepted | Anthropic-style chunk context prepended before embedding and BM25 indexing |
| 0020 | Reciprocal Rank Fusion | Accepted | RRF (k=60) replaces weighted linear score combination for hybrid search |
| 0021 | Self-RAG | Deferred | Multi-pass generation with self-reflection; deferred due to latency cost |
| 0022 | Dynamic Retrieval | Deferred | Mid-generation retrieval (FLARE/DRAGIN); deferred for short-form answers |
| 0023 | Prompt Caching | Deferred | Explicit prompt caching; deferred because OpenAI automatic caching is already active |
| 0024 | RAG Full Mode | Accepted | Always-on Jina reranking, Tier 3 model, 20 candidates for demo quality |
| 0025 | Novation UI Integration | Accepted | Document-style Q&A thread replacing chat bubbles for ZOL website consistency |
| 0026 | RAG Pipeline Quality & Speed | Accepted | Condition-aware doctor queries, DEPT_CONDITION_MAP fallback, Tier 2 skip, LRU cache |
| 0027 | Multilingual Prompts & LLM Best Practices | Accepted | Cross-lingual RAG: English system prompts, 8-language responses, citation dedup |
| 0028 | Golden-Page Taxonomy Gating | Accepted | Three-phase pipeline: scrape → seed → gate. Source page type determines entity creation permissions |
| 0029 | Remove Graphiti - Direct Neo4j | Accepted (Neo4j fully removed March 2026, replaced by PostgreSQL taxonomy) | Replace Graphiti library with direct neo4j.AsyncDriver, remove OpenAI embedding dependency |
| 0030 | LLM Entity Extraction | Accepted | Structured entity extraction from intent LLM replaces dictionary-gated graph routing |
| 0031 | Two-Tier Semantic Query Cache | Accepted | PostgreSQL + pgvector cache with SHA-256 hash (Tier 1) and embedding similarity (Tier 2) on reformulated queries |
| 0032 | Query Decomposition | Accepted | LLM-powered multi-hop query decomposition into parallel sub-question retrieval |
| 0033 | BGE-M3 Embedding Migration | Superseded by ADR-0048 | Migrated from nomic-embed-text (768-dim) to BGE-M3 (1024-dim); itself superseded 2026-04-30 by OpenAI text-embedding-3-large (1536-dim, hosted) |
| 0034 | Pipeline Latency Optimization | Accepted | Jina Reranker v2 API replacing local BGE cross-encoder, 20 candidates default |
| 0035 | Think Harder - Dedicated Escalation Configuration | Accepted | Dedicated escalation model and parameters for Think Harder pipeline |
| 0036 | Adversarial Input Hardening | Accepted | Perplexity-based anomaly detection, LLM-as-judge, burst rate limiting, streaming retraction |
| 0037 | Lingua Language Detection Validation | Accepted | Lingua-based language detection for query language validation |
| 0038 | Corrective RAG / CRAG Quality Gate | Accepted | CRAG quality gate for retrieval result validation |
| 0048 | OpenAI Embeddings Migration | Accepted | Migrate from BGE-M3 (Ollama, 1024-dim) to OpenAI text-embedding-3-large (hosted, 1536-dim); voice-channel latency + serialization tax forced the move |
| 0049 | Thin Voice Architecture | Superseded by ADR-0051 | Collapse multi-stage voice pipeline (3–4 sequential LLM calls) to a deterministic regex pre-filter → FAQ → RAG path |
| 0050 (master) | Twilio + LiveKit SIP Integration | Accepted | Twilio Elastic SIP into self-hosted LiveKit SIP for PSTN voice termination |
| 0051 | Agentic-only Voice Orchestrator | Accepted | Make the agentic VoiceLLMOrchestrator (GPT-4.1 + tool use) the only voice path; delete the deterministic thin orchestrator |
| 0052 | Voice Language Locked at First Utterance | Accepted | Lock the call's language at the first transcribed turn; mid-call switching is unsupported (Deepgram single-mode silence on cross-language speech) |
| 0053 | Neo4j Removal + pgvector Consolidation | Accepted (documented retroactively) | Neo4j fully removed in March 2026, replaced by app.taxonomy_* tables in PostgreSQL; completes the second half of ADR-0029 |
| 0054 | Intent Classification Cache | Accepted | (tenant_id, normalized_query, language) → IntentClassificationResult cache with memory + Redis backends + admin kill switch; ~2 300 ms saved per cache hit |
| 0055 | FAQ-Corpus Drift Prevention | Accepted (Phase 1 executed) | Purge 10 hand-curated ZOL FAQs; trust the nightly-refreshed corpus; preserve only safety/policy entries. Phase 2 (nightly drift audit) + Phase 3 (demand-driven promotion) planned. |
| 0056 | Chat Answer-Shape Typology | Accepted | Six universal shape patterns (POINT-FACT, STEP-BY-STEP, ATTRIBUTE-LIST, MULTI-ENTITY, COMPARISON, DECISION-TREE) replace ad-hoc per-defect rules; chat-only injection |
| 0057 | Tenant-Scoped Prompt Addendums + Doctor-Profile Boost | Accepted | _TENANT_CHAT_ADDENDUMS[slug] registry isolates tenant-specific prompt rules; tenant-agnostic 1.50× boost for documents titled Dr. <Name> |
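
ADR-0003 in the summary names Python's SequenceMatcher (Ratcliff/Obershelp) as the fuzzy-matching primitive for entity resolution. The following is a minimal sketch of that idea, not the project's implementation: the function name `best_match`, the lowercase normalization, and the 0.8 threshold are assumptions made for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def best_match(term: str, candidates: list[str], threshold: float = 0.8) -> str | None:
    """Return the candidate most similar to `term`, or None if nothing clears the threshold.

    The threshold value is an illustrative assumption, not a documented setting.
    """
    normalized = term.strip().lower()
    best, best_score = None, 0.0
    for candidate in candidates:
        # ratio() returns 2*M/T, where M is the number of matching characters
        # and T is the combined length of both strings.
        score = SequenceMatcher(None, normalized, candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


departments = ["Cardiologie", "Neurologie"]
print(best_match("kardiologie", departments))
print(best_match("xyz", departments))
```

A misspelling such as "kardiologie" still resolves to "Cardiologie", while an unrelated string returns None instead of a forced match.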

ADR Conventions

Each ADR follows a consistent structure:

  • Context: The situation and forces that motivated the decision
  • Decision: The specific choice made
  • Consequences: The positive and negative outcomes of the decision
  • Status: Proposed, Accepted, Deferred, Deprecated, or Superseded
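
The structure above maps naturally onto a small record type. This sketch is only one way to model it: the class and field names (`ADR`, `superseded_by`, `identifier`) are hypothetical, and `DEFERRED` is included because the summary table uses that status for ADR-0021 through ADR-0023.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEFERRED = "Deferred"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


@dataclass(frozen=True)
class ADR:
    number: int
    title: str
    status: Status
    context: str        # situation and forces that motivated the decision
    decision: str       # the specific choice made
    consequences: str   # positive and negative outcomes
    superseded_by: int | None = None  # hypothetical field for supersession links

    @property
    def identifier(self) -> str:
        """Zero-padded identifier in the style used on this page, e.g. ADR-0005."""
        return f"ADR-{self.number:04d}"


record = ADR(5, "Nomic-Embed-Text for Dutch RAG", Status.SUPERSEDED,
             context="", decision="", consequences="", superseded_by=33)
print(record.identifier, record.status.value)
```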

Numbering Gaps

ADR numbers 0006 and 0010 are intentionally absent. These represent decisions that were proposed but not accepted during the design review process. Their numbers are reserved to maintain chronological consistency.

Thematic Groups

Data Processing

  • ADR-0001 (Chunking): How documents are split for embedding
  • ADR-0005 (Embeddings): Original embedding model selection (superseded by ADR-0033)
  • ADR-0033 (BGE-M3 Migration): Migration to BGE-M3 (1024-dim) for improved Dutch retrieval (superseded by ADR-0048)
  • ADR-0007 (Metadata Boosting): How retrieval results are re-ranked using metadata signals
  • ADR-0012 (RAG Enrichment): BM25 hybrid search, context assembly, canonical questions
  • ADR-0014 (LLM Validation): Post-extraction entity validation and contextual retrieval page summaries
  • ADR-0015 (Taxonomy): Single source of truth for all domain knowledge and normalization
  • ADR-0016 (SNOMED CT): SNOMED CT Belgian Edition for scalable medical terminology
  • ADR-0017 (Context Retrieval): Complete 8-stage retrieval pipeline documentation
  • ADR-0018 (URL Assessment): AI-powered category and value assessment for crawled URLs
  • ADR-0019 (Contextual Embeddings): Anthropic-style chunk context for enriched embedding and BM25 indexing
  • ADR-0020 (RRF): Reciprocal Rank Fusion replacing weighted linear score combination
  • ADR-0026 (Pipeline Quality & Speed): Condition-aware doctor queries, DEPT_CONDITION_MAP fallback, Tier 2 skip
  • ADR-0028 (Golden Pages): Three-phase pipeline: scrape → seed → gate entity creation
  • ADR-0030 (LLM Extraction): Structured entity extraction replaces dictionary-gated graph routing
  • ADR-0032 (Query Decomposition): LLM-powered multi-hop query decomposition into parallel sub-question retrieval
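
Reciprocal Rank Fusion (ADR-0020) scores each document as the sum of 1 / (k + rank) over every ranked list in which it appears, with k = 60 per the summary table. A minimal sketch, assuming each retriever returns an ordered list of document IDs; the function name and sample IDs are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["a", "b", "c"]   # hypothetical dense-retrieval order
bm25_hits = ["b", "c", "d"]     # hypothetical BM25 order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])
```

Because only ranks enter the formula, the fusion needs no score normalization between retrievers, which is the usual argument for RRF over a weighted linear combination.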

Quality and Testing

  • ADR-0002 (No Mocking): How the system is tested
  • ADR-0004 (Analytics): How quality is monitored

User Experience

  • ADR-0008 (Feedback): How users signal quality issues
  • ADR-0009 (Progress): How pipeline progress is communicated
  • ADR-0024 (Full Mode): Always-on quality settings for demo presentation
  • ADR-0025 (Novation UI): Document-style Q&A thread matching ZOL website design
  • ADR-0027 (Multilingual): Cross-lingual RAG with 8-language support

Infrastructure

  • ADR-0003 (Fuzzy Matching): How entity variations are resolved
  • ADR-0011 (Module Inlining): How the codebase is organized
  • ADR-0013 (Token Budget): How reasoning model token limits are managed
  • ADR-0023 (Prompt Caching): Deferred — OpenAI automatic caching sufficient at current scale
  • ADR-0029 (Remove Graphiti): Direct Neo4j driver replacing Graphiti library (Neo4j fully removed March 2026, replaced by PostgreSQL taxonomy)
  • ADR-0031 (Semantic Cache): Two-tier PostgreSQL + pgvector cache on reformulated queries
  • ADR-0034 (Latency Optimization): Jina Reranker v2, reduced candidates, latency improvements
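
The two-tier lookup in ADR-0031 can be sketched in memory: Tier 1 is an exact match on a SHA-256 hash of the query, and Tier 2 falls back to embedding similarity. The real system uses PostgreSQL + pgvector; everything here (the class name, the lowercase normalization, the 0.95 cosine threshold, caller-supplied embeddings) is an assumption for illustration.

```python
from __future__ import annotations

import hashlib
import math


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold                      # assumed similarity cutoff
        self._exact: dict[str, str] = {}                # Tier 1: hash -> answer
        self._semantic: list[tuple[list[float], str]] = []  # Tier 2: (embedding, answer)

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()

    def put(self, query: str, embedding: list[float], answer: str) -> None:
        self._exact[self._key(query)] = answer
        self._semantic.append((embedding, answer))

    def get(self, query: str, embedding: list[float]) -> str | None:
        hit = self._exact.get(self._key(query))
        if hit is not None:
            return hit                                  # Tier 1 hit: no vector math needed
        best = max(self._semantic, key=lambda e: _cosine(embedding, e[0]), default=None)
        if best is not None and _cosine(embedding, best[0]) >= self.threshold:
            return best[1]                              # Tier 2 hit: close enough in meaning
        return None


cache = TwoTierCache()
cache.put("Wat zijn de bezoekuren?", [1.0, 0.0], "answer-1")
print(cache.get("wat zijn de bezoekuren?", [0.0, 1.0]))  # Tier 1
print(cache.get("bezoekuren?", [0.99, 0.05]))            # Tier 2
```

Checking the hash first keeps the common repeat-query case cheap; the similarity scan only runs on a Tier 1 miss.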

Safety and Security

  • ADR-0036 (Adversarial Hardening): Perplexity-based anomaly detection, burst rate limiting, streaming retraction

Voice Channel

  • ADR-0048 (OpenAI Embeddings Migration): voice-channel latency forced the move from on-prem Ollama bge-m3 to OpenAI text-embedding-3-large
  • ADR-0049 (Thin Voice Architecture): collapsed the multi-stage voice pipeline (3–4 sequential LLM calls) to regex pre-filter → FAQ → RAG; superseded by ADR-0051
  • ADR-0050 (master record): Twilio Elastic SIP into self-hosted LiveKit SIP for PSTN voice termination
  • ADR-0051 (Agentic-only Voice Orchestrator): GPT-4.1 with tool use becomes the only voice path; deterministic thin pipeline deleted
  • ADR-0052 (Voice Language Locking): voice channel locks language at first utterance; mid-call switching unsupported (Deepgram single-mode silence)
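
The language-locking rule in ADR-0052 amounts to a write-once value per call. A minimal sketch under stated assumptions: the class name, the supported-language tuple, and the Dutch fallback are hypothetical, and the detected language is assumed to arrive with the first transcribed turn.

```python
from __future__ import annotations


class CallLanguageLock:
    """Fixes a call's language at the first transcribed turn and ignores later changes."""

    def __init__(self, supported: tuple[str, ...] = ("nl", "fr", "en"), default: str = "nl") -> None:
        self._supported = supported   # hypothetical language set
        self._default = default       # hypothetical fallback for unsupported detections
        self._locked: str | None = None

    def observe(self, detected: str) -> str:
        """Record the detected language of a turn and return the call's locked language."""
        if self._locked is None:
            # Only the first turn can set the language.
            self._locked = detected if detected in self._supported else self._default
        return self._locked


lock = CallLanguageLock()
print(lock.observe("fr"))  # first utterance locks the call
print(lock.observe("en"))  # later detections do not change it
```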

Knowledge Representation

  • ADR-0053 (master record): Neo4j fully removed; replaced by app.taxonomy_* tables in PostgreSQL — completes the second half of ADR-0029

Deferred Investigations

  • ADR-0021 (Self-RAG): Multi-pass generation with self-reflection — deferred due to latency
  • ADR-0022 (Dynamic Retrieval): Mid-generation retrieval (FLARE/DRAGIN) — deferred for short-form answers