Release Notes: March 28-31, 2026

88 commits | 65 code files | ~4,400 lines of production code | 7 database migrations

2-Minute Summary

This sprint transformed the system from a ZOL-specific prototype into a hospital-agnostic platform ready for pilot testing. The taxonomy was deduplicated from 12,997 to 2,663 entities, a new AI-powered feedback investigation dashboard was built, PDF ingestion was hardened against crashes, and the eval score ended the sprint at 99.0% (effective 99.7%) after a temporary mid-sprint dip. Seven database migrations (054-060) landed, and every component was verified on the pilot server.

Pilot Readiness Assessment

Criterion	Status	Evidence
Query accuracy	Pass	99.0% (296/299), effective 99.7% with ground truth fixes
Safety layer	Pass	Zero medical advice incidents across all eval runs
All queries execute	Pass	Query decomposition crash (KeyError) fixed and deployed
Hospital-agnostic	Pass	All 259 ZOL-specific references removed; config-driven
Data integrity	Pass	0 orphaned chunks, 0 orphaned embeddings, FK cascades verified
Infrastructure health	Pass	All 7 containers healthy, migrations at head (060)
PDF handling	Pass	Subprocess isolation prevents OOM worker crashes
Feedback tooling	Pass	AI investigation, override, flag, golden question promotion

Verdict: The system is ready for pilot testing.

Detailed Changes

1. Hospital-Agnostic Architecture (Phases 1-4)

The single largest workstream: converting the entire codebase from hardcoded ZOL references to a database-driven, hospital-agnostic platform.

What changed:

Audit: Identified 259 ZOL-specific references across 40 files
Phase 1 — Config Extraction: New site_crawl_configs table (migration 058) and admin API for per-hospital crawl settings
Phase 2 — Prompt Parameterization: All LLM prompts now receive hospital identity via PromptContext dataclass loaded from the database at runtime
Phase 3 — Generic Naming: ZOLCrawler renamed to HospitalCrawler; all ZOL-branded strings removed from API titles, app descriptions, and defaults
Phase 4 — DB-driven Config Cache: SiteConfigCache loads hospital identity, boilerplate patterns, and crawl settings from the database on startup; no more in-code constants

Impact: A new hospital can be onboarded by inserting configuration rows — zero code changes required.
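As a rough illustration of Phase 2, a prompt template can take hospital identity from a small dataclass instead of hardcoded strings. Only the `PromptContext` name comes from these notes; the field names, the template text, and the `render_system_prompt` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptContext:
    """Hospital identity passed to every prompt (field names are assumed)."""
    hospital_name: str
    language: str
    website_url: str


# Hypothetical template; the real templates live in backend/app/prompts.py.
SYSTEM_PROMPT = (
    "You are the information assistant for {hospital_name}. "
    "Answer in {language} and only cite pages from {website_url}."
)


def render_system_prompt(ctx: PromptContext) -> str:
    """Fill the template from the context, so no hospital name lives in code."""
    return SYSTEM_PROMPT.format(
        hospital_name=ctx.hospital_name,
        language=ctx.language,
        website_url=ctx.website_url,
    )


# In the real system these values would be loaded from the database at runtime.
example_ctx = PromptContext("Example Hospital", "Dutch", "https://hospital.example")
print(render_system_prompt(example_ctx))
```

Under this shape, onboarding a second hospital means supplying a different `PromptContext` row rather than editing the template.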

Key files:

backend/app/services/site_config.py — DB-backed config cache
backend/app/crawlers/hospital_crawler.py — generic crawler (was zol_crawler.py)
backend/app/prompts.py — parameterized prompt templates
backend/app/api/hospital_config.py — admin CRUD for crawl configs

2. Taxonomy Deduplication & SNOMED Gap Fill

The taxonomy had grown to 12,997 entities with massive duplication from multiple extraction runs. This sprint cleaned and enriched it.

What changed:

Dedup (migration 056): Survivor selection algorithm kept the richest entity per group; reduced to 2,663 unique entities
SNOMED Gap Fill (migration 057): 1,674 orphaned entities were linked via SNOMED hierarchy lookups + manual seed fallback
LLM Auto-linker: New relationship_autolinker.py uses GPT-4.1-mini to classify and link remaining orphans during the publish pipeline
Result: 3,591 relationships, only 43 orphans remaining (2.1%)
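The survivor selection step can be pictured with a minimal sketch. It assumes duplicates are grouped by a normalized name and that "richest" means the entity with the most populated optional fields; the field names and scoring rule are illustrative and not taken from `dedup_published.py`.

```python
from collections import defaultdict


def richness(entity: dict) -> int:
    """Assumed scoring: one point per populated optional field."""
    optional = ("snomed_code", "description", "synonyms", "category")
    return sum(1 for key in optional if entity.get(key))


def select_survivors(entities: list[dict]) -> list[dict]:
    """Group by normalized name and keep the highest-scoring entity per group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for entity in entities:
        groups[entity["name"].strip().lower()].append(entity)
    # max() returns the first maximal element, so ties keep the earliest row.
    return [max(group, key=richness) for group in groups.values()]


rows = [
    {"id": 1, "name": "Cardiologie"},
    {"id": 2, "name": "cardiologie ", "snomed_code": "placeholder", "description": "Heart care"},
    {"id": 3, "name": "Radiologie", "description": "Imaging"},
]
print([row["id"] for row in select_survivors(rows)])  # [2, 3]
```

A real migration would also need to repoint relationships from the discarded rows to the survivor, which this sketch leaves out.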

Taxonomy before/after:

Metric	Before	After
Entities	12,997	2,663
Relationships	~2,000	3,591
Orphans	~1,674	43 (2.1%)
Duplicates	Severe	Eliminated

Key files:

backend/alembic/versions/056_dedup_published_entities.py
backend/alembic/versions/057_snomed_relationship_gap_fill.py
backend/app/services/taxonomy/relationship_autolinker.py
backend/app/services/taxonomy/dedup_published.py

3. Feedback Investigation Dashboard

A new AI-powered system for analysing negative user feedback and improving answer quality.

What changed:

AI Case Investigation: Click any feedback item to trigger a GPT-4.1 analysis that diagnoses why the answer was wrong, identifies missing chunks, and suggests fixes
Override Mechanism: An admin can force a response override (correct answer, source citations) that is served to future identical queries
Add to Golden Questions: Promote investigated questions directly into the evaluation benchmark
Dashboard Metrics (Spec B): Telemetry stats with P95 latency comparison, Think Harder funnel visualization, trend chart
Flag & Persist: Flag content for review; investigation results persist across page refreshes via backend storage

Key files:

backend/app/services/feedback_investigation_service.py
backend/app/api/admin_feedback.py — 5 new endpoints
frontend/src/pages/FeedbackDashboardPage.tsx — complete redesign

4. PDF & Document Pipeline Hardening

Ingesting 573 PDF brochures during this sprint exposed several crash patterns, which were then fixed.

What changed:

Subprocess Isolation: PDF extraction now runs in a forked subprocess; if it OOMs or hangs, only the subprocess dies — the worker survives
Image-only PDF Detection: PDFs with no extractable text are gracefully skipped instead of crashing
Boilerplate Stripping: Hospital header/footer patterns (phone numbers, addresses) are stripped from chunks; patterns are DB-configurable per hospital
Enrichment Retry: Gaps in contextual embeddings are retried inline before marking a document as completed
Self-heal Purge: Soft-deleted documents are now cleaned up during the self-heal diagnostic cycle
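The isolation idea can be sketched as follows. This is not the project's implementation: it starts a fresh interpreter with `subprocess.run` as a stand-in for the forked subprocess described above, and the "extraction" in the child is a placeholder that only decodes bytes. The function name and timeout default are assumptions.

```python
import json
import subprocess
import sys
from typing import Optional

# Child program run in a separate interpreter. The "extraction" here is a
# placeholder that just decodes the file; a real PDF library would go there.
CHILD_SCRIPT = """
import json, sys
with open(sys.argv[1], "rb") as handle:
    raw = handle.read()
json.dump({"text": raw.decode("utf-8", errors="ignore")}, sys.stdout)
"""


def extract_text_isolated(path: str, timeout_s: float = 60.0) -> Optional[str]:
    """Return extracted text, or None if the child crashed, hung, or found no text."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # child hung; subprocess.run kills it before raising
    if proc.returncode != 0:
        return None  # child crashed or was killed (for example by the OOM killer)
    text = json.loads(proc.stdout)["text"]
    return text if text.strip() else None  # empty text: treat as image-only and skip
```

Because the parent only inspects the exit code and output, a memory blow-up in the child cannot take the worker down with it; the worst case is a `None` result for that one document.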

Key files:

backend/app/services/document_service.py
backend/app/services/processing_service.py
backend/app/services/diagnostics/self_heal_service.py

5. RAG Pipeline Improvements

Several retrieval quality improvements targeting navigational and practical queries.

What changed:

Category-Aware Retrieval Boosting: Navigational queries (navigation_or_practical_info) get a 1.5x authority boost for chunks in relevant categories (Location, Contact, Financial, etc.) and a 0.7x penalty for unrelated categories
Taxonomy Enrichment for Navigation: Practical queries now trigger taxonomy lookups (campus info, department details) even when no medical entity is detected
Campus-Aware Doctor Lookup: Doctor queries with campus mentions now filter by campus via published taxonomy relationships
Speculative Retrieval Merge: When intent classification reformulates a query, results from both the original and the reformulated query are merged and deduplicated
Query Decomposition Fix: Fixed the KeyError: '"multi_hop"' crash caused by double f-string/format escaping, which was blocking all complex queries on the pilot server
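A minimal sketch of the category-aware boosting, assuming the factor is applied directly to a chunk's retrieval score (the notes describe it as an authority boost, so the real scoring may differ). The category set is the subset named above, and the function names and chunk shape are hypothetical.

```python
NAV_INTENT = "navigation_or_practical_info"
RELEVANT_CATEGORIES = {"Location", "Contact", "Financial"}  # subset named in the notes
BOOST = 1.5
PENALTY = 0.7


def adjust_score(score: float, category: str, intent: str) -> float:
    """Scale the score only for navigational queries; other intents are untouched."""
    if intent != NAV_INTENT:
        return score
    factor = BOOST if category in RELEVANT_CATEGORIES else PENALTY
    return score * factor


def rerank(chunks: list[dict], intent: str) -> list[dict]:
    """Sort chunks by adjusted score, highest first."""
    return sorted(
        chunks,
        key=lambda c: adjust_score(c["score"], c["category"], intent),
        reverse=True,
    )


chunks = [
    {"id": "A", "score": 0.8, "category": "Treatment"},
    {"id": "B", "score": 0.5, "category": "Contact"},
]
print([c["id"] for c in rerank(chunks, NAV_INTENT)])  # ['B', 'A']
```

With these factors, chunk B (0.5 × 1.5 = 0.75) overtakes chunk A (0.8 × 0.7 = 0.56) for a navigational query, while any other intent keeps the original order.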

Key files:

backend/app/services/search_service.py — category boosting
backend/app/services/taxonomy/query_service.py — campus-aware lookups
backend/app/services/rag/retrieval_mixin.py — speculative merge
backend/app/services/query_decomposition_service.py — prompt fix
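The query decomposition crash is easy to reproduce in isolation. The sketch below shows one plausible way such a double-escaping bug arises; the prompt text and function names are invented for illustration and are not the project's actual prompt.

```python
def build_prompt_buggy(query: str) -> str:
    placeholder = "{query}"
    # The f-string collapses {{ }} to { }, leaving a bare JSON brace pair...
    template = f'Reply as {{"multi_hop": true}} for: {placeholder}'
    # ...which str.format then parses as a replacement field named "multi_hop".
    return template.format(query=query)


def build_prompt_fixed(query: str) -> str:
    # Plain string: braces are escaped exactly once, for the single format pass.
    template = 'Reply as {{"multi_hop": true}} for: {query}'
    return template.format(query=query)


try:
    build_prompt_buggy("where is radiology?")
except KeyError as exc:
    print("KeyError:", exc)

print(build_prompt_fixed("where is radiology?"))
```

The general rule is that literal braces need one level of doubling per formatting pass, so a template that goes through both an f-string and `str.format` needs either four braces or, more readably, only one of the two mechanisms.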

6. Entity Resolution UI

Merge candidate management improvements for the taxonomy pipeline wizard.

What changed:

Merge/Reject buttons added to NEEDS_REVIEW candidates (previously only visible for AUTO_MERGE)
Tiered Bulk Merge: One-click approval for high-confidence candidates, in tiers from 100% token overlap down to 80%+
SNOMED Bulk Merge: Now includes NEEDS_REVIEW candidates, not just AUTO_MERGE
No-confirmation individual merge: Single-click merge for reviewed candidates
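To make the token-overlap tiers concrete, here is a small sketch. It assumes overlap is measured as shared tokens divided by all distinct tokens (Jaccard similarity) on lowercased, whitespace-split names; the notes do not specify the metric, so treat both the formula and the function names as assumptions.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two entity names."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def bulk_merge_tier(candidates: list[tuple[str, str]], min_overlap: float) -> list[tuple[str, str]]:
    """Return the candidate pairs that qualify for one-click merge at this tier."""
    return [pair for pair in candidates if token_overlap(*pair) >= min_overlap]


pairs = [
    ("dienst cardiologie", "Cardiologie dienst"),                         # overlap 1.0
    ("spoedgevallen dienst campus a b", "spoedgevallen dienst campus a"),  # overlap 0.8
    ("radiologie", "cardiologie"),                                        # overlap 0.0
]
print(len(bulk_merge_tier(pairs, 1.0)))  # 1
print(len(bulk_merge_tier(pairs, 0.8)))  # 2
```

Approving the strictest tier first keeps the riskier, lower-overlap merges as a separate, deliberate click.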

7. Data Integrity & Migrations

Seven database migrations ensuring referential integrity and clean data.

Migration	Purpose
054	`merge_candidates` table for fuzzy entity dedup
055	`feedback_investigations` table + override columns
056	Deduplicate published entities (12,997 → 2,663)
057	SNOMED relationship gap fill + manual seeds
058	`site_crawl_configs` for hospital-agnostic crawl config
059	Seed `golden_pages` with ZOL navigational pages
060	Change `ingestion_results` FK from SET NULL to CASCADE

Data cleanup performed:

5,732 orphaned ingestion results deleted
12 zombie documents (completed, 0 chunks) removed
15 NULL-document_id ingestion results cleaned
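The effect of migration 060 can be demonstrated with an in-memory SQLite database standing in for the production database. The `documents` table name and the column layout are assumptions; only `ingestion_results` and `document_id` appear in these notes.

```python
import sqlite3


def orphan_count(on_delete: str) -> int:
    """Delete a parent document and count ingestion results left without one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE ingestion_results ("
        "id INTEGER PRIMARY KEY, "
        f"document_id INTEGER REFERENCES documents(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO documents (id) VALUES (1)")
    conn.execute("INSERT INTO ingestion_results (id, document_id) VALUES (10, 1)")
    conn.execute("DELETE FROM documents WHERE id = 1")
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM ingestion_results WHERE document_id IS NULL"
    ).fetchone()
    conn.close()
    return count


print(orphan_count("SET NULL"))  # 1: the row survives with a NULL document_id
print(orphan_count("CASCADE"))   # 0: the row is deleted with its document
```

Under `SET NULL`, every deleted document leaves a NULL-`document_id` row behind, which is the kind of residue the cleanup above removed; `CASCADE` prevents it from accumulating again.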

8. Evaluation Results

Date	Score	Context
March 27	99.7% (298/299)	RAG mixin split, dedup, all fixes
March 29	98.7% (295/299)	PDF corpus scaling incident (-1.0%)
March 30	97.7% (293/299)	Post-gap-fill, taxonomy in flux
March 31	99.0% (296/299)	Taxonomy dedup + gap fill + Graph ON

The 3 remaining failures:

GQ-195: Non-deterministic (buikpijn routes to Abdominale Heelkunde vs Kindergeneeskunde depending on context) — needs pediatric keyword boosting
GQ-043, GQ-124: Ground truth corrections applied; effective score with fixes: 99.7%

9. Documentation & Tooling

Architecture-as-Code: New Docusaurus plugin that generates architecture index from frontmatter metadata
Multi-tenancy docs: Comprehensive page covering the hospital-agnostic design
Taxonomy dedup/gap-fill docs: Technical deep-dive into the dedup algorithm and SNOMED gap fill
Feedback dashboard metrics docs: Dashboard feature documentation
PDF corpus scaling incident: Academic analysis of the 1% eval regression from PDF ingestion
Prompt engineering docs: New page covering the prompt architecture
45+ updated documentation pages across all sections

Current System State

Component	Value
Documents	2,522 completed
Chunks	10,437 (all with embeddings)
Taxonomy entities	2,663 (deduplicated)
Taxonomy relationships	3,591
Orphan rate	2.1% (43 entities)
Golden questions	302 (v3.6)
Database migrations	Head at 060
Eval score	99.0% (effective 99.7%)
Containers	7/7 healthy
Medical advice incidents	ZERO
