Release Notes: March 21-27, 2026
~125 commits | RAG service refactored | C901 violations: 27 to 0 | Eval: 99.7% (298/299)
The RAG service was split from a monolithic 5,176-line file into 7 focused mixin modules while maintaining 99.7% eval accuracy. All 27 C901 complexity violations were eliminated. A 3-layer content deduplication system was introduced, hub detection was overhauled with medical vocabulary scoring, and the feedback investigation system gained AI-powered case analysis. Code review remediation fixed 13 ::jsonb violations and added 18 unit tests for Keycloak integration.
1. RAG Service Refactoring
The core rag_service.py had grown to 5,176 lines with 27 C901 (cyclomatic complexity) violations. This week it was decomposed into a clean mixin architecture.
What changed:
- Mixin Split: rag_service.py broken into 7 modules (retrieval, ranking, synthesis, safety, taxonomy, caching, and telemetry)
- Line Reduction: 5,176 lines down to 2,951 lines across all modules; the main orchestrator file is now under 400 lines
- C901 Elimination: All 27 complexity violations removed (27 to 0) through method extraction and responsibility separation
- Eval Verification: Full golden eval run after refactoring confirmed 99.7% (298/299) — the best score achieved to date on fully refactored code
Key files:
- backend/app/services/rag/retrieval_mixin.py: vector search, speculative retrieval
- backend/app/services/rag/ranking_mixin.py: chunk scoring and reranking
- backend/app/services/rag/synthesis_mixin.py: LLM response generation
- backend/app/services/rag/safety_mixin.py: guardrails and medical advice filtering
- backend/app/services/rag/taxonomy_mixin.py: entity-aware retrieval enrichment
- backend/app/services/rag/caching_mixin.py: Redis-backed query cache
- backend/app/services/rag/telemetry_mixin.py: latency tracking, token usage
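The mixin pattern keeps each concern in its own class, and the orchestrator composes them through multiple inheritance. The sketch below is illustrative only. The method names (retrieve, enrich_query, rank, synthesize, is_safe, cache_get/cache_set, record_latency) are hypothetical and do not reflect the actual module interfaces. The same applies to the toy logic standing in for vector search, reranking, the guardrails, and the LLM call.

```python
from __future__ import annotations

import time


class RetrievalMixin:
    def retrieve(self, query: str) -> list[str]:
        # Toy keyword match standing in for vector search
        terms = query.lower().split()
        return [doc for doc in self.corpus if any(t in doc.lower() for t in terms)]


class TaxonomyMixin:
    def enrich_query(self, query: str) -> str:
        # Append synonyms for known entities (e.g. Dutch term -> English term)
        extra = [syn for term, syn in self.synonyms.items() if term in query.lower()]
        return " ".join([query, *extra])


class RankingMixin:
    def rank(self, query: str, chunks: list[str], top_k: int = 3) -> list[str]:
        # Order chunks by how many query terms they contain
        terms = set(query.lower().split())
        return sorted(chunks, key=lambda c: -len(terms & set(c.lower().split())))[:top_k]


class SynthesisMixin:
    def synthesize(self, query: str, chunks: list[str]) -> str:
        # Placeholder for the LLM call: return the top-ranked chunk
        return chunks[0] if chunks else "No relevant information found."


class SafetyMixin:
    BLOCKED_PHRASES = ("you should take", "dosage")

    def is_safe(self, answer: str) -> bool:
        # Crude guardrail: reject answers that look like personal medical advice
        return not any(p in answer.lower() for p in self.BLOCKED_PHRASES)


class CachingMixin:
    def cache_get(self, query: str) -> str | None:
        return self._cache.get(query)

    def cache_set(self, query: str, answer: str) -> None:
        self._cache[query] = answer


class TelemetryMixin:
    def record_latency(self, seconds: float) -> None:
        self.latencies.append(seconds)


class RAGService(
    RetrievalMixin, TaxonomyMixin, RankingMixin, SynthesisMixin,
    SafetyMixin, CachingMixin, TelemetryMixin,
):
    """Thin orchestrator: owns shared state and calls each mixin in order."""

    def __init__(self, corpus: list[str], synonyms: dict[str, str] | None = None) -> None:
        self.corpus = corpus
        self.synonyms = synonyms or {}
        self._cache: dict[str, str] = {}
        self.latencies: list[float] = []

    def answer(self, query: str) -> str:
        start = time.perf_counter()
        try:
            cached = self.cache_get(query)
            if cached is not None:
                return cached
            enriched = self.enrich_query(query)
            answer = self.synthesize(query, self.rank(enriched, self.retrieve(enriched)))
            if not self.is_safe(answer):
                answer = "Please contact your care team for personal medical advice."
            self.cache_set(query, answer)
            return answer
        finally:
            # Runs for cache hits and misses alike
            self.record_latency(time.perf_counter() - start)
```

Each mixin reads shared state (corpus, _cache, latencies) from self, so the orchestrator's __init__ is the single place that defines it. That makes each mixin testable against a small stub class.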
2. Code Review Remediation
A systematic pass through the codebase to address findings from the code review audit.
What changed:
- 13 ::jsonb violations fixed: All PostgreSQL cast expressions converted from ::jsonb (incompatible with asyncpg) to CAST(:param AS jsonb)
- .env.example updated: 20+ new settings documented for onboarding new developers
- Keycloak configuration: Theme mount added to Docker Compose; realm export includes all required client scopes and role mappings
- 18 unit tests: New test suite for KeycloakAdminService covering user creation, role assignment, token exchange, and error handling
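Because the rewrite is mechanical, a small checker can flag regressions in review or CI. This is a minimal sketch. It assumes SQL is written as strings with :name bind parameters, the style used by SQLAlchemy text(). The helper names find_jsonb_violations and fix_jsonb_casts are hypothetical.

```python
import re

# A named bind parameter followed by a ::jsonb cast, e.g. ":meta::jsonb".
# The lookbehind skips column casts such as "col::text::jsonb".
_PARAM_JSONB_CAST = re.compile(r"(?<![:\w]):(\w+)::jsonb\b")


def find_jsonb_violations(sql: str) -> list[str]:
    """Return the names of bind parameters that use the :param::jsonb shorthand."""
    return _PARAM_JSONB_CAST.findall(sql)


def fix_jsonb_casts(sql: str) -> str:
    """Rewrite :param::jsonb into the explicit CAST(:param AS jsonb) form."""
    return _PARAM_JSONB_CAST.sub(r"CAST(:\1 AS jsonb)", sql)
```

A regex is enough for a lint-style check, but it does not parse SQL. Matches inside string literals or comments would also be flagged.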
3. Content Deduplication
A 3-layer deduplication system to prevent duplicate content from polluting retrieval results.
What changed:
- Layer 1 — Canonical URLs: URL normalization (trailing slashes, query params, fragments) ensures the same page is never ingested twice
- Layer 2 — Cross-document Chunk Dedup: Identical or near-identical chunks across different documents are detected and consolidated during the publish step
- Layer 3 — Boilerplate Detection: Site-agnostic boilerplate patterns (headers, footers, cookie banners) are identified and stripped; patterns are stored in the database and configurable per hospital
Impact: Cleaner chunk pool means fewer duplicate results and better use of the LLM context window during synthesis.
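The three layers can be sketched as small, independent functions. This is a minimal illustration under several assumptions:

- Layer 1 drops the query string entirely. A real normalizer may keep parameters that select distinct content.
- Layer 2 treats chunks as duplicates only when they match after lowercasing, punctuation removal, and whitespace collapsing. True near-duplicate detection (e.g. MinHash) is not shown.
- Layer 3 uses hypothetical hospital IDs and patterns as placeholders for the database-stored ones.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit


def canonicalize_url(url: str) -> str:
    """Layer 1: lowercase scheme and host, drop query string and fragment, strip trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def _normalize(text: str) -> str:
    # Lowercase, remove punctuation, and collapse whitespace so trivially different copies match
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def dedup_chunks(chunks: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Layer 2: map each unique chunk (first-seen text) to the document IDs that contain it."""
    first_text: dict[str, str] = {}
    owners: dict[str, list[str]] = {}
    for doc_id, text in chunks:
        key = hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()
        canonical = first_text.setdefault(key, text)
        owners.setdefault(canonical, []).append(doc_id)
    return owners


# Layer 3: site-agnostic defaults plus hypothetical per-hospital patterns (stored in the DB in practice)
DEFAULT_BOILERPLATE = [r"^cookies?\b", r"^©\s*\d{4}"]
HOSPITAL_BOILERPLATE = {"hospital-a": [r"^volg ons op"]}


def strip_boilerplate(text: str, hospital_id: str) -> str:
    """Drop lines matching any default or hospital-specific boilerplate pattern."""
    patterns = [
        re.compile(p, re.IGNORECASE)
        for p in DEFAULT_BOILERPLATE + HOSPITAL_BOILERPLATE.get(hospital_id, [])
    ]
    kept = [line for line in text.splitlines() if not any(p.search(line.strip()) for p in patterns)]
    return "\n".join(kept)
```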
4. Hub Detection Overhaul
Hub pages (department landing pages, service overviews) need special treatment during crawling — they should be crawled for links but their own content is often too generic to chunk.
What changed:
- Medical Vocabulary Scoring: Comprehensive Dutch medical term dictionary used alongside structural scoring (link density, heading-to-content ratio) to classify pages as hubs
- Incremental Extraction: Hub pages are now processed incrementally — their links are followed but content is only extracted if the page also has substantive medical text
- Hub Demoted Candidates UI: Admin interface shows pages that were demoted from hub status, allowing manual override
- Homepage Link Discovery: The crawler always processes homepage links to discover main navigation targets, regardless of hub classification
- DB-driven Site Config: All crawl behavior (max depth, hub thresholds, allowed domains) is stored in the site_crawl_configs table for multi-tenant support
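Here is a minimal sketch of how structural and vocabulary signals might combine. It assumes a page counts as a structural hub when its link density or heading ratio crosses a threshold. It is demoted when it also carries enough medical vocabulary. The config keys stand in for columns of site_crawl_configs and are hypothetical, as is the tiny term set, which replaces the real dictionary.

```python
from dataclasses import dataclass

# Tiny hypothetical sample; the real system uses a comprehensive Dutch medical dictionary
MEDICAL_TERMS = {"hartfalen", "behandeling", "diagnose", "klachten", "operatie", "medicatie"}


@dataclass
class HubDecision:
    is_hub: bool           # treat as hub: follow links, skip chunking
    demoted: bool          # looked like a hub structurally, but has substantive medical text
    extract_content: bool  # chunk and embed this page's own text


def medical_density(text: str) -> float:
    """Fraction of words that appear in the medical vocabulary."""
    words = [w.strip(".,:;!?()").lower() for w in text.split()]
    return sum(w in MEDICAL_TERMS for w in words) / len(words) if words else 0.0


def classify_page(text: str, links: int, headings: int, config: dict[str, float]) -> HubDecision:
    """Combine structural signals with medical vocabulary; config keys are hypothetical."""
    words = max(len(text.split()), 1)
    structural_hub = (
        links / words >= config["hub_link_density"]
        or headings / words >= config["hub_heading_ratio"]
    )
    medical = medical_density(text) >= config["min_medical_density"]
    is_hub = structural_hub and not medical
    return HubDecision(is_hub=is_hub, demoted=structural_hub and medical, extract_content=not is_hub)
```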
5. AI Feedback Investigation System
A new workflow for diagnosing and resolving negative user feedback using LLM-powered case analysis.
What changed:
- AI Case Investigation: Clicking a negative feedback item triggers a GPT-4.1 analysis that examines the original query, retrieved chunks, generated response, and user complaint to produce a diagnosis
- Override Mechanism: Admins can attach a corrected response to any query — future identical queries will serve the override instead of generating a new response
- Add to Golden Questions: Investigated queries can be promoted directly into the golden evaluation set with the correct expected answer
- Dashboard Redesign: Feedback dashboard rebuilt with metrics cards, chunk inspection panel, and trend chart showing feedback sentiment over time
- Telemetry Integration: P95 latency stats displayed per query, enabling correlation between slow responses and negative feedback
- Persistence: Investigation results are stored in the backend and survive page refreshes
Key files:
- backend/app/services/feedback_investigation_service.py
- backend/app/api/admin_feedback.py
- frontend/src/pages/FeedbackDashboardPage.tsx
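The override mechanism amounts to a lookup that runs before generation. This is a minimal sketch, assuming "identical" means equal after lowercasing and whitespace collapsing. It uses an in-memory dict where the real system persists overrides in the backend. OverrideStore and answer_with_overrides are hypothetical names.

```python
from typing import Callable, Optional


def normalize_query(query: str) -> str:
    # Case- and whitespace-insensitive match; anything looser (e.g. semantic similarity) is out of scope
    return " ".join(query.lower().split())


class OverrideStore:
    """In-memory stand-in for admin-curated overrides."""

    def __init__(self) -> None:
        self._overrides: dict[str, str] = {}

    def set(self, query: str, corrected_response: str) -> None:
        self._overrides[normalize_query(query)] = corrected_response

    def get(self, query: str) -> Optional[str]:
        return self._overrides.get(normalize_query(query))


def answer_with_overrides(store: OverrideStore, query: str, generate: Callable[[str], str]) -> str:
    """Serve an admin override when one exists; otherwise fall back to normal generation."""
    override = store.get(query)
    return override if override is not None else generate(query)
```

Checking the override before retrieval means a corrected answer costs no LLM call. The trade-off is that an override stays in force until an admin removes it, even if the underlying content changes.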
6. PDF Hardening
Early PDF ingestion work that was further hardened in the following week.
What changed:
- Subprocess Isolation: PDF text extraction runs in a forked subprocess — if the PDF causes an OOM or infinite loop, the worker process survives
- Image-only PDF Detection: PDFs that contain only scanned images (no extractable text) are detected and gracefully skipped with a warning, rather than producing empty chunks
- Enrichment Gap Retry: Before marking a document as completed, the pipeline retries any chunks that failed contextual embedding enrichment
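The three safeguards can be sketched under a few assumptions. The extractor runs as a fresh Python interpreter via subprocess rather than a bare fork, so a hang or an OOM kill ends only the child. The child's script is a parameter because the PDF library in use is not named here. The 20-character image-only threshold and all function names are hypothetical.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_isolated(script: str, timeout: float) -> Optional[str]:
    """Run extraction code in a child interpreter; return stdout, or None if it crashed or hung."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before re-raising the timeout
    return result.stdout if result.returncode == 0 else None


def is_image_only(page_texts: list[str], min_chars: int = 20) -> bool:
    """Scanned PDFs yield (almost) no extractable text; skip them instead of creating empty chunks."""
    return sum(len(t.strip()) for t in page_texts) < min_chars


def retry_enrichment_gaps(
    chunk_ids: list[str],
    enriched: dict[str, str],
    enrich: Callable[[str], str],
    max_attempts: int = 2,
) -> bool:
    """Retry chunks missing contextual enrichment; True means the document may be marked completed."""
    missing = [cid for cid in chunk_ids if cid not in enriched]
    for _ in range(max_attempts):
        if not missing:
            break
        still_missing = []
        for cid in missing:
            try:
                enriched[cid] = enrich(cid)
            except Exception:
                still_missing.append(cid)
        missing = still_missing
    return not missing
```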
7. Crawl & Pipeline Improvements
What changed:
- Crawl History: Full history of crawl runs (start time, pages found, pages ingested, errors) stored and viewable in the admin UI
- Pending Deletion Management: Documents marked for deletion are tracked and can be bulk-purged
- Self-heal Purge: The diagnostic self-heal cycle now cleans up soft-deleted documents automatically
- Pipeline Configuration UI: New section in Hospital Management for configuring pipeline parameters (chunk size, overlap, embedding model) per hospital
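Chunk size and overlap interact as a sliding window: each chunk starts chunk_size - chunk_overlap characters after the previous one. This is a minimal sketch, assuming character-based chunking; the pipeline may count tokens instead. PipelineConfig, its defaults, and the embedding model placeholder are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical per-hospital pipeline settings; defaults are illustrative only."""

    chunk_size: int = 1000
    chunk_overlap: int = 200
    embedding_model: str = "example-embedding-model"

    def __post_init__(self) -> None:
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")


def chunk_text(text: str, cfg: PipelineConfig) -> list[str]:
    """Split text into windows of chunk_size characters, each overlapping the previous by chunk_overlap."""
    step = cfg.chunk_size - cfg.chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + cfg.chunk_size])
        if start + cfg.chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Validating the overlap when the admin saves the setting is safer than failing during ingestion. An overlap equal to the chunk size gives a zero step, which range() rejects. A larger overlap gives a negative step, which silently yields no chunks.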
8. Development Methodology
What changed:
- Methodology Port: 10 development methodology items ported from TrustRelay, including subagent-driven development, design-before-code, and verification-before-completion
- Claude Hooks: Pre-commit hooks added for automated linting (ruff check, ruff format) with a stop gate that blocks commits if lint errors are present
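This is a minimal sketch of the stop gate as a standalone function. It assumes the gate runs ruff check and ruff format --check; the check-only mode makes the gate report problems rather than rewrite files. A git pre-commit hook treats any non-zero exit as a block, while Claude Code hooks use exit code 2 to signal a blocking error, so fail_code is a parameter. How the function is registered as a hook is not shown, and lint_gate is a hypothetical name.

```python
import subprocess
import sys
from typing import Sequence

DEFAULT_CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
]


def lint_gate(checks: Sequence[Sequence[str]] = DEFAULT_CHECKS, fail_code: int = 1) -> int:
    """Run every check; return 0 if all pass, else fail_code (non-zero blocks the commit)."""
    failed = False
    for cmd in checks:
        try:
            result = subprocess.run(list(cmd))
        except FileNotFoundError:
            print(f"lint gate: command not found: {cmd[0]}", file=sys.stderr)
            failed = True
            continue
        if result.returncode != 0:
            print(f"lint gate: {' '.join(cmd)} failed", file=sys.stderr)
            failed = True
    return fail_code if failed else 0
```

As a hook script, the entry point would simply be sys.exit(lint_gate()), or sys.exit(lint_gate(fail_code=2)) where the runner expects exit code 2.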
Evaluation Results
| Date | Score | Context |
|---|---|---|
| March 21 | 95.1% | Baseline before refactoring |
| March 24 | 97.3% | Mid-refactoring, retrieval improvements |
| March 27 | 99.7% (298/299) | All mixin refactoring complete, best score ever |
The single remaining failure (GQ-195) is a non-deterministic routing issue where "buikpijn" (abdominal pain) sometimes resolves to Abdominale Heelkunde (abdominal surgery) instead of Kindergeneeskunde (pediatrics), depending on the query context.
System State at End of Week
| Component | Value |
|---|---|
| Documents | ~2,500 completed |
| Chunks | ~10,000 (all with embeddings) |
| RAG service | 7 mixin modules, 0 C901 violations |
| Eval score | 99.7% (298/299) |
| ::jsonb violations | 0 (was 13) |
| Keycloak tests | 18 new unit tests |
| Medical advice incidents | ZERO |