Release Notes: March 21-27, 2026

~125 commits | RAG service refactored | C901 violations: 27 to 0 | Eval: 99.7% (298/299)

TL;DR

The RAG service was split from a monolithic 5,176-line file into 7 focused mixin modules, and the golden eval score reached 99.7% (298/299) on the refactored code. All 27 C901 complexity violations were eliminated. A 3-layer content deduplication system was introduced, hub detection was overhauled with medical vocabulary scoring, and the feedback investigation system gained AI-powered case analysis. Code review remediation fixed 13 ::jsonb violations and added 18 unit tests for the Keycloak integration.

1. RAG Service Refactoring

The core rag_service.py had grown to 5,176 lines with 27 C901 (cyclomatic complexity) violations. This week it was decomposed into a clean mixin architecture.

What changed:

Mixin Split: rag_service.py broken into 7 modules — retrieval, ranking, synthesis, safety, taxonomy, caching, and telemetry
Line Reduction: 5,176 lines down to 2,951 lines across all modules; the main orchestrator file is now under 400 lines
C901 Elimination: All 27 complexity violations removed (27 to 0) through method extraction and responsibility separation
Eval Verification: A full golden eval run after the refactoring confirmed 99.7% (298/299), the best score to date

Key files:

backend/app/services/rag/retrieval_mixin.py — vector search, speculative retrieval
backend/app/services/rag/ranking_mixin.py — chunk scoring and reranking
backend/app/services/rag/synthesis_mixin.py — LLM response generation
backend/app/services/rag/safety_mixin.py — guardrails and medical advice filtering
backend/app/services/rag/taxonomy_mixin.py — entity-aware retrieval enrichment
backend/app/services/rag/caching_mixin.py — Redis-backed query cache
backend/app/services/rag/telemetry_mixin.py — latency tracking, token usage
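
As a rough illustration of the mixin layout described above, the sketch below composes three toy mixins into a thin orchestrator. The class and method names (RetrievalMixin, rank, answer, and so on) are hypothetical and the bodies are deliberately trivial stand-ins; the real modules are not shown in these notes.

```python
class RetrievalMixin:
    """Stand-in for vector search: a toy keyword match over self.corpus."""

    def retrieve(self, query: str) -> list[str]:
        words = query.lower().split()
        return [c for c in self.corpus if any(w in c.lower() for w in words)]


class RankingMixin:
    """Stand-in for chunk scoring: order chunks by query-word overlap."""

    def rank(self, query: str, chunks: list[str]) -> list[str]:
        words = set(query.lower().split())
        return sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))


class SynthesisMixin:
    """Stand-in for LLM generation: return the top chunk verbatim."""

    def synthesize(self, query: str, chunks: list[str]) -> str:
        return chunks[0] if chunks else "No relevant content found."


class RAGService(RetrievalMixin, RankingMixin, SynthesisMixin):
    """Thin orchestrator: owns shared state and sequences the mixin steps."""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    def answer(self, query: str) -> str:
        chunks = self.retrieve(query)
        ranked = self.rank(query, chunks)
        return self.synthesize(query, ranked)
```

Each mixin contributes a small group of methods and relies on state set up by the orchestrator, which is one way a large class can be broken up without changing its public interface.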

2. Code Review Remediation

A systematic pass through the codebase to address findings from the code review audit.

What changed:

13 ::jsonb violations fixed: Every bound-parameter cast written with the :param::jsonb shorthand (incompatible with asyncpg) was converted to CAST(:param AS jsonb)
.env.example updated: 20+ new settings documented for onboarding new developers
Keycloak configuration: Theme mount added to Docker Compose; realm export includes all required client scopes and role mappings
18 unit tests: New test suite for KeycloakAdminService covering user creation, role assignment, token exchange, and error handling
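
The ::jsonb fix lends itself to a mechanical check. The sketch below is an assumption about how such violations could be found and rewritten with a regular expression, not the tooling actually used; the function names and the example table and column are hypothetical.

```python
import re

# Matches a named bind parameter followed directly by the ::jsonb shorthand,
# e.g. ":payload::jsonb". The lookbehind rejects a colon or identifier
# character immediately before the parameter's colon.
_JSONB_SHORTHAND = re.compile(r"(?<![:\w]):(\w+)::jsonb\b", re.IGNORECASE)


def find_jsonb_shorthand(sql: str) -> list[str]:
    """Return the names of bind parameters cast with the ::jsonb shorthand."""
    return _JSONB_SHORTHAND.findall(sql)


def use_explicit_cast(sql: str) -> str:
    """Rewrite :name::jsonb as CAST(:name AS jsonb)."""
    return _JSONB_SHORTHAND.sub(r"CAST(:\1 AS jsonb)", sql)


# Hypothetical statement used only for illustration.
before = "UPDATE feedback SET meta = :meta::jsonb WHERE id = :id"
after = use_explicit_cast(before)
```

A finder like this could run in CI to keep the violation count at zero.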

3. Content Deduplication

A 3-layer deduplication system to prevent duplicate content from polluting retrieval results.

What changed:

Layer 1 — Canonical URLs: URL normalization (trailing slashes, query params, fragments) ensures the same page is never ingested twice
Layer 2 — Cross-document Chunk Dedup: Identical or near-identical chunks across different documents are detected and consolidated during the publish step
Layer 3 — Boilerplate Detection: Site-agnostic boilerplate patterns (headers, footers, cookie banners) are identified and stripped; patterns are stored in the database and configurable per hospital
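
Layer 1 can be sketched with the standard library. The rules below (lowercase scheme and host, strip trailing slashes, drop a fixed set of tracking parameters, sort the remaining parameters, drop the fragment) are assumptions for illustration; the notes do not spell out the exact normalization rules, and TRACKING_PARAMS is a hypothetical list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these tracking parameters never change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def canonicalize_url(url: str) -> str:
    parts = urlsplit(url)
    # Scheme and host are case-insensitive; the path is left case-sensitive.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Treat "/zorg/" and "/zorg" as the same page, but keep the bare root "/".
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so ordering does not matter.
    query_pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(query_pairs))
    # The fragment is dropped: it addresses a position within the same page.
    return urlunsplit((scheme, netloc, path, query, ""))
```

Storing the canonical form as a unique key at ingestion time is one way to guarantee that two spellings of the same URL map to a single document.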

Impact: Cleaner chunk pool means fewer duplicate results and better use of the LLM context window during synthesis.
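
For Layer 2, one plausible approach is a normalized hash for exact matches plus a word-shingle Jaccard comparison for near-duplicates. This is a swapped-in technique chosen for illustration; the notes do not say which similarity measure or threshold the publish step uses, and every name below is hypothetical.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of the chunk text with case and whitespace differences removed."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping word n-grams used for near-duplicate comparison."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a or b else 1.0


def dedupe_chunks(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first occurrence of each chunk; drop exact and near duplicates."""
    kept: list[str] = []
    seen_hashes: set[str] = set()
    kept_shingles: list[set] = []
    for chunk in chunks:
        digest = fingerprint(chunk)
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        current = shingles(chunk)
        if any(jaccard(current, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of a chunk already kept
        seen_hashes.add(digest)
        kept.append(chunk)
        kept_shingles.append(current)
    return kept
```

The pairwise comparison is quadratic in the number of kept chunks, which is acceptable for a sketch; a larger pool would likely need an indexing scheme such as MinHash.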

4. Hub Detection Overhaul

Hub pages (department landing pages, service overviews) need special treatment during crawling — they should be crawled for links but their own content is often too generic to chunk.

What changed:

Medical Vocabulary Scoring: Comprehensive Dutch medical term dictionary used alongside structural scoring (link density, heading-to-content ratio) to classify pages as hubs
Incremental Extraction: Hub pages are now processed incrementally — their links are followed but content is only extracted if the page also has substantive medical text
Hub Demoted Candidates UI: Admin interface shows pages that were demoted from hub status, allowing manual override
Homepage Link Discovery: The crawler always processes homepage links to discover main navigation targets, regardless of hub classification
DB-driven Site Config: All crawl behavior (max depth, hub thresholds, allowed domains) is stored in the site_crawl_configs table for multi-tenant support
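
The interaction between structural scoring and medical vocabulary can be sketched as follows. The weights, thresholds, and the five-word MEDICAL_TERMS set are assumptions standing in for the real dictionary and for values that, per the notes, live in site_crawl_configs; the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a tiny stand-in for the Dutch medical term dictionary.
MEDICAL_TERMS = frozenset(
    {"behandeling", "diagnose", "operatie", "symptomen", "medicatie"}
)


@dataclass
class PageStats:
    text: str           # visible text of the page
    link_count: int     # number of outgoing links
    heading_count: int  # number of heading elements


def _words(text: str) -> list[str]:
    return [w.strip(".,;:!?()").lower() for w in text.split()]


def structural_score(page: PageStats) -> float:
    """Link density and heading-to-content ratio combined into a 0..1 score."""
    n = max(len(_words(page.text)), 1)
    link_density = min(page.link_count / n, 1.0)
    heading_ratio = min(page.heading_count / n, 1.0)
    return 0.7 * link_density + 0.3 * heading_ratio


def medical_density(page: PageStats) -> float:
    """Fraction of words found in the medical term dictionary."""
    words = _words(page.text)
    if not words:
        return 0.0
    return sum(w in MEDICAL_TERMS for w in words) / len(words)


def plan_page(
    page: PageStats, hub_threshold: float = 0.3, medical_threshold: float = 0.05
) -> dict[str, bool]:
    """Decide hub status and whether the page's own content is extracted."""
    is_hub = structural_score(page) >= hub_threshold
    substantive = medical_density(page) >= medical_threshold
    # A hub's content is extracted only if it also has substantive medical text.
    return {"is_hub": is_hub, "extract_content": (not is_hub) or substantive}
```

Under these assumed thresholds, a navigation page with one link per word is a hub without extraction, while a hub that also mentions treatments keeps its content.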

5. AI Feedback Investigation System

A new workflow for diagnosing and resolving negative user feedback using LLM-powered case analysis.

What changed:

AI Case Investigation: Clicking a negative feedback item triggers a GPT-4.1 analysis that examines the original query, retrieved chunks, generated response, and user complaint to produce a diagnosis
Override Mechanism: Admins can attach a corrected response to any query — future identical queries will serve the override instead of generating a new response
Add to Golden Questions: Investigated queries can be promoted directly into the golden evaluation set with the correct expected answer
Dashboard Redesign: Feedback dashboard rebuilt with metrics cards, chunk inspection panel, and trend chart showing feedback sentiment over time
Telemetry Integration: P95 latency stats displayed per query, enabling correlation between slow responses and negative feedback
Persistence: Investigation results are stored in the backend and survive page refreshes

Key files:

backend/app/services/feedback_investigation_service.py
backend/app/api/admin_feedback.py
frontend/src/pages/FeedbackDashboardPage.tsx
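
The override mechanism amounts to a lookup that runs before generation. The sketch below keeps overrides in memory and assumes that "identical" means equal after lowercasing and whitespace collapsing; both the storage and the matching rule are assumptions, and OverrideStore is a hypothetical name.

```python
from typing import Callable


def normalize_query(query: str) -> str:
    # Assumption: identical queries may differ in case and whitespace only.
    return " ".join(query.lower().split())


class OverrideStore:
    """Maps normalized queries to admin-corrected responses."""

    def __init__(self) -> None:
        self._overrides: dict[str, str] = {}

    def set_override(self, query: str, corrected_response: str) -> None:
        self._overrides[normalize_query(query)] = corrected_response

    def answer(self, query: str, generate: Callable[[str], str]) -> str:
        """Serve the override if one exists, otherwise generate a response."""
        override = self._overrides.get(normalize_query(query))
        if override is not None:
            return override
        return generate(query)
```

Passing the generator in as a callable keeps the lookup independent of the RAG pipeline, so the override path can be tested without an LLM call.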

6. PDF Hardening

Early PDF ingestion work that was further hardened in the following week.

What changed:

Subprocess Isolation: PDF text extraction runs in a forked subprocess — if the PDF causes an OOM or infinite loop, the worker process survives
Image-only PDF Detection: PDFs that contain only scanned images (no extractable text) are detected and gracefully skipped with a warning, rather than producing empty chunks
Enrichment Gap Retry: Before marking a document as completed, the pipeline retries any chunks that failed contextual embedding enrichment
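
Subprocess isolation can be illustrated with the standard library. Note that this sketch launches a separate interpreter with subprocess.run rather than forking the worker, which is a simplification of what the notes describe; run_isolated and extract_or_skip are hypothetical names, and the extraction command itself is left to the caller.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(argv: list[str], timeout_s: float) -> Optional[str]:
    """Run an extraction command in a child process.

    Returns the child's stdout, or None if it timed out or exited abnormally
    (for example after being killed for using too much memory).
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising.
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def extract_or_skip(argv: list[str], timeout_s: float = 30.0) -> Optional[str]:
    """Return extracted text, or None when the document should be skipped."""
    text = run_isolated(argv, timeout_s)
    if text is None or not text.strip():
        # Failed extraction or no extractable text (e.g. an image-only PDF).
        return None
    return text


# Stand-in for a real extraction command, used only for illustration.
demo = extract_or_skip([sys.executable, "-c", "print('page text')"])
```

Because the parent only inspects the exit status and output, a crash or hang in the child costs one document rather than the whole worker.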

7. Crawl & Pipeline Improvements

What changed:

Crawl History: Full history of crawl runs (start time, pages found, pages ingested, errors) stored and viewable in the admin UI
Pending Deletion Management: Documents marked for deletion are tracked and can be bulk-purged
Self-heal Purge: The diagnostic self-heal cycle now cleans up soft-deleted documents automatically
Pipeline Configuration UI: New section in Hospital Management for configuring pipeline parameters (chunk size, overlap, embedding model) per hospital

8. Development Methodology

What changed:

Methodology Port: 10 development methodology items ported from TrustRelay, including subagent-driven development, design-before-code, and verification-before-completion
Claude Hooks: Pre-commit hooks added for automated linting (ruff check, ruff format) with a stop gate that blocks commits if lint errors are present
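
A stop gate of this kind can be a short script that runs each lint command and reports failure if any of them fails. The sketch below assumes the formatter runs in check-only mode (ruff format --check) so that the gate reports rather than rewrites; the actual hook configuration is not shown in these notes, and lint_gate is a hypothetical name.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: the gate runs ruff in report-only mode on the repository root.
LINT_COMMANDS = (
    ("ruff", "check", "."),
    ("ruff", "format", "--check", "."),
)


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def lint_gate(
    commands: Sequence[Sequence[str]] = LINT_COMMANDS,
    run: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Return 0 if every command passes, otherwise 1."""
    failed = [cmd for cmd in commands if run(cmd) != 0]
    for cmd in failed:
        print("lint gate failed:", " ".join(cmd))
    return 1 if failed else 0
```

A hook would call lint_gate() and exit with its return value, so a non-zero result blocks the commit. The run parameter exists so the gate can be tested without ruff installed.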

Evaluation Results

Date	Score	Context
March 21	95.1%	Baseline before refactoring
March 24	97.3%	Mid-refactoring, retrieval improvements
March 27	99.7% (298/299)	All mixin refactoring complete, best score to date

The single remaining failure (GQ-195) is a non-deterministic routing issue where "buikpijn" (abdominal pain) sometimes resolves to Abdominale Heelkunde instead of Kindergeneeskunde depending on the query context.
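
For reference, the headline figure follows directly from the pass count. The helper below is a hypothetical formatter, not part of the eval harness.

```python
def eval_score(passed: int, total: int) -> str:
    """Format a pass count as a percentage with one decimal place."""
    return f"{100 * passed / total:.1f}% ({passed}/{total})"


# 298 of 299 golden questions passed: 298 / 299 = 0.99665..., shown as 99.7%.
headline = eval_score(298, 299)
```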

System State at End of Week

Component	Value
Documents	~2,500 completed
Chunks	~10,000 (all with embeddings)
RAG service	7 mixin modules, 0 C901 violations
Eval score	99.7% (298/299)
`::jsonb` violations	0 (was 13)
Keycloak tests	18 new unit tests
Medical advice incidents	0
