ADR-0057: Tenant-Scoped Prompt Addendums + Tenant-Agnostic Doctor-Profile Boost
Date: 2026-05-12 | Status: Accepted (shipped in commit 5c26947d) | Related: ADR-0055 FAQ-Corpus Drift Prevention, ADR-0056 Chat Answer-Shape Typology
Context
On 2026-05-12 the user reported a factual error: when asked "Is er raadpleging voor Dr. Matthias Dupont op woensdag?" ("Is there a consultation with Dr. Matthias Dupont on Wednesday?") our system answered "geen raadpleging op woensdag" ("no consultation on Wednesday"), while the competitor at zolcase.novation.website correctly answered "Ja, woensdagvoormiddag." ("Yes, Wednesday morning."). The corpus contained the canonical doctor profile page with a markdown schedule table:
| | MA | Di | WO | DO | VR | ZA |
| VM | RP2w | RP2w | RP2w | RP | RP2w | |
| NM | RP | | | | | |
The VM × WO cell is RP2w (bi-weekly consultation) — non-empty — so per the existing TABULAR DATA RULE the answer should be YES. The LLM still inverted the fact.
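The cell lookup described above can be sketched deterministically. This is a minimal illustration, not the production parser: `SCHEDULE_MD`, `parse_schedule`, and `has_consultation` are hypothetical names, and the only rule encoded is the one stated here, that a non-empty cell means a consultation exists.

```python
# Hypothetical sketch: read the schedule table and apply "non-empty cell = YES".
SCHEDULE_MD = """\
| | MA | Di | WO | DO | VR | ZA |
| VM | RP2w | RP2w | RP2w | RP | RP2w | |
| NM | RP | | | | | |
"""


def parse_schedule(markdown: str) -> dict[str, dict[str, str]]:
    """Return {row_label: {day: cell}} for a pipe-delimited schedule table."""
    rows = []
    for line in markdown.strip().splitlines():
        # Drop exactly one leading and one trailing pipe, then split into cells.
        inner = line.strip().removeprefix("|").removesuffix("|")
        rows.append([cell.strip() for cell in inner.split("|")])
    header, *body = rows
    days = [day.upper() for day in header[1:]]  # normalise "Di" to "DI"
    return {row[0].upper(): dict(zip(days, row[1:])) for row in body}


def has_consultation(schedule: dict[str, dict[str, str]], day: str) -> bool:
    """True when any row (VM or NM) has a non-empty cell for the given day."""
    return any(cells.get(day.upper(), "") for cells in schedule.values())


schedule = parse_schedule(SCHEDULE_MD)
print(schedule["VM"]["WO"])              # RP2w
print(has_consultation(schedule, "WO"))  # True
print(has_consultation(schedule, "ZA"))  # False
```

Checking both rows matters: Monday has entries in VM and NM, so a reader that stops at the first row would still answer correctly for Wednesday but would miss afternoon-only slots.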
Two compounding root causes:
- Schedule-table parsing failure. The shared TABULAR DATA RULE has no abbreviation legend, no worked example for the ZOL doctor schedule format, and no guidance on reading both VM and NM rows. The LLM was guessing.
- Retrieval contamination. Our system cited [1][2][3], pulling the canonical doctor page, the "Arts Anders" interview profile, and a multi-doctor cardiology page. The competitor cited only [1] (the canonical profile). The thematic co-retrieval diluted the schedule signal in our cited chunks.
Decision
Ship two complementary fixes in one commit, each at the right layer of abstraction.
Layer 1 — Tenant-scoped prompt addendums (NEW pattern)
The doctor-schedule format VM/NM × MA-ZA × RP/RP2w/empty is ZOL-specific — other hospitals format schedules differently. A rule specific to this format should NOT live in the shared _build_rag_system_prompt_template.
- New constant ZOL_DOCTOR_SCHEDULE_RULE in prompts.py containing the legend + worked example + counter-example.
- New registry _TENANT_CHAT_ADDENDUMS: dict[str, str] mapping tenant_slug → addendum text. Today: {"zol": ZOL_DOCTOR_SCHEDULE_RULE}. Tomorrow: new tenants register their own entries without touching the shared template.
- New helper get_tenant_chat_addendum(slug) -> str called at the chat-channel system-prompt assembly site (rag_service.py:4641) after CHAT_ANSWER_SHAPE_RULES. Voice path does NOT call this helper.
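A minimal sketch of the registry and helper, assuming the shapes named above. The rule text is a placeholder (the real wording is not reproduced in this ADR), and `assemble_chat_system_prompt` is a hypothetical stand-in for the assembly site in rag_service.py.

```python
# Placeholder: the actual legend, worked example, and counter-example live in prompts.py.
ZOL_DOCTOR_SCHEDULE_RULE = (
    "ZOL DOCTOR SCHEDULE RULE: <legend, worked example, counter-example>"
)

# tenant_slug -> addendum text; new tenants add an entry here.
_TENANT_CHAT_ADDENDUMS: dict[str, str] = {"zol": ZOL_DOCTOR_SCHEDULE_RULE}


def get_tenant_chat_addendum(slug: str) -> str:
    """Return the tenant's chat addendum, or an empty string if none is registered."""
    return _TENANT_CHAT_ADDENDUMS.get(slug, "")


def assemble_chat_system_prompt(base: str, shape_rules: str, slug: str) -> str:
    """Hypothetical assembly step: the addendum is appended after the shape rules."""
    parts = [base, shape_rules, get_tenant_chat_addendum(slug)]
    return "\n\n".join(part for part in parts if part)


print(assemble_chat_system_prompt("BASE", "SHAPE", "other"))  # no ZOL rule included
```

Returning an empty string rather than raising keeps unregistered tenants on the shared template with no extra branching at the call site.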
Layer 2 — Tenant-agnostic doctor-profile boost
The "Dr. <Name>" → canonical profile retrieval pattern generalises across every hospital website that follows the convention Dr. <Name> | <Tenant> in document titles.
Three code changes in search_service.py:
- Add Document.title to the 3 SQL result-projection paths (vector_search, BM25 _fetch_chunk, _rescue_keyword_search) so the title is available to boost functions.
- Add _DOCTOR_NAME_PATTERN regex + _extract_doctor_name(query) classmethod. Matches "Dr. <Name>" / "dr. <Name>" / "Dr <Name>" across nl/en/fr/it. Captures 1-3 capitalised name parts with accent-mark tolerance.
- Add _boost_doctor_profile(result, doctor_name) method that multiplies the chunk score by 1.50 when the chunk's document title starts with dr. <name>. Wired into _apply_metadata_boosts.
Boost factor calibration: 1.50 was chosen because:
- Strong enough to pull the canonical profile from rank 2-3 to rank 1.
- Not so strong it crushes other signals (campus boost is 1.10, conversation-context boost is 1.40, authority dampening can be 0.85).
- It sits in the same band as the pediatric child-prefix booster pattern.
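The calibration argument can be checked with simple arithmetic. The raw scores below are made up purely for illustration (they are not production data), and multiplicative stacking of the campus and conversation-context boosts is an assumption rather than something this ADR states.

```python
DOCTOR_PROFILE_BOOST = 1.50
CAMPUS_BOOST = 1.10
CONVERSATION_CONTEXT_BOOST = 1.40


def rank(scores: dict[str, float]) -> list[str]:
    """Return document keys ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative raw scores only: the canonical profile starts at rank 3.
raw = {"canonical_profile": 0.5, "interview_profile": 0.6, "cardiology_page": 0.7}
boosted = {**raw, "canonical_profile": raw["canonical_profile"] * DOCTOR_PROFILE_BOOST}

print(rank(raw))      # ['cardiology_page', 'interview_profile', 'canonical_profile']
print(rank(boosted))  # ['canonical_profile', 'cardiology_page', 'interview_profile']

# If the other boosts multiply together, a rival carrying both of them
# gets a combined factor of about 1.54, slightly above the doctor boost.
stacked_rival_factor = CAMPUS_BOOST * CONVERSATION_CONTEXT_BOOST
print(stacked_rival_factor > DOCTOR_PROFILE_BOOST)  # True
```

Put differently, a 1.50 factor lifts the profile to rank 1 only when its raw score is more than two thirds of the top rival's score, which is consistent with the rank 2-3 case described above.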
The architectural distinction
This ADR codifies a recurring decision: when a fix is corpus-format-specific, isolate it to the affected tenant; when a fix is convention-specific, generalize it across all tenants.
| Fix shape | Isolation surface | Example |
|---|---|---|
| Tenant-specific data format (table layout, abbreviation system, brand strings) | _TENANT_CHAT_ADDENDUMS[slug] | ZOL schedule table |
| Universal naming convention (doctor profile titles, common URL patterns) | _boost_* method, tenant-agnostic | Dr. X title-prefix boost |
| Universal answer shape (procedure definition, comparison, decision tree) | CHAT_ANSWER_SHAPE_RULES | ADR-0056 |
| Tenant-specific FAQ / canned answer | YAML in tenant overlay | ADR-0055 surviving entries |
The principle: the layer of the system that owns the fix should be the SAME layer that owns the convention being matched. Hospital-specific data → tenant overlay. Universal convention → shared code.
Consequences
Positive
- No whack-a-mole for ZOL. Every future ZOL-specific format quirk (campus abbreviations, brand phrasings) lands in ZOL_DOCTOR_SCHEDULE_RULE and its siblings under the same registry.
- Zero impact on other tenants. A new hospital onboarded tomorrow does NOT inherit ZOL's schedule-table rule. Their tenant slug defaults to empty addendum unless they register one.
- Doctor-profile boost generalises. Any tenant whose doctor pages follow Dr. <Name> | <Tenant> titles gets the boost for free.
Negative / trade-offs
- Slightly larger chat prompt for ZOL. ZOL_DOCTOR_SCHEDULE_RULE adds ~30 lines / ~300 tokens per chat query. ~$0.0003 at GPT-4.1-mini pricing. Negligible.
- Boost can fire on non-doctor queries that mention "Dr. <Name>". Mitigation: the boost only matters if a doc with that title exists in the corpus; otherwise it's a no-op.
Rejected alternatives
- Modify the shared _build_rag_system_prompt_template to include the ZOL schedule rule. Rejected: exactly the pollution we're avoiding. The shared template must stay hospital-agnostic.
- Add the doctor boost as a YAML rule per tenant. Rejected: the title convention Dr. <Name> is shared across hospitals; coding it once in Python avoids 10 copies of the same regex in 10 YAML files.
- Re-rank in a post-retrieval LLM step instead of a boost. Rejected: adds latency for what is essentially a deterministic title-match.
- Re-chunk the corpus to extract doctor schedules as structured data. Deferred to a future iteration. Layer 2 is needed regardless; the prompt rule provides immediate value while the schema work matures.
Follow-ups
- Companion landing: extract_consultation_schedule() in data_quality.py (commit 7f2789c5) extracts the schedule into structured JSON at doc_metadata.consultation_schedule. The prompt rule remains as defense-in-depth: it teaches the LLM to parse markdown when consumers haven't been upgraded to consume the structured field yet.
- Re-evaluate the 1.50 boost factor after a week of production data.
References
- Commit 5c26947d: Layer 1 + Layer 2 deploy
- ADR-0055: precedent for hospital-agnostic vs tenant-specific framing
- ADR-0056: precedent for typology-beats-rule-per-defect
- Master ADR: docs/ADR/0057-tenant-prompt-isolation-and-doctor-profile-boost.md