
ADR-0057: Tenant-Scoped Prompt Addendums + Tenant-Agnostic Doctor-Profile Boost

Date: 2026-05-12 | Status: Accepted (shipped in commit 5c26947d) | Related: ADR-0055 FAQ-Corpus Drift Prevention, ADR-0056 Chat Answer-Shape Typology

Context

On 2026-05-12 the user reported a factual error. Asked "Is er raadpleging voor Dr. Matthias Dupont op woensdag?" ("Is there a consultation with Dr. Matthias Dupont on Wednesday?"), our system answered "geen raadpleging op woensdag" ("no consultation on Wednesday"). The competitor at zolcase.novation.website correctly answered "Ja, woensdagvoormiddag" ("Yes, Wednesday morning"). The corpus contained the canonical doctor profile page, which includes this markdown schedule table:

| | MA | Di | WO | DO | VR | ZA |
| VM | RP2w | RP2w | RP2w | RP | RP2w | |
| NM | RP | | | | | |

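Rows are day parts (VM = voormiddag, morning; NM = namiddag, afternoon) and columns are weekdays from MA (Monday) to ZA (Saturday), with WO = Wednesday. The VM × WO cell is RP2w (bi-weekly consultation). Because that cell is non-empty, the existing TABULAR DATA RULE says the answer should have been yes. The LLM still inverted the fact.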
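To make the reading rule concrete, here is a minimal Python sketch (not the production code) that parses a table in this format and checks both day-part rows for a given weekday. The function names parse_schedule and consultations_on are hypothetical, and the sketch assumes the table arrives as plain markdown text with one row per line.

```python
# Hypothetical sketch, not the production parser. Reads a ZOL-style schedule
# table (rows VM/NM, columns MA..ZA) and reports which day parts of a given
# weekday have a non-empty cell.


def parse_schedule(table: str) -> dict[str, dict[str, str]]:
    """Return {day_part: {day: cell}}, keeping only non-empty cells."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]
    # Drop the outer pipes, split on inner ones; the first header cell is the blank corner.
    header = [c.strip().upper() for c in lines[0].strip("|").split("|")]
    days = header[1:]  # upper-cased, so "Di" and "DI" are treated alike
    schedule: dict[str, dict[str, str]] = {}
    for line in lines[1:]:
        cells = [c.strip() for c in line.strip("|").split("|")]
        if set("".join(cells)) <= set("-: "):
            continue  # skip separator rows such as |---|---| and rows with no content
        part, values = cells[0].upper(), cells[1:]
        schedule[part] = {day: val for day, val in zip(days, values) if val}
    return schedule


def consultations_on(schedule: dict[str, dict[str, str]], day: str) -> dict[str, str]:
    """Check EVERY day-part row (VM and NM); any non-empty cell means yes."""
    day = day.upper()
    return {part: row[day] for part, row in schedule.items() if day in row}


SCHEDULE_TABLE = """
| | MA | Di | WO | DO | VR | ZA |
| VM | RP2w | RP2w | RP2w | RP | RP2w | |
| NM | RP | | | | | |
"""

schedule = parse_schedule(SCHEDULE_TABLE)
print(consultations_on(schedule, "WO"))  # {'VM': 'RP2w'}: yes, Wednesday morning
print(consultations_on(schedule, "ZA"))  # {}: no consultation on Saturday
```

The point the LLM missed is visible in consultations_on: the answer is yes whenever any day-part row has a non-empty cell for that weekday, regardless of whether the cell says RP or RP2w.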

Two compounding root causes:

  1. Schedule-table parsing failure. The shared TABULAR DATA RULE has no abbreviation legend, no worked example for the ZOL doctor schedule format, and no guidance on reading both VM and NM rows. The LLM was guessing.
  2. Retrieval contamination. Our system cited [1][2][3]: the canonical doctor page, the "Arts Anders" interview profile, and a multi-doctor cardiology page. The competitor cited only [1], the canonical profile. The thematically related co-retrieved chunks diluted the schedule signal in the context our LLM answered from.

Decision

Ship two complementary fixes in one commit, each at the right layer of abstraction.

Layer 1 — Tenant-scoped prompt addendums (NEW pattern)

The doctor-schedule format VM/NM × MA-ZA × RP/RP2w/empty is ZOL-specific; other hospitals format schedules differently. A rule specific to this format should therefore NOT live in the shared _build_rag_system_prompt_template.

  • New constant ZOL_DOCTOR_SCHEDULE_RULE in prompts.py containing the legend + worked example + counter-example.
  • New registry _TENANT_CHAT_ADDENDUMS: dict[str, str] mapping tenant_slug → addendum text. Today: {"zol": ZOL_DOCTOR_SCHEDULE_RULE}. Tomorrow: new tenants register their own entries without touching the shared template.
  • New helper get_tenant_chat_addendum(slug) -> str called at the chat-channel system-prompt assembly site (rag_service.py:4641) after CHAT_ANSWER_SHAPE_RULES. Voice path does NOT call this helper.
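A minimal sketch of the registry pattern follows. The names ZOL_DOCTOR_SCHEDULE_RULE, _TENANT_CHAT_ADDENDUMS and get_tenant_chat_addendum come from this ADR; the rule text is a placeholder (not the shipped rule), and build_chat_system_prompt is a hypothetical stand-in for the assembly code at rag_service.py:4641.

```python
# Sketch only: names mirror the ADR, bodies are hypothetical.
ZOL_DOCTOR_SCHEDULE_RULE = """\
ZOL DOCTOR SCHEDULE TABLES (placeholder text, not the shipped rule):
- Rows: VM = morning (voormiddag), NM = afternoon (namiddag).
- Columns: MA..ZA = Monday..Saturday.
- RP = consultation, RP2w = consultation every two weeks, empty = none.
- A day has consultation if EITHER the VM or the NM cell is non-empty.
"""

# tenant_slug -> addendum text; new tenants add an entry here, nothing else changes.
_TENANT_CHAT_ADDENDUMS: dict[str, str] = {
    "zol": ZOL_DOCTOR_SCHEDULE_RULE,
}


def get_tenant_chat_addendum(slug: str) -> str:
    """Return the tenant's chat addendum, or "" for unregistered tenants."""
    return _TENANT_CHAT_ADDENDUMS.get(slug, "")


def build_chat_system_prompt(base_template: str, answer_shape_rules: str, tenant_slug: str) -> str:
    # Hypothetical assembly order: shared template, then CHAT_ANSWER_SHAPE_RULES,
    # then the tenant addendum. A voice-path builder would simply omit the last part.
    parts = [base_template, answer_shape_rules, get_tenant_chat_addendum(tenant_slug)]
    return "\n\n".join(p for p in parts if p)  # empty addendum leaves no trailing gap
```

Because the lookup defaults to an empty string, an unregistered tenant gets exactly the shared prompt, which is what keeps the blast radius of a ZOL rule limited to ZOL.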

Layer 2 — Tenant-agnostic doctor-profile boost

The "Dr. <Name>" → canonical profile retrieval pattern generalises to any hospital website whose document titles follow the convention Dr. <Name> | <Tenant>.

Three code changes in search_service.py:

  1. Add Document.title to the 3 SQL result-projection paths (vector_search, BM25 _fetch_chunk, _rescue_keyword_search) so the title is available to boost functions.
  2. Add _DOCTOR_NAME_PATTERN regex + _extract_doctor_name(query) classmethod. Matches "Dr. <Name>" / "dr. <Name>" / "Dr <Name>" across nl/en/fr/it. Captures 1-3 capitalised name parts with accent-mark tolerance.
  3. Add _boost_doctor_profile(result, doctor_name) method that multiplies chunk score by 1.50× when the chunk's document title starts with dr. <name>. Wired into _apply_metadata_boosts.
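The sketch below approximates steps 2 and 3 as module-level functions. It is a hypothetical reconstruction, not the code in search_service.py: results are assumed to be plain dicts with "title" and "score" keys, and the regex uses Latin-1 letter ranges as one possible way to get accent-mark tolerance. It also stops at lowercase name particles, so "Dr. Jan van Dijk" yields only "Jan".

```python
import re
from typing import Optional

# Hypothetical reconstruction of _DOCTOR_NAME_PATTERN, not the shipped regex.
# "Dr" or "dr", an optional ".", then 1-3 capitalised name parts. The Latin-1
# ranges below are one way to tolerate accented letters (É, è, ö, ...).
_UPPER = "A-ZÀ-ÖØ-Þ"
_LOWER = "a-zß-öø-ÿ'"
_NAME_PART = f"[{_UPPER}][{_LOWER}]+(?:-[{_UPPER}][{_LOWER}]+)?"  # also allows "Anne-Marie"
_DOCTOR_NAME_PATTERN = re.compile(
    rf"\b[Dd]r\.?\s+({_NAME_PART}(?:\s+{_NAME_PART}){{0,2}})"
)

DOCTOR_PROFILE_BOOST = 1.50


def extract_doctor_name(query: str) -> Optional[str]:
    """Return the captured name, or None when the query names no doctor."""
    match = _DOCTOR_NAME_PATTERN.search(query)
    return match.group(1) if match else None


def boost_doctor_profile(result: dict, doctor_name: Optional[str]) -> dict:
    """Multiply the score by 1.50 if the title starts with 'dr. <name>'.

    Assumes results are dicts with "title" and "score" keys. Queries without a
    doctor name, or chunks whose title does not match, keep their score.
    """
    if not doctor_name:
        return result
    title = (result.get("title") or "").lower()
    if title.startswith(f"dr. {doctor_name.lower()}"):
        return {**result, "score": result["score"] * DOCTOR_PROFILE_BOOST}
    return result


name = extract_doctor_name("Is er raadpleging voor Dr. Matthias Dupont op woensdag?")
hit = boost_doctor_profile({"title": "Dr. Matthias Dupont | ZOL", "score": 0.5}, name)
print(name, hit["score"])  # Matthias Dupont 0.75
```

Returning a new dict rather than mutating the input keeps the boost side-effect free, which makes it easy to chain with the other boosts in a metadata-boost pass.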

Boost factor calibration: 1.50 was chosen because:

  • Strong enough to pull the canonical profile from rank 2-3 to rank 1.
  • Not so strong that it crushes other signals (campus boost is 1.10, conversation-context boost is 1.40, authority dampening can be 0.85).
  • It sits in the same band as the pediatric child-prefix booster pattern.
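The calibration argument is plain multiplicative arithmetic. The sketch below uses made-up base scores and assumes that _apply_metadata_boosts composes boosts by multiplication; both are assumptions for illustration, and only the factor values come from this ADR.

```python
# Illustrative only: hypothetical base scores and an ASSUMED multiplicative
# composition of boosts. The factor values are the ones quoted in this ADR.
BOOSTS = {
    "doctor_profile": 1.50,
    "campus": 1.10,
    "conversation_context": 1.40,
    "authority_dampening": 0.85,
}


def boosted(base: float, applied: list[str]) -> float:
    """Apply each named boost factor to the base score in turn."""
    score = base
    for name in applied:
        score *= BOOSTS[name]
    return score


# Before boosting, the canonical profile (base 0.50) sits at rank 3.
scores = {
    "interview profile": boosted(0.62, []),
    "cardiology page": boosted(0.58, []),
    "canonical profile": boosted(0.50, ["doctor_profile"]),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['canonical profile', 'interview profile', 'cardiology page']
```

Under this assumption the profile overtakes a competitor only when the competitor's base score is less than 1.5× its own, and a competitor carrying both the campus and conversation-context boosts (1.10 × 1.40 = 1.54) still edges ahead at equal base scores, so the doctor boost does not dominate stacked signals.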

The architectural distinction

This ADR codifies a recurring decision: when a fix is corpus-format-specific, isolate it to the affected tenant; when a fix is convention-specific, generalize it across all tenants.

| Fix shape | Isolation surface | Example |
|---|---|---|
| Tenant-specific data format (table layout, abbreviation system, brand strings) | _TENANT_CHAT_ADDENDUMS[slug] | ZOL schedule table |
| Universal naming convention (doctor profile titles, common URL patterns) | _boost_* method, tenant-agnostic | Dr. X title-prefix boost |
| Universal answer shape (procedure definition, comparison, decision tree) | CHAT_ANSWER_SHAPE_RULES | ADR-0056 |
| Tenant-specific FAQ / canned answer | YAML in tenant overlay | ADR-0055 surviving entries |

The principle: the layer of the system that owns the fix should be the SAME layer that owns the convention being matched. Hospital-specific data → tenant overlay. Universal convention → shared code.

Consequences

Positive

  • No whack-a-mole for ZOL. Every future ZOL-specific format quirk (campus abbreviations, brand phrasings) lands as a sibling of ZOL_DOCTOR_SCHEDULE_RULE under the same "zol" registry entry.
  • Zero impact on other tenants. A hospital onboarded tomorrow does NOT inherit ZOL's schedule-table rule; its slug resolves to an empty addendum unless it registers one.
  • Doctor-profile boost generalises. Any tenant whose doctor pages follow Dr. <Name> | <Tenant> titles gets the boost for free.

Negative / trade-offs

  • Slightly larger chat prompt for ZOL. ZOL_DOCTOR_SCHEDULE_RULE adds ~30 lines (~300 tokens) to every chat query. At GPT-4.1-mini's list price of $0.40 per 1M input tokens, that is roughly $0.00012 per query. Negligible.
  • The boost can fire on queries that mention "Dr. <Name>" without asking about that doctor's profile. Mitigation: the boost only changes scores when a document with a matching title exists in the corpus; otherwise it is a no-op.

Rejected alternatives

  1. Modify the shared _build_rag_system_prompt_template to include the ZOL schedule rule. Rejected — exactly the pollution we're avoiding. The shared template must stay hospital-agnostic.
  2. Add the doctor boost as a YAML rule per tenant. Rejected — the title convention Dr. <Name> is shared across hospitals; coding it once in Python avoids 10 copies of the same regex in 10 YAML files.
  3. Re-rank in a post-retrieval LLM step instead of a boost. Rejected — adds latency for what is essentially a deterministic title-match.
  4. Re-chunk the corpus to extract doctor schedules as structured data. Deferred to a future iteration. Layer 2 is needed regardless; the prompt rule provides immediate value while the schema work matures.

Follow-ups

  • Companion change: extract_consultation_schedule() in data_quality.py (commit 7f2789c5) extracts the schedule into structured JSON at doc_metadata.consultation_schedule. The prompt rule remains as defense-in-depth: it teaches the LLM to parse the markdown table for consumers that have not yet been upgraded to read the structured field.
  • Re-evaluate the 1.50 boost factor after a week of production data.

References