ADR-0056: Chat Answer-Shape Typology — Six Patterns, Not Sixty Rules
Date: 2026-05-12 | Status: Accepted (shipped in commit 9e76bc82) | Related: ADR-0055 FAQ-Corpus Drift Prevention, the CHAT_BOLD_LEDE_RULE shipped earlier 2026-05-12 in commit b88fc086
Context
On 2026-05-12 the competitor product at zolcase.novation.website/slim-zoeken demonstrated significantly more polished, more scannable, more actionable answers than ours for "Wat is een gastroscopie?":
- Competitor: 2 short intro paragraphs + bulleted attribute list with bold labels (Duur, Voorbereiding, Verloop, Sedatie). Inline `[N]` markers on every claim. 12 citations across 3 source documents.
- Ours: 1 200+ character wall of text. No bullets, no labels, no headers. Citations sparse or absent.
Earlier the same day we shipped CHAT_BOLD_LEDE_RULE (commit b88fc086) — a single-pattern rule for multi-entity answers (Pattern D below). That fixed one shape. The gastroscopy comparison made clear we'd just be playing whack-a-mole if we kept adding one prompt rule per visible defect.
The competitor's answers suggest that its prompt is not a list of per-topic rules but a typology: a small set of universal shape patterns from which the LLM picks the right one per question. Our prompt was list-based (INSTITUTIONAL / POINT-FACT / PROCEDURAL) and explicitly suppressed structure via "Do NOT use markdown headers."
Decision
Replace CHAT_BOLD_LEDE_RULE with CHAT_ANSWER_SHAPE_RULES — a six-pattern typology with worked examples. Same chat-only injection at rag_service.py:4641 (voice path unchanged). Backwards-compat alias kept for older imports.
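A minimal sketch of how the alias and the chat-only injection could look. The two constant names come from this ADR; `build_system_prompt`, its `surface` parameter, and the placeholder rule text are hypothetical and do not reproduce the actual code at `rag_service.py:4641`.

```python
# Placeholder text only; the real constant lives in backend/app/prompts.py.
CHAT_ANSWER_SHAPE_RULES = "ANSWER SHAPE (placeholder for the six-pattern rule text)"

# Backwards-compat alias so older imports of the previous name keep working.
CHAT_BOLD_LEDE_RULE = CHAT_ANSWER_SHAPE_RULES


def build_system_prompt(base_prompt: str, surface: str) -> str:
    """Append the shape rules for the chat surface only (hypothetical helper)."""
    if surface == "chat":
        return f"{base_prompt}\n\n{CHAT_ANSWER_SHAPE_RULES}"
    # Voice (and any other surface) keeps the base prompt untouched.
    return base_prompt
```

Because the alias binds the old name to the same string object, callers that still import `CHAT_BOLD_LEDE_RULE` pick up the new typology without a code change.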
The six patterns
| Pattern | When | Shape |
|---|---|---|
| A — POINT-FACT | Single discrete answer (phone, address, hour) | 1-2 sentences + 1 citation |
| B — STEP-BY-STEP | "How do I X?" procedural | Numbered list, citation per step |
| C — ATTRIBUTE-LIST | Single topic with 3+ distinct attributes (procedures, conditions, doctor profiles, cost breakdowns, schedules) | 1-2 intro paragraphs + bulleted attribute list with bold labels |
| D — MULTI-ENTITY | Question covers multiple distinct entities | One bold-lede paragraph per entity (formerly CHAT_BOLD_LEDE_RULE) |
| E — COMPARISON | "X vs Y" / "verschil tussen…" | Brief intro + two parallel bold-labeled sections, each with bullets |
| F — DECISION-TREE | "Wanneer moet ik X?" / triage | Conditional bullets: Bij ernstige…: action / Bij milde…: alternative |
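To make the Pattern C shape concrete, the sketch below renders an intro followed by a bulleted attribute list with bold labels and one citation marker per bullet. `Attribute` and `render_attribute_list` are hypothetical names used for illustration; in production the LLM produces this shape directly from the prompt, so this is a description of the target format rather than shipped code.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    label: str     # short label, rendered bold (e.g. "Duur")
    text: str      # attribute body, rendered as plain text
    citation: int  # index N of the supporting source


def render_attribute_list(intro: str, attributes: list[Attribute]) -> str:
    """Render Pattern C: intro paragraph(s), then one bullet per attribute."""
    if len(attributes) < 3:
        # The table above reserves Pattern C for 3+ distinct attributes.
        raise ValueError("Pattern C requires 3+ distinct attributes")
    bullets = [f"- **{a.label}:** {a.text} [{a.citation}]" for a in attributes]
    return intro + "\n\n" + "\n".join(bullets)


# Placeholder values, not content from the FAQ corpus.
example = render_attribute_list(
    "Intro paragraph about the procedure. [1]",
    [
        Attribute("Duur", "duration as stated in the source.", 1),
        Attribute("Voorbereiding", "preparation as stated in the source.", 2),
        Attribute("Sedatie", "sedation as stated in the source.", 3),
    ],
)
```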
Cross-pattern rules
- Bullets are not markdown headers; the `# headers banned` rule does NOT block bulleted lists.
- Inline `[N]` markers go IMMEDIATELY after each substantive claim. Per-bullet for Patterns C/D/E/F; per-sentence for A/B.
- Bold only the LABEL (`**Duur:**`), never the whole bullet body.
- Do NOT blend patterns; pick the shape that matches the user's primary question.
- This shape-selection rule applies to the chat surface only. Voice answers stay as natural prose.
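Two of the rules above (a citation marker per bullet, bold on the label only) are mechanical enough to check after generation. The following is a heuristic sketch, assuming answers arrive as markdown text; `lint_bullets` and the regular expressions are hypothetical and not part of the shipped change.

```python
import re

# A bulleted or numbered list item; group 1 is the item body.
BULLET = re.compile(r"^\s*(?:[-*]|\d+\.)\s+(.*)$")
# An inline citation marker such as [3].
CITATION = re.compile(r"\[\d+\]")
# A body that is entirely bold (optionally followed by citation markers).
# Bold text ending in a colon is treated as a label and is not flagged.
FULLY_BOLD = re.compile(r"^\*\*[^*]+[^*:]\*\*(?:\s*\[\d+\])*$")


def lint_bullets(answer: str) -> list[str]:
    """Return a list of rule violations found in list items of `answer`."""
    problems = []
    for line in answer.splitlines():
        match = BULLET.match(line)
        if not match:
            continue  # prose lines are not checked here
        body = match.group(1).strip()
        if not CITATION.search(body):
            problems.append(f"missing citation: {body}")
        if FULLY_BOLD.match(body):
            problems.append(f"whole bullet is bold: {body}")
    return problems
```

A check like this could run in an evaluation suite over sampled answers; it only inspects list items, so the per-sentence citation rule for Pattern A would need a separate check.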
Rationale: typology beats rule-per-defect
Whack-a-mole cost. Adding one rule per visible problem produces a prompt that grows linearly with defect history. Token cost rises, rule interactions multiply, and the LLM's compliance degrades on older rules that have not been tuned recently.
Typology cost. A typology adds upfront text but compresses future maintenance. The next visible problem ("cost breakdowns look bad", "schedule answers look bad") falls under Pattern C without new code — the LLM already has the template + a worked example.
Generalization budget. Six patterns cover ~95% of patient questions observed in app.conversation_messages over the last 90 days:
- Definitions, conditions, doctor profiles, cost, schedule, conditional → Pattern C
- "Welke onderzoeken / artsen / afdelingen…" → Pattern D
- "Wat is het verschil tussen X en Y?" → Pattern E
- "Wanneer moet ik naar spoed?" → Pattern F
- "Hoe schrijf ik me in?" → Pattern B
- "Wat is het telefoonnummer / adres / spreekuur?" → Pattern A
The remaining ~5% (refusal, off-topic redirect, safety-policy crisis) is governed by separate prompt sections.
Consequences
Positive
- One framework, many problems solved. Future "the X answer looks bad" complaints become "the LLM picked the wrong pattern" — single point of change.
- Better citation density per claim. Patterns C and D's "citation per bullet" rule forces the LLM to attribute each fact.
- Patient scannability. Bold labels let patients skim for what matters (Duur, Sedatie, Voorbereiding).
- Voice path unaffected. Rule explicitly chat-only.
Negative / trade-offs
- Prompt-token growth. `CHAT_ANSWER_SHAPE_RULES` is ~70 lines longer than the prior `CHAT_BOLD_LEDE_RULE`. ~500 extra tokens per chat query. At GPT-4.1-mini pricing that's ~$0.0005 per query, negligible at pilot volumes.
- Risk: LLM applies wrong pattern. Mitigation: each pattern has explicit "use when" + "do NOT use when" guidance.
- Risk: bullet overuse. Patterns A and B explicitly say "no bullets"; Pattern C requires 3+ distinct attributes.
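The token-cost estimate is plain arithmetic and can be recomputed whenever the prompt length or the model changes. In the sketch below, `extra_cost_per_query` is a hypothetical helper, the rate of 1.00 USD per million tokens is simply what the two figures above (~500 tokens, ~$0.0005) imply rather than a quoted price, and the 10,000-query volume is an illustrative assumption.

```python
def extra_cost_per_query(extra_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of the additional prompt tokens for a single query."""
    return extra_tokens * usd_per_million_tokens / 1_000_000


# Assumption: rate implied by the ADR's own figures, not a published price.
IMPLIED_USD_PER_MILLION = 1.00

per_query = extra_cost_per_query(500, IMPLIED_USD_PER_MILLION)
# Assumption: 10,000 chat queries as an example volume.
per_10k_queries = per_query * 10_000
```

Substituting the provider's current input-token rate for `IMPLIED_USD_PER_MILLION` gives the figure to compare against the pilot budget.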
Rejected alternatives
- Add a procedure-explanation rule + a comparison rule + a cost rule incrementally. Rejected: whack-a-mole.
- Remove `Do NOT use markdown headers` from the universal rules. Rejected: doesn't tell the LLM what to do INSTEAD.
- Ship a separate `prompt-bullets-allowed` feature flag. Rejected: the issue isn't whether bullets are allowed; it's that the LLM didn't know when to use them.
- Train a classifier to pick the pattern per question. Considered for post-MVP. Rejected for now: GPT-4.1 is itself a competent classifier given the typology + examples.
References
- Commit `9e76bc82`: six-pattern typology shipped
- Commit `b88fc086`: original `CHAT_BOLD_LEDE_RULE` (Pattern D), now folded in
- `backend/app/prompts.py`: `CHAT_ANSWER_SHAPE_RULES` constant
- `backend/app/services/rag_service.py:4641`: chat-only injection site
- Master ADR: `docs/ADR/0056-chat-answer-shape-typology.md`