ADR-0056: Chat Answer-Shape Typology — Six Patterns, Not Sixty Rules

Date: 2026-05-12 | Status: Accepted (shipped in commit 9e76bc82) | Related: ADR-0055 FAQ-Corpus Drift Prevention, the CHAT_BOLD_LEDE_RULE shipped earlier 2026-05-12 in commit b88fc086

Context

On 2026-05-12 the competitor product at zolcase.novation.website/slim-zoeken demonstrated significantly more polished, more scannable, more actionable answers than ours for "Wat is een gastroscopie?":

Competitor: 2 short intro paragraphs + bulleted attribute list with bold labels (Duur, Voorbereiding, Verloop, Sedatie). Inline [N] markers on every claim. 12 citations across 3 source documents.
Ours: 1 200+ character wall of text. No bullets, no labels, no headers. Citations sparse or absent.

Earlier the same day we shipped CHAT_BOLD_LEDE_RULE (commit b88fc086) — a single-pattern rule for multi-entity answers (Pattern D below). That fixed one shape. The gastroscopy comparison made clear we'd just be playing whack-a-mole if we kept adding one prompt rule per visible defect.

The competitor's answers suggest that its prompt is not a list of per-topic rules but a typology: a small set of universal shape patterns from which the LLM picks the right one per question. Our prompt was list-based (INSTITUTIONAL / POINT-FACT / PROCEDURAL) and explicitly suppressed structure via "Do NOT use markdown headers."

Decision

Replace CHAT_BOLD_LEDE_RULE with CHAT_ANSWER_SHAPE_RULES — a six-pattern typology with worked examples. Same chat-only injection at rag_service.py:4641 (voice path unchanged). Backwards-compat alias kept for older imports.
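A minimal sketch of how the alias and the chat-only injection could fit together. The constant names come from this ADR; the placeholder prompt text, the `build_system_prompt` helper, and the `surface` parameter are assumptions for illustration and are not the actual code at rag_service.py:4641.

```python
# Illustrative only: the real constant lives in backend/app/prompts.py and
# contains the full six-pattern text with worked examples.
CHAT_ANSWER_SHAPE_RULES = "ANSWER SHAPE RULES (placeholder text for this sketch)"

# Backwards-compat alias so older imports of the previous name keep working.
CHAT_BOLD_LEDE_RULE = CHAT_ANSWER_SHAPE_RULES


def build_system_prompt(base_prompt: str, surface: str) -> str:
    """Append the shape rules for the chat surface only (hypothetical helper).

    Any other surface, such as voice, gets the base prompt unchanged.
    """
    if surface == "chat":
        return f"{base_prompt}\n\n{CHAT_ANSWER_SHAPE_RULES}"
    return base_prompt
```

Binding the old name to the same object means there is a single source of truth: editing the new constant changes what legacy importers see as well.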

The six patterns

Pattern	When	Shape
A — POINT-FACT	Single discrete answer (phone, address, hour)	1-2 sentences + 1 citation
B — STEP-BY-STEP	"How do I X?" procedural	Numbered list, citation per step
C — ATTRIBUTE-LIST	Single topic with 3+ distinct attributes (procedures, conditions, doctor profiles, cost breakdowns, schedules)	1-2 intro paragraphs + bulleted attribute list with bold labels
D — MULTI-ENTITY	Question covers multiple distinct entities	One bold-lede paragraph per entity (formerly `CHAT_BOLD_LEDE_RULE`)
E — COMPARISON	"X vs Y" / "verschil tussen…"	Brief intro + two parallel bold-labeled sections, each with bullets
F — DECISION-TREE	"Wanneer moet ik X?" / triage	Conditional bullets: Bij ernstige…: action / Bij milde…: alternative

Cross-pattern rules

Bullets are not markdown headers; the ban on # headers does NOT block bulleted lists.
Inline [N] markers go IMMEDIATELY after each substantive claim. Per-bullet for Patterns C/D/E/F; per-sentence for A/B.
Bold only the LABEL (**Duur:**), never the whole bullet body.
Do NOT blend patterns — pick the shape that matches the user's primary question.
This shape-selection rule applies to the chat surface only. Voice answers stay as natural prose.
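Some of the rules above are mechanical enough to check in an offline evaluation. The sketch below is one possible lint, not part of the shipped change: the function name `lint_answer` and the regular expressions are assumptions, and it covers only the header ban, the per-bullet citation rule, and the bold-label rule. Per-sentence citations for Patterns A/B and pattern blending are not checked.

```python
import re

# A bullet is a line starting with "-", "*" or "•" followed by whitespace.
BULLET = re.compile(r"^\s*[-*•]\s+(.*)$")
# An inline citation marker such as [3].
CITATION = re.compile(r"\[\d+\]")
# A bold span such as **Duur:**.
BOLD = re.compile(r"\*\*(.+?)\*\*")
# A markdown header such as "# Title" or "### Title".
HEADER = re.compile(r"^\s*#{1,6}\s")


def lint_answer(answer: str) -> list[str]:
    """Return a list of rule violations found in a chat answer."""
    problems = []
    for number, line in enumerate(answer.splitlines(), start=1):
        if HEADER.match(line):
            problems.append(f"line {number}: markdown header")
            continue
        match = BULLET.match(line)
        if not match:
            continue  # only bullets are checked below
        body = match.group(1)
        if not CITATION.search(body):
            problems.append(f"line {number}: bullet without [N] marker")
        for span in BOLD.findall(body):
            # Bold must be a leading label that ends with a colon.
            if not span.endswith(":") or not body.startswith(f"**{span}**"):
                problems.append(f"line {number}: bold is not a leading label")
    return problems
```

An answer whose bullets each start with a bold label and carry a marker passes cleanly; an answer with a header, a fully bolded bullet, or an uncited bullet yields one message per violation.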

Rationale: typology beats rule-per-defect

Whack-a-mole cost. Adding one rule per visible problem produces a prompt that grows linearly with defect history. Token cost rises, rule interactions multiply, and the LLM's compliance degrades on rules that have not been exercised recently.

Typology cost. A typology adds upfront text but compresses future maintenance. The next visible problem ("cost breakdowns look bad", "schedule answers look bad") falls under Pattern C without new code — the LLM already has the template + a worked example.

Generalization budget. Six patterns cover ~95% of patient questions observed in app.conversation_messages over the last 90 days:

Definitions, conditions, doctor profiles, cost, schedule, conditional → Pattern C
"Welke onderzoeken / artsen / afdelingen…" → Pattern D
"Wat is het verschil tussen X en Y?" → Pattern E
"Wanneer moet ik naar spoed?" → Pattern F
"Hoe schrijf ik me in?" → Pattern B
"Wat is het telefoonnummer / adres / spreekuur?" → Pattern A

The remaining ~5% (refusal, off-topic redirect, safety-policy crisis) is governed by separate prompt sections.
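The example questions above can double as regression fixtures for pattern selection. The sketch below is an assumption about how such a check might look: `SHAPE_FIXTURES`, `agreement_rate`, and the `classify` callable are hypothetical names, and the Pattern D example is omitted because the ADR elides its wording. Mapping "Wat is een gastroscopie?" to Pattern C is this sketch's reading of the Context section, not a statement from the ADR.

```python
from typing import Callable

# Question -> expected pattern letter, taken from the ADR's own examples.
SHAPE_FIXTURES: dict[str, str] = {
    "Wat is een gastroscopie?": "C",  # assumption: procedure explanation
    "Wat is het verschil tussen X en Y?": "E",
    "Wanneer moet ik naar spoed?": "F",
    "Hoe schrijf ik me in?": "B",
    "Wat is het telefoonnummer?": "A",
}


def agreement_rate(
    classify: Callable[[str], str],
    fixtures: dict[str, str] = SHAPE_FIXTURES,
) -> float:
    """Fraction of fixture questions for which classify returns the expected pattern."""
    if not fixtures:
        raise ValueError("fixtures must not be empty")
    hits = sum(1 for question, expected in fixtures.items()
               if classify(question) == expected)
    return hits / len(fixtures)
```

In practice `classify` would wrap whatever step extracts the chosen pattern from a model response; a drop in the rate after a prompt edit points at the typology text rather than at any single topic.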

Consequences

Positive

One framework, many problems solved. Future "the X answer looks bad" complaints become "the LLM picked the wrong pattern" — single point of change.
Better citation density per claim. Patterns C and D's "citation per bullet" rule forces the LLM to attribute each fact.
Patient scannability. Bold labels let patients skim for what matters (Duur, Sedatie, Voorbereiding).
Voice path unaffected. Rule explicitly chat-only.

Negative / trade-offs

Prompt-token growth. CHAT_ANSWER_SHAPE_RULES is ~70 lines longer than the prior CHAT_BOLD_LEDE_RULE. ~500 extra tokens per chat query. At GPT-4.1-mini pricing that's ~$0.0005 per query — negligible at pilot volumes.
Risk: LLM applies wrong pattern. Mitigation: each pattern has explicit "use when" + "do NOT use when" guidance.
Risk: bullet overuse. Patterns A and B explicitly say "no bullets"; Pattern C requires 3+ distinct attributes.
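The token-cost estimate above is simple arithmetic and can be rerun when the prompt or the price changes. In the sketch below the function names are hypothetical. The ADR's two figures (~500 tokens, ~$0.0005 per query) together imply an effective input rate of about $1.00 per million tokens; that rate is inferred from the ADR's numbers and is not a quoted list price.

```python
def extra_cost_per_query(extra_tokens: int, usd_per_million_tokens: float) -> float:
    """Added cost in USD per query for extra prompt tokens at a given input price."""
    return extra_tokens * usd_per_million_tokens / 1_000_000


def implied_usd_per_million(extra_tokens: int, usd_per_query: float) -> float:
    """Input price per million tokens implied by a per-query cost figure."""
    return usd_per_query * 1_000_000 / extra_tokens
```

Calling `extra_cost_per_query(500, price)` with the current input price for the deployed model gives the figure to compare against the estimate in this section.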

Rejected alternatives

Add a procedure-explanation rule + a comparison rule + a cost rule incrementally. Rejected: whack-a-mole.
Remove Do NOT use markdown headers from the universal rules. Rejected: doesn't tell the LLM what to do INSTEAD.
Ship a separate prompt-bullets-allowed feature flag. Rejected: the issue isn't whether bullets are allowed — it's that the LLM didn't know when to use them.
Train a classifier to pick the pattern per question. Considered for post-MVP. Rejected for now: GPT-4.1 is itself a competent classifier given the typology + examples.

References

Commit 9e76bc82 — six-pattern typology shipped
Commit b88fc086 — original CHAT_BOLD_LEDE_RULE (Pattern D), now folded in
backend/app/prompts.py — CHAT_ANSWER_SHAPE_RULES constant
backend/app/services/rag_service.py:4641 — chat-only injection site
Master ADR: docs/ADR/0056-chat-answer-shape-typology.md
