
ADR-0056: Chat Answer-Shape Typology — Six Patterns, Not Sixty Rules

Date: 2026-05-12 | Status: Accepted (shipped in commit 9e76bc82) | Related: ADR-0055 FAQ-Corpus Drift Prevention; CHAT_BOLD_LEDE_RULE, shipped earlier on 2026-05-12 in commit b88fc086

Context

On 2026-05-12 the competitor product at zolcase.novation.website/slim-zoeken gave a significantly more polished, more scannable, and more actionable answer than ours to "Wat is een gastroscopie?" ("What is a gastroscopy?"):

  • Competitor: 2 short intro paragraphs + a bulleted attribute list with bold labels (Duur, Voorbereiding, Verloop, Sedatie: duration, preparation, course of the procedure, sedation). Inline [N] markers on every claim; 12 citations across 3 source documents.
  • Ours: a 1,200+-character wall of text. No bullets, no labels, no headers. Citations sparse or absent.

Earlier the same day we shipped CHAT_BOLD_LEDE_RULE (commit b88fc086), a single-pattern rule for multi-entity answers (Pattern D below). That fixed one shape. The gastroscopy comparison made clear that adding one prompt rule per visible defect would turn into a game of whack-a-mole.

The competitor's output suggests its prompt is not a list of per-topic rules but a typology: a small set of universal shape patterns, from which the LLM picks the right one for each question. Our prompt was instead a flat list of answer types (INSTITUTIONAL / POINT-FACT / PROCEDURAL) and explicitly suppressed structure via "Do NOT use markdown headers."

Decision

Replace CHAT_BOLD_LEDE_RULE with CHAT_ANSWER_SHAPE_RULES, a six-pattern typology with worked examples. It is injected at the same chat-only site, rag_service.py:4641; the voice path is unchanged. CHAT_BOLD_LEDE_RULE is kept as a backwards-compatible alias for older imports.
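The decision can be pictured with the minimal sketch below. Only the two constant names come from this ADR; the stand-in prompt text, the build_system_prompt helper, and its surface parameter are hypothetical and not the actual code at rag_service.py:4641.

```python
# Hypothetical sketch of the decision; not the real prompts.py / rag_service.py code.

# Stand-in text: the real constant holds the full six-pattern typology and examples.
CHAT_ANSWER_SHAPE_RULES = (
    "ANSWER SHAPE (chat only): pick exactly ONE pattern per answer: "
    "A POINT-FACT, B STEP-BY-STEP, C ATTRIBUTE-LIST, "
    "D MULTI-ENTITY, E COMPARISON, F DECISION-TREE."
)

# Backwards-compat alias so older imports of the retired name keep working.
CHAT_BOLD_LEDE_RULE = CHAT_ANSWER_SHAPE_RULES


def build_system_prompt(base_prompt: str, surface: str) -> str:
    """Append the shape rules on the chat surface only; voice keeps the base prompt."""
    if surface == "chat":
        return f"{base_prompt}\n\n{CHAT_ANSWER_SHAPE_RULES}"
    return base_prompt
```

With this shape, build_system_prompt(base, "voice") returns the base prompt unchanged, which is what keeps voice answers as natural prose.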

The six patterns

| Pattern | Name | When | Shape |
|---|---|---|---|
| A | POINT-FACT | Single discrete answer (phone, address, hour) | 1-2 sentences + 1 citation |
| B | STEP-BY-STEP | "How do I X?" procedural | Numbered list, citation per step |
| C | ATTRIBUTE-LIST | Single topic with 3+ distinct attributes (procedures, conditions, doctor profiles, cost breakdowns, schedules) | 1-2 intro paragraphs + bulleted attribute list with bold labels |
| D | MULTI-ENTITY | Question covers multiple distinct entities | One bold-lede paragraph per entity (formerly CHAT_BOLD_LEDE_RULE) |
| E | COMPARISON | "X vs Y" / "verschil tussen…" ("difference between…") | Brief intro + two parallel bold-labeled sections, each with bullets |
| F | DECISION-TREE | "Wanneer moet ik X?" ("When should I X?") / triage | Conditional bullets: Bij ernstige… (If severe…): action / Bij milde… (If mild…): alternative |
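One way to keep the table and the prompt text from drifting apart is to hold the typology as data and render the prompt section from it. This is a hypothetical sketch: AnswerPattern, PATTERNS, and render_typology are illustrative names, the wording is condensed from the table, and the real constant's "do NOT use when" guidance and worked examples are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerPattern:
    code: str      # single-letter pattern id, A-F
    name: str
    use_when: str
    shape: str


# Hypothetical encoding of the six-pattern table (condensed wording).
PATTERNS = (
    AnswerPattern("A", "POINT-FACT", "single discrete answer (phone, address, hour)",
                  "1-2 sentences + 1 citation"),
    AnswerPattern("B", "STEP-BY-STEP", "'How do I X?' procedural",
                  "numbered list, citation per step"),
    AnswerPattern("C", "ATTRIBUTE-LIST", "single topic with 3+ distinct attributes",
                  "1-2 intro paragraphs + bulleted attribute list with bold labels"),
    AnswerPattern("D", "MULTI-ENTITY", "question covers multiple distinct entities",
                  "one bold-lede paragraph per entity"),
    AnswerPattern("E", "COMPARISON", "'X vs Y' / 'verschil tussen...'",
                  "brief intro + two parallel bold-labeled sections, each with bullets"),
    AnswerPattern("F", "DECISION-TREE", "'Wanneer moet ik X?' / triage",
                  "conditional bullets, one condition: action per bullet"),
)


def render_typology(patterns=PATTERNS) -> str:
    """Render one prompt line per pattern under a single instruction header."""
    lines = ["Pick exactly ONE shape for each chat answer:"]
    for p in patterns:
        lines.append(f"{p.code} - {p.name}: use when {p.use_when}. Shape: {p.shape}.")
    return "\n".join(lines)
```

Adding a seventh pattern later would then be a one-entry change rather than a new free-standing rule.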

Cross-pattern rules

  • Bullets are not markdown headers: the existing ban on # headers does NOT block bulleted lists.
  • Inline [N] markers go IMMEDIATELY after each substantive claim. Per-bullet for Patterns C/D/E/F; per-sentence for A/B.
  • Bold only the LABEL (**Duur:**), never the whole bullet body.
  • Do NOT blend patterns — pick the shape that matches the user's primary question.
  • This shape-selection rule applies to the chat surface only. Voice answers stay as natural prose.
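These rules are stated to the LLM rather than enforced in code. A post-hoc check over logged chat answers could still flag violations. The sketch below is hypothetical (lint_chat_answer and its regexes are not part of the described codebase). It only inspects "-"/"*" bullets, "#" headers, and **Label:** bold spans, and it does not check numbered steps or per-sentence citations.

```python
import re

# Markdown ATX header: up to 3 leading spaces, 1-6 '#', then a space or tab.
HEADER_RE = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]", re.MULTILINE)
# "-" or "*" bullet; captures the bullet body.
BULLET_RE = re.compile(r"^[ \t]*[-*][ \t]+(.*)$", re.MULTILINE)
# Inline citation marker such as [1] or [12].
CITATION_RE = re.compile(r"\[\d+\]")
# Body that starts with a bold label ending in a colon, e.g. "**Duur:**".
BOLD_LABEL_RE = re.compile(r"^\*\*[^*]+:\*\*")


def lint_chat_answer(answer: str) -> list[str]:
    """Return human-readable cross-pattern rule violations (empty list = clean)."""
    problems = []
    if HEADER_RE.search(answer):
        problems.append("markdown header used")
    for body in BULLET_RE.findall(answer):
        if not CITATION_RE.search(body):
            problems.append(f"bullet without [N] citation: {body[:40]!r}")
        bold_spans = body.count("**") // 2
        # Allowed: no bold at all, or exactly one bold span that is the leading label.
        if bold_spans and not (bold_spans == 1 and BOLD_LABEL_RE.match(body)):
            problems.append(f"bold beyond the label: {body[:40]!r}")
    return problems
```

An answer such as "- **Duur:** kort [1]" passes, while a fully bolded bullet or a bullet with no [N] marker is reported.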

Rationale: typology beats rule-per-defect

Whack-a-mole cost. Adding one rule per visible problem produces a prompt that grows linearly with defect history. Token cost rises, rule interactions multiply, and the LLM's compliance degrades on rules that are no longer the current focus.

Typology cost. A typology adds upfront text but compresses future maintenance. The next visible problem ("cost breakdowns look bad", "schedule answers look bad") falls under Pattern C without new code — the LLM already has the template + a worked example.

Generalization budget. Six patterns cover ~95% of patient questions observed in app.conversation_messages over the last 90 days:

  • Definitions, conditions, doctor profiles, cost, schedule, conditional → Pattern C
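  • "Welke onderzoeken / artsen / afdelingen…" ("Which tests / doctors / departments…") → Pattern D
  • "Wat is het verschil tussen X en Y?" ("What is the difference between X and Y?") → Pattern E
  • "Wanneer moet ik naar spoed?" ("When should I go to the emergency department?") → Pattern F
  • "Hoe schrijf ik me in?" ("How do I register?") → Pattern B
  • "Wat is het telefoonnummer / adres / spreekuur?" ("What is the phone number / address / consultation hours?") → Pattern A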

The remaining ~5% (refusal, off-topic redirect, safety-policy crisis) is governed by separate prompt sections.
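To spot-check a coverage estimate like this against logged questions, a crude keyword tally can help. This is a hypothetical offline audit aid, not how the ~95% figure was produced and not a production router (the LLM picks the pattern; a classifier is deferred, see Rejected alternatives). The Dutch cues are taken only from the example questions above, and anything unmatched lands in "other".

```python
import re
from collections import Counter

# Hypothetical cues from the example questions; the first matching rule wins.
CUES = [
    ("E", re.compile(r"\bverschil tussen\b|\bvs\b", re.IGNORECASE)),
    ("F", re.compile(r"^wanneer moet ik\b", re.IGNORECASE)),
    ("A", re.compile(r"\b(telefoonnummer|adres|spreekuur)\b", re.IGNORECASE)),
    ("B", re.compile(r"^hoe\b", re.IGNORECASE)),
    ("D", re.compile(r"^welke\b", re.IGNORECASE)),
    ("C", re.compile(r"^wat (is|zijn|kost)\b", re.IGNORECASE)),
]


def guess_pattern(question: str) -> str:
    """Return a pattern code A-F, or 'other' when no cue matches."""
    text = question.strip()
    for code, cue in CUES:
        if cue.search(text):
            return code
    return "other"


def pattern_shares(questions: list[str]) -> dict[str, float]:
    """Fraction of questions per guessed pattern code."""
    counts = Counter(guess_pattern(q) for q in questions)
    total = sum(counts.values()) or 1
    return {code: n / total for code, n in counts.items()}
```

Keyword cues will misfile many real questions, so a tally like this is only a sanity check on the order of magnitude, not a measurement.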

Consequences

Positive

  • One framework, many problems solved. A future "the X answer looks bad" complaint becomes "the LLM picked the wrong pattern", which has a single point of change.
  • Higher citation density. The per-bullet citation rule for Patterns C, D, E, and F forces the LLM to attribute each fact.
  • Patient scannability. Bold labels let patients skim for what matters (Duur, Sedatie, Voorbereiding).
  • Voice path unaffected. Rule explicitly chat-only.

Negative / trade-offs

  • Prompt-token growth. CHAT_ANSWER_SHAPE_RULES is ~70 lines longer than the prior CHAT_BOLD_LEDE_RULE, adding ~500 tokens to every chat query. At GPT-4.1-mini pricing that is ~$0.0005 per query, negligible at pilot volumes.
  • Risk: the LLM applies the wrong pattern. Mitigation: each pattern has explicit "use when" and "do NOT use when" guidance.
  • Risk: bullet overuse. Mitigation: Patterns A and B explicitly say "no bullets"; Pattern C requires 3+ distinct attributes.
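The cost estimate is plain arithmetic: extra tokens times the input price per token, times query volume. The sketch keeps the price as a parameter rather than quoting a rate; the ADR's ~$0.0005 for ~500 tokens corresponds to an assumed $1.00 per million input tokens, so the currently published GPT-4.1-mini rate should be plugged in before reusing the number. The function name is hypothetical.

```python
def extra_prompt_cost_usd(extra_tokens: int,
                          usd_per_million_tokens: float,
                          queries: int = 1) -> float:
    """Added input cost in USD: tokens x (price per 1M tokens / 1M) x queries."""
    return extra_tokens * (usd_per_million_tokens / 1_000_000) * queries
```

At $1.00 per million tokens, 500 extra tokens cost $0.0005 per query, or about $5 across a hypothetical 10,000 queries; at $0.40 per million tokens the same prompt growth would cost $0.0002 per query.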

Rejected alternatives

  1. Add a procedure-explanation rule + a comparison rule + a cost rule incrementally. Rejected: whack-a-mole.
  2. Remove "Do NOT use markdown headers" from the universal rules. Rejected: doesn't tell the LLM what to do INSTEAD.
  3. Ship a separate prompt-bullets-allowed feature flag. Rejected: the issue isn't whether bullets are allowed — it's that the LLM didn't know when to use them.
  4. Train a classifier to pick the pattern per question. Considered for post-MVP. Rejected for now: GPT-4.1 is itself a competent classifier given the typology + examples.

References

  • Commit 9e76bc82 — six-pattern typology shipped
  • Commit b88fc086 — original CHAT_BOLD_LEDE_RULE (Pattern D), now folded in
  • backend/app/prompts.py: CHAT_ANSWER_SHAPE_RULES constant
  • backend/app/services/rag_service.py:4641 — chat-only injection site
  • Master ADR: docs/ADR/0056-chat-answer-shape-typology.md