
ADR-0027: Multilingual Prompts and LLM Best Practices

Date: 2026-02-13 | Status: Accepted

Context

ZOL Hospital serves a diverse community in Belgian Limburg with significant Turkish, Italian, Romanian, and other immigrant populations. Before this decision, our prompts and response pipeline:

  • Were written entirely in Dutch
  • Performed no language detection
  • Generated all responses in Dutch only
  • Relied on Dutch-language prompt engineering (suboptimal for LLM instruction following)

Modern LLMs tend to follow English system prompts more reliably than prompts in other languages, and they can still generate output in another language when instructed to.

Decision

Pattern B: Cross-Lingual RAG (Lewis et al., 2020)

  • Detect user language via intent classification (ISO 639-1 code)
  • Retrieve in Dutch (document index is Dutch)
  • Generate in the user's detected language
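
The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: `answer_question`, `Answer`, and the three callables are hypothetical names, and whether the query is translated to Dutch before retrieval is not stated in this ADR, so the sketch passes the question through unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    language: str  # ISO 639-1 code of the detected user language
    text: str


def answer_question(
    question: str,
    detect_language: Callable[[str], str],
    retrieve_dutch: Callable[[str], list[str]],
    generate: Callable[[str, list[str], str], str],
) -> Answer:
    """Detect the language, retrieve from the Dutch index, generate in the user's language."""
    language = detect_language(question)          # step 1: e.g. "tr"
    chunks = retrieve_dutch(question)             # step 2: index is Dutch-only
    text = generate(question, chunks, language)   # step 3: answer in `language`
    return Answer(language=language, text=text)


# Usage with stub components (all stubs are placeholders for real services).
answer = answer_question(
    "Merhaba",
    detect_language=lambda q: "tr",
    retrieve_dutch=lambda q: ["<Dutch chunk>"],
    generate=lambda q, chunks, lang: f"<answer in {lang} from {len(chunks)} chunk(s)>",
)
```

Passing the components in as callables keeps the language decision in one place, so the retriever never needs to know which language the user wrote in.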

Prompt Language

All system prompts rewritten in English following LLM best practices:

  • # section headers for structure
  • Step-by-step reasoning instructions
  • Explicit output format specifications
  • Few-shot examples with multilingual coverage
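
To make the conventions above concrete, here is a hedged sketch of what an English system prompt with `#` headers might look like. The section names, wording, and the `build_system_prompt` helper are assumptions for illustration; the real prompts are not reproduced in this ADR, and the few-shot examples are left as placeholders.

```python
# Illustrative template only: section names and wording are assumptions.
SYSTEM_PROMPT_TEMPLATE = """\
# Role
You answer questions about hospital services using only the provided context.

# Steps
1. Read the question and the retrieved context.
2. Decide which context passages are relevant.
3. Write the answer in the language with ISO 639-1 code "{language}".

# Output format
Plain text. Cite each source document as [N].

# Examples
<few-shot example in Dutch>
<few-shot example in Turkish>
"""


def build_system_prompt(language: str) -> str:
    """Fill the detected language into the English system prompt."""
    return SYSTEM_PROMPT_TEMPLATE.format(language=language)
```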

Multilingual Blocked Messages

BLOCKED_MESSAGES dictionary with translations for 8 languages: nl, en, tr, fr, de, it, ro, el

Categories: needs_medical_advice, off_topic, other_hospital, vague_input
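
A lookup over this dictionary might look like the sketch below. The nesting (category first, then language), the `get_blocked_message` helper, and the fallback to Dutch are assumptions; the message strings are placeholders rather than the real translations, and only two of the eight languages are shown.

```python
DEFAULT_LANGUAGE = "nl"  # assumed fallback, matching the schema default

# Placeholder texts; the real dictionary covers nl, en, tr, fr, de, it, ro, el.
BLOCKED_MESSAGES = {
    "needs_medical_advice": {"nl": "<nl: medical advice>", "en": "<en: medical advice>"},
    "off_topic": {"nl": "<nl: off topic>", "en": "<en: off topic>"},
    "other_hospital": {"nl": "<nl: other hospital>", "en": "<en: other hospital>"},
    "vague_input": {"nl": "<nl: vague input>", "en": "<en: vague input>"},
}


def get_blocked_message(category: str, language: str) -> str:
    """Return the message for `category`, falling back to Dutch for unknown languages."""
    by_language = BLOCKED_MESSAGES[category]  # unknown category raises KeyError
    return by_language.get(language, by_language[DEFAULT_LANGUAGE])
```

Raising on an unknown category while falling back on an unknown language is a deliberate choice in this sketch: a missing category is a programming error, whereas an unsupported language is expected input.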

Multilingual Disclaimers

DISCLAIMERS dictionary (same 8 languages) appended to every response.
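
Appending the disclaimer could be as simple as the following sketch. The `append_disclaimer` helper, the blank-line separator, and the Dutch fallback are assumptions, and the disclaimer texts are placeholders.

```python
# Placeholder texts; the real dictionary covers the same 8 languages.
DISCLAIMERS = {
    "nl": "<nl disclaimer>",
    "en": "<en disclaimer>",
}


def append_disclaimer(response: str, language: str) -> str:
    """Append the disclaimer in the user's language, falling back to Dutch."""
    disclaimer = DISCLAIMERS.get(language, DISCLAIMERS["nl"])
    return f"{response}\n\n{disclaimer}"
```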

Citation Deduplication

  • One [N] per unique document (was: one per chunk)
  • Graph results with source_document_id get citation numbers
  • Graph results without source doc remain uncited under supplementary section
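
The deduplication rules above can be sketched as follows. `Result` and `assign_citations` are hypothetical names, and modeling a graph result without a source document as `document_id=None` is an assumption about how the pipeline represents it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    text: str
    document_id: Optional[str]  # None: graph result with no source document


def assign_citations(results: list[Result]) -> tuple[list[tuple[int, Result]], list[Result]]:
    """Give each unique document one citation number; collect uncited results separately."""
    numbers: dict[str, int] = {}
    cited: list[tuple[int, Result]] = []
    supplementary: list[Result] = []
    for result in results:
        if result.document_id is None:
            supplementary.append(result)  # stays uncited
            continue
        # Reuse the document's number if seen before, else take the next one.
        n = numbers.setdefault(result.document_id, len(numbers) + 1)
        cited.append((n, result))
    return cited, supplementary


cited, supplementary = assign_citations([
    Result("chunk 1", "doc-a"),
    Result("chunk 2", "doc-a"),  # same document, same [N]
    Result("chunk 3", "doc-b"),
    Result("graph fact", None),
])
print([n for n, _ in cited])  # [1, 1, 2]
```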

Schema Changes

  • IntentClassificationResult.detected_language: str = "nl"
  • IntentBlockedError.message_nl renamed to IntentBlockedError.message (multilingual)
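
In code, the two schema changes might look roughly like this. Only `detected_language`, its default, and the `message` attribute come from this ADR; the `intent` and `category` fields, and the use of a dataclass rather than another model type, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class IntentClassificationResult:
    intent: str                   # hypothetical field, for illustration
    detected_language: str = "nl"  # ISO 639-1 code; defaults to Dutch


class IntentBlockedError(Exception):
    """Raised when a request is blocked; `message` is in the user's language."""

    def __init__(self, category: str, message: str) -> None:
        super().__init__(message)
        self.category = category  # hypothetical field, for illustration
        self.message = message    # formerly `message_nl`
```

Consumers that previously read `error.message_nl` would need to read `error.message` instead.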

Consequences

Positive

  • Turkish, Italian, Romanian, French, German, Greek, and English speakers get native-language responses
  • English system prompts improve LLM instruction adherence
  • Citation dedup reduces clutter
  • Centralized blocked message and disclaimer logic

Negative

  • Translation quality depends on LLM capabilities (no human review)
  • Dutch retrieval may miss multilingual synonyms (future: multilingual embeddings)
  • Breaking change: message_nl → message requires test/consumer updates