Query Rewriting
Query rewriting is the first transformation applied to a user's question, performed inside the intent-classification LLM call (see ADR-0030) before any retrieval runs. It produces the rewritten_query field that the rest of the pipeline operates on instead of the raw user input.
It is the foundation that several downstream stages depend on — most importantly the semantic query cache, which keys on the rewritten query rather than the raw input. Understanding rewriting first makes the cache, enrichment, and decomposition pages read in their natural dependency order.
This page uses query rewriting as the umbrella term for the whole transformation. It has two sub-operations, used precisely throughout the docs:
- Canonical reformulation — normalizing language, spelling, and sentence structure into a fixed Dutch clinical form (applies to every query).
- Follow-up resolution — resolving pronouns and implicit references against conversation history (applies only to follow-up turns).
The code field is rewritten_query; the prompt header is "Query reformulation". Treat rewriting and reformulation as the same concept at different granularity.
Why rewrite at all?
Raw user input is a poor retrieval and caching key. It is short, colloquial, frequently misspelled, and arrives in any of eight supported languages, while the indexed corpus is written in proper clinical Dutch. Rewriting every query into one canonical form serves three distinct purposes:
- Retrieval quality. A well-formed Dutch clinical sentence lands in the same linguistic space as the documents, so cosine similarity actually aligns. This is the classic query–document vocabulary mismatch problem, solved on the query side.
- Cross-language normalization. A Turkish, English, and Dutch query that mean the same thing collapse to the same rewritten string — one canonical form, regardless of input language.
- Cache effectiveness. Because semantically identical questions produce identical (or near-identical) rewritten strings, they collide on the same cache key. This is what lifts the semantic cache hit rate from <5% (raw-input keying) to an estimated 40–60%. See § Relationship to the semantic cache.
| Original (multilingual) | Rewritten (canonical Dutch clinical) |
|---|---|
| "rugpijn" | "Welke afdelingen bij ZOL behandelen Dorsalgie?" |
| "back pain" | "Welke afdelingen bij ZOL behandelen Dorsalgie?" |
| "sırt ağrısı" (Turkish) | "Welke afdelingen bij ZOL behandelen Dorsalgie?" |
Why the corpus language? (and what a non-Dutch hospital changes)
A natural question is why we normalize to Dutch at all, rather than letting the multilingual embedding model handle cross-language retrieval. The short answer: the decisive reason is the symbolic layer, not the vector layer. See ADR-0061 for the full decision record.
The pipeline has two layers that consume a query, and they have very different language needs:
| Layer | What it is | Language behaviour |
|---|---|---|
| Vector layer | embeddings → cosine retrieval | The embedding model is multilingual, so "syphilis" (fr) and "syfilis" (nl) already land near each other. Cross-lingual retrieval half-works without any rewriting. |
| Symbolic layer | taxonomy CONDITION_TO_DEPT_MAP, condition/treatment aliases, SNOMED lookups, keyword-rescue safe_contains, graph entity resolution | Exact / fuzzy string matching against Dutch surface forms. A French string "syphilis" will never match the Dutch dict key "syfilis" — a literal lookup does not care how close the vectors are. |
So the reasons to rewrite to the corpus language, in priority order:
- Symbolic resolution (decisive). The Dutch-authored taxonomy, alias maps, and graph only match Dutch surface forms. Normalization is what makes them resolve at all. No amount of embedding closeness substitutes for it.
- Author the knowledge base once (decisive for scale). One canonical language means the taxonomy, aliases, prompts, and safety heuristics are written a single time, not once per supported language.
- Single-language downstream (operational). The semantic cache key, the reranker, and the Dutch-tuned answer-shaping / number-normalization / safety heuristics all assume one canonical language.
- Tighter vectors (secondary bonus). Multilingual embedding spaces cluster somewhat by language, so a same-language query↔document pair scores higher cosine than an equal-meaning cross-language pair. Rewriting to the corpus language tightens those scores — which matters for the threshold-based gates (the retrieval-confidence abstain floor, rerank cutoffs, keyword-rescue scoring), not for whether retrieval succeeds at all.
"Closer vectors" is a real but secondary benefit. We rewrite to the corpus language primarily so the deterministic knowledge layer resolves at all, and so we author that knowledge once. Tighter vectors are a welcome bonus that sharpens ranking thresholds.
The alternative we rejected: per-language enumeration
The other way to make a French query resolve in the symbolic layer is to enumerate every language's spelling in the knowledge base — add syphilis / sifilide / gonorrhée / … to the taxonomy and aliases, plus per-language regex tables elsewhere. This scales as conditions × languages and never ends. We explicitly reject it. The rewrite step collapses all input languages to one canonical form before the symbolic layer ever runs, so the knowledge base only ever needs the corpus language.
What changes for a French / English / Romanian hospital?
Almost nothing structural — because the design has one canonical language per tenant: the corpus language. Onboarding a non-Dutch hospital changes which language is canonical, not how the machinery works:
| Layer | ZOL (today) | A French / English / Romanian tenant |
|---|---|---|
| Rewrite target | Dutch (hardcoded in the prompt) | the tenant's canonical_language (fr / en / ro) |
| Taxonomy & condition→dept maps | authored in Dutch | authored in the corpus language (already per-tenant under tenant_overlays/) |
| Symbolic resolution | against the Dutch taxonomy | against that tenant's corpus-language taxonomy |
| Cross-language reach | multilingual embeddings + rewrite | unchanged — same mechanism |
The intent-classification prompt is already built per-tenant (build_intent_and_rewrite_prompt(ctx)), so the hook is half-present; the planned change is to replace the hardcoded "Dutch" with the tenant's canonical_language. We never introduce per-language enumeration — only the single normalization target moves.
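The planned change can be pictured with a minimal sketch. Only the builder name build_intent_and_rewrite_prompt and the canonical_language setting come from this page; the TenantContext fields, the language-name table, and the prompt wording are assumptions made for illustration, not the contents of backend/app/prompts.py.

```python
from dataclasses import dataclass


# Hypothetical tenant context: the field names are assumptions for this sketch.
@dataclass(frozen=True)
class TenantContext:
    hospital_name: str
    canonical_language: str  # language code of the tenant's corpus, e.g. "nl"


# Illustrative mapping from language code to the name used in the prompt text.
LANGUAGE_NAMES = {"nl": "Dutch", "fr": "French", "en": "English", "ro": "Romanian"}


def build_intent_and_rewrite_prompt(ctx: TenantContext) -> str:
    """Build the rewrite instruction with the tenant's language as the single target."""
    target = LANGUAGE_NAMES[ctx.canonical_language]
    return (
        f"Rewrite the user's question into canonical clinical {target} "
        f"for {ctx.hospital_name}, whatever language it arrives in."
    )


zol = TenantContext(hospital_name="ZOL", canonical_language="nl")
french_tenant = TenantContext(hospital_name="Example Hospital", canonical_language="fr")

print(build_intent_and_rewrite_prompt(zol))
print(build_intent_and_rewrite_prompt(french_tenant))
```

The only thing that differs between the two tenants is the normalization target; nothing in the builder enumerates input languages.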
A French voice caller said "je crois que j'ai une syphilis". The assistant recommended a urologist — ungrounded and clinically wrong (syphilis is dermatology-venereology / infectious diseases). Root cause: a department-grounding check resolved the condition against the caller's raw French utterance, and the Dutch taxonomy key is "syfilis" — so the French "syphilis" matched nothing, the system had no grounded department, and the LLM filled the gap.
The wrong fix was to add "syphilis" (and every other language's spelling) to the taxonomy. The right fix was to resolve against the rewritten_query the classifier had already normalized to Dutch ("syfilis") — the exact mechanism this page describes. One canonical lookup, zero per-language tables. See ADR-0061.
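The failure and the fix can be reproduced in a few lines. This is a sketch under stated assumptions: the one-entry map stands in for the real CONDITION_TO_DEPT_MAP, the department label is a placeholder, ground_department is a hypothetical helper, and the rewritten string is an illustrative guess at what the classifier would emit.

```python
from typing import Optional

# Illustrative one-entry stand-in for the Dutch taxonomy; the department
# label is a placeholder, not the tenant's actual department name.
CONDITION_TO_DEPT_MAP = {
    "syfilis": "dermatology-venereology",
}


def ground_department(text: str) -> Optional[str]:
    """Return the department of the first taxonomy key found in `text`, else None."""
    lowered = text.lower()
    for condition, department in CONDITION_TO_DEPT_MAP.items():
        if condition in lowered:  # literal match on the Dutch surface form
            return department
    return None


raw_utterance = "je crois que j'ai une syphilis"
# Hypothetical rewritten_query, shaped like the condition_info template.
rewritten_query = "Wat is syfilis en welke behandelingen biedt ZOL aan?"

print(ground_department(raw_utterance))    # no Dutch key in the French text
print(ground_department(rewritten_query))  # resolves via the canonical form
```

The lookup itself never changes. Only its input moves from the raw utterance to the rewritten query, which is why no per-language entries are needed.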
Not to be confused with the language lock
Rewriting-to-corpus-language decides what language we retrieve and resolve in (always the corpus language). The language lock decides what language we answer in (the caller's, pinned per conversation). A French call is answered in French while its knowledge-base lookups run in Dutch. The two compose cleanly.
Canonical reformulation: templates
To make reformulations deterministic and consistent — rather than letting the LLM freely vary vocabulary and sentence shape — the intent-classification prompt supplies canonical templates, one fixed sentence pattern per intent type. The templates are tenant-parameterized ({hospital_name} is substituted per tenant; the examples below show the ZOL instance):
| Intent | Canonical template |
|---|---|
| doctor_lookup + department | "Welke artsen werken bij de afdeling {Department} van ZOL?" |
| doctor_lookup + name | "Wie is Dr. {Name} en op welke afdeling werkt Dr. {Name} bij ZOL?" |
| department_lookup | "Wat doet de afdeling {Department} bij ZOL en welke zorg biedt deze aan?" |
| condition_info | "Wat is {Condition} en welke behandelingen biedt ZOL aan?" |
| treatment_info / examination | "Hoe verloopt {Treatment/Examination} bij ZOL?" |
| booking_contact + department | "Hoe maak ik een afspraak bij de afdeling {Department} van ZOL?" |
| symptom_description | "Welke afdelingen bij ZOL behandelen {symptom}?" |
| navigation / practical | "Waar kan ik {topic} vinden bij ZOL?" |
Source of truth: INTENT_AND_REWRITE_PROMPT in backend/app/prompts.py.
The templates enforce two structural rules that keep reformulations collision-friendly:
- Always "ZOL" (never the full "Ziekenhuis Oost-Limburg"): consistent naming.
- Always "afdeling X" (never "X-ische artsen"): consistent structure.
Alongside the templates, the same prompt step performs entity normalization — mapping colloquial terms to canonical Dutch clinical names (e.g. "rugpijn" → "Dorsalgie", "hartfilmpje" → "ECG") and translating Latin/scientific terms to their common Dutch equivalent, since the hospital corpus is written in Dutch. This overlaps with, but is distinct from, the taxonomy-driven query enrichment that runs after classification.
Without templates the LLM varies freely. "who are the orthopedic doctors?" was once reformulated as "Wie zijn de orthopedische artsen bij Ziekenhuis Oost-Limburg?" instead of the canonical "Welke artsen werken bij de afdeling Orthopedie van ZOL?" — only 0.746 cosine similarity to the Dutch canonical form. That single variance defeated the semantic cache (no hit) and degraded retrieval (4 doctors returned vs. 12). Canonical templates eliminate this variance.
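To make the effect of templates concrete, here is a deterministic stand-in for what the prompt asks the LLM to do. In the real pipeline this happens inside the LLM call, not in Python; the dictionary keys, the render_canonical helper, and the alias table are hypothetical, and only the template sentences and the two alias pairs are taken from this page.

```python
# Three of the canonical templates, with the hospital name parameterized.
CANONICAL_TEMPLATES = {
    "doctor_lookup_department": "Welke artsen werken bij de afdeling {entity} van {hospital_name}?",
    "condition_info": "Wat is {entity} en welke behandelingen biedt {hospital_name} aan?",
    "symptom_description": "Welke afdelingen bij {hospital_name} behandelen {entity}?",
}

# Entity normalization: colloquial term -> canonical Dutch clinical name.
ENTITY_ALIASES = {"rugpijn": "Dorsalgie", "hartfilmpje": "ECG"}


def render_canonical(intent: str, entity: str, hospital_name: str = "ZOL") -> str:
    """Normalize the entity, then fill the fixed sentence pattern for the intent."""
    canonical_entity = ENTITY_ALIASES.get(entity.lower(), entity)
    template = CANONICAL_TEMPLATES[intent]
    return template.format(entity=canonical_entity, hospital_name=hospital_name)


print(render_canonical("symptom_description", "rugpijn"))
print(render_canonical("doctor_lookup_department", "Orthopedie"))
```

Because the sentence shape is fixed per intent, two requests with the same intent and the same normalized entity can only produce the same string, which is the property the cache relies on.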
Follow-up resolution
When conversation history exists, intent classification and rewriting run as a single combined LLM call (classify_and_rewrite()), so follow-up resolution adds no extra latency. Follow-up turns often contain anaphoric references that are meaningless in isolation; rewriting resolves them into a self-contained query:
- Turn 1: "Welke dokters zijn er bij orthopedie?" (Which doctors are in orthopedics?)
- Turn 2: "Wanneer heeft hij consultatie?" (When does he have consultation?)
- Rewritten: "Wanneer heeft de orthopedisch arts consultatie bij ZOL?"
The operational details of follow-up handling — the citation-enriched history format passed to the prompt, and the under-6-word fallback heuristic when LLM classification fails — live with the pipeline stages they belong to, in Query Pipeline → Stage 3.
Relationship to the semantic cache
This is the dependency that ties the two concepts together, and the single most important reason rewriting exists in the shape it does:
The semantic query cache (ADR-0031) keys on the rewritten query, never the raw input. Both cache tiers depend on it: Tier 1 hashes rewritten_query + language (so "rugpijn" / "back pain" / "sırt ağrısı" produce the same hash and share one cached answer), and Tier 2 embeds the rewritten query for cosine-similarity matching of near-paraphrases. Keying on raw input was the rejected design: it yields a <5% hit rate because multilingual, colloquial inputs produce distant embeddings that never collide.
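The two key types can be sketched as follows. The hash function (SHA-256), the field separator, and the 0.9 similarity threshold are assumptions chosen for the example; ADR-0031 holds the real choices. This page does not spell out what the language component contains, so the sketch passes the same value for every call, which is what the shared-hash behaviour requires.

```python
import hashlib
import math
from typing import Sequence


def tier1_key(rewritten_query: str, language: str) -> str:
    """Exact-match key: a hash over the language tag plus the rewritten query."""
    # The unit-separator character keeps the two fields from running together.
    payload = f"{language}\x1f{rewritten_query}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tier2_hit(query_vec: Sequence[float], cached_vec: Sequence[float],
              threshold: float = 0.9) -> bool:
    """Near-paraphrase match; the threshold is a placeholder, not the real setting."""
    return cosine_similarity(query_vec, cached_vec) >= threshold


# "rugpijn", "back pain" and "sırt ağrısı" all rewrite to this one string,
# so all three requests compute the same Tier 1 key.
canonical = "Welke afdelingen bij ZOL behandelen Dorsalgie?"
key_nl = tier1_key(canonical, "nl")
key_again = tier1_key(canonical, "nl")
other = tier1_key("Wat is Dorsalgie en welke behandelingen biedt ZOL aan?", "nl")
```

Tier 1 only helps when rewriting is fully deterministic; Tier 2 catches the remaining near-identical rewrites that differ by a word or two.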
One subtlety: rewriting (and therefore intent classification) runs on every request, even cache hits — you must reformulate to compute the lookup key. The cache saves the expensive tail (retrieval + generation + safety, ~3–5s), not the ~200–400ms classification head. A complementary safety coupling: the cache refuses to store refusals, so a transient rewriting/retrieval failure that produces "ik kan geen informatie…" is never cached and re-served.
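The request flow described above can be sketched as a single function. This is an illustration under assumptions: the dict stands in for the cache, the callables are stubs, keying directly on the rewritten string simplifies the two-tier lookup, and detecting a refusal by its opening words is a hypothetical rule rather than the pipeline's actual check.

```python
from typing import Callable, Dict

# Opening words of the refusal quoted on this page; matching on a prefix
# is an assumption made for this sketch.
REFUSAL_PREFIX = "ik kan geen informatie"


def is_refusal(answer: str) -> bool:
    return answer.strip().lower().startswith(REFUSAL_PREFIX)


def handle_request(
    raw_input: str,
    classify_and_rewrite: Callable[[str], str],
    run_tail: Callable[[str], str],
    cache: Dict[str, str],
) -> str:
    # The classification head always runs: the lookup key does not exist without it.
    rewritten_query = classify_and_rewrite(raw_input)
    cached = cache.get(rewritten_query)
    if cached is not None:
        return cached  # cache hit: retrieval, generation and safety are skipped
    answer = run_tail(rewritten_query)  # the expensive tail
    if not is_refusal(answer):
        cache[rewritten_query] = answer  # refusals are never stored
    return answer


# Stubs that count how often each half of the pipeline runs.
calls = {"classify": 0, "tail": 0}


def fake_classify(raw: str) -> str:
    calls["classify"] += 1
    return "Welke afdelingen bij ZOL behandelen Dorsalgie?"


def fake_tail(query: str) -> str:
    calls["tail"] += 1
    return "stub answer"


cache: Dict[str, str] = {}
handle_request("rugpijn", fake_classify, fake_tail, cache)
handle_request("back pain", fake_classify, fake_tail, cache)
print(calls)  # classification ran for both requests, the tail only for the first
```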
Where rewriting sits in the pipeline
Query rewriting is the first of three query transformations, each with its own page:
- Query rewriting (this page) — canonical reformulation + follow-up resolution, inside the intent-classification call.
- Query enrichment — taxonomy/SNOMED synonym expansion appended to the rewritten query.
- Query decomposition — splitting complex multi-hop questions into parallel sub-queries.
The rewritten query (plus the entities extracted in the same call) is what flows into metadata filtering and hybrid retrieval.
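As a rough picture of what leaves the classification call, the sketch below parses a payload with the three fields named in ADR-0030 (intent, rewritten_query, entities). The dataclass, the parser, the shape of entities, and the sample payload are all assumptions made for illustration, not the pipeline's actual types.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ClassificationResult:
    intent: str
    rewritten_query: str
    # The internal shape of `entities` is assumed to be a flat mapping here.
    entities: Dict[str, Any] = field(default_factory=dict)


def parse_classification(raw_json: str) -> ClassificationResult:
    """Parse the single LLM response into the fields downstream stages consume."""
    data = json.loads(raw_json)
    return ClassificationResult(
        intent=data["intent"],
        rewritten_query=data["rewritten_query"],
        entities=data.get("entities", {}),
    )


# Hypothetical payload for the raw input "rugpijn".
sample = json.dumps({
    "intent": "symptom_description",
    "rewritten_query": "Welke afdelingen bij ZOL behandelen Dorsalgie?",
    "entities": {"symptom": "Dorsalgie"},
})
result = parse_classification(sample)
print(result.rewritten_query)
```

Everything after this point (enrichment, metadata filtering, hybrid retrieval) reads from the parsed result rather than from the raw user input.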
See also
- ADR-0030, LLM Entity Extraction: the decision to emit {intent, rewritten_query, entities} from one LLM call.
- ADR-0031, Semantic Query Cache: the cache that keys on the rewritten query.
- Query Pipeline → Stage 3: operational stage detail and follow-up heuristics.
- What is RAG?: where rewriting fits in the end-to-end retrieval-augmented flow.