ADR-0033: BGE-M3 Embedding Migration

Superseded by ADR-0048 (2026-04-30)

This decision record describes the adoption of BGE-M3 in February 2026. The embedding model has since been migrated to OpenAI text-embedding-3-large — see ADR-0048. The body below is preserved verbatim as the historical decision record; do not configure a new system from this page. For the current embedding stack, read the embedding-model decision and the storage architecture page.

Status: Superseded by ADR-0048 (2026-04-30) — original status: Accepted (February 2026)

Supersedes: ADR-0005

Context

The ZOL Intelligent Search system relied on nomic-embed-text (768-dim) for all embedding operations. While functional, this model had three significant limitations:

Unknown Dutch quality: nomic-embed-text had not been benchmarked on MTEB-NL (the Dutch embedding benchmark), so its Dutch retrieval quality was unmeasured.
Semantic cache contamination: The A/B experiment (report) revealed that structurally similar Dutch medical queries (e.g., "Welke artsen werken bij Cardiologie?" vs "Welke artsen werken bij Orthopedie?") produced dangerously similar embeddings (cosine >0.97), causing cache false positives.
Limited multilingual support: English-primary training data resulted in weak cross-lingual embedding similarity for non-Dutch queries (Turkish, Arabic patient demographics).
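The cache contamination issue can be illustrated with a minimal sketch. The vectors below are toy values invented for illustration, not real embeddings, and the `is_cache_hit` helper and its `>=` comparison are assumptions; only the 0.97 figure comes from this record.

```python
import math

CACHE_THRESHOLD = 0.97  # cache similarity threshold cited in this record


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_cache_hit(query_vec, cached_vec, threshold=CACHE_THRESHOLD):
    """Hypothetical cache lookup rule: reuse the cached answer at or above the threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold


# Toy stand-ins for two structurally similar queries that differ in one
# small component (for example, the department name).
cardiology_query = [0.90, 0.10, 0.40, 0.05]
orthopedics_query = [0.90, 0.10, 0.40, 0.12]

# A hit here would return the cached cardiology answer for the orthopedics question.
false_positive = is_cache_hit(orthopedics_query, cardiology_query)
```

With vectors this close, the check reports a hit even though the two questions need different answers, which is the failure mode described above.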

BGE-M3 (Chen et al., 2024) was identified as the strongest candidate based on:

MTEB-NL benchmark score of 60.0 (retrieval), providing measured Dutch quality
1024 dimensions (vs 768), offering richer representations
100+ language support with XLM-RoBERTa architecture
Same 8,192-token context window as nomic-embed-text
Available on Ollama for zero-cost local inference

An alternative candidate, multilingual-e5-large-instruct (MTEB-NL: 61.4), was rejected despite its higher benchmark score because of its 512-token context window, which was judged insufficient for our medical content chunks (averaging ~350 tokens with contextual enrichment).

Decision

Migrate from nomic-embed-text (768-dim) to BGE-M3 (1024-dim) as the embedding model for all operations: document ingestion, query embedding, quality gate evaluation, and semantic cache.

Migration Steps Executed

Updated config: embedding_model="bge-m3", embedding_dimensions=1024
Database migration: Altered pgvector column from vector(768) to vector(1024)
Re-embedded all documents with enriched text (contextual retrieval format)
Rebuilt semantic cache entries
Recalibrated similarity thresholds (quality gate: 0.40 maintained; cache: 0.97 maintained)
Ran golden evaluation to validate retrieval quality
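The first two steps can be sketched as follows. This is an illustrative reconstruction, not the migration script that was actually run: the `EmbeddingConfig` class, the `build_migration_sql` and `check_embedding` helpers, and the table and column names are all hypothetical. The sketch also assumes the old 768-dim values are cleared before the column type changes, since they cannot be kept in a vector(1024) column, and that the column is nullable.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    # Values taken from the config change listed in the steps above.
    embedding_model: str = "bge-m3"
    embedding_dimensions: int = 1024


_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_migration_sql(table, column, config):
    """Return SQL statements that resize a pgvector column (hypothetical helper)."""
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    dim = config.embedding_dimensions
    return [
        # Assumption: old embeddings are discarded and regenerated afterwards.
        f"UPDATE {table} SET {column} = NULL;",
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE vector({dim});",
    ]


def check_embedding(vector, config):
    """Reject vectors whose length does not match the configured dimensions."""
    if len(vector) != config.embedding_dimensions:
        raise ValueError(
            f"expected {config.embedding_dimensions} dimensions, got {len(vector)}"
        )
    return vector


config = EmbeddingConfig()
# "documents" and "embedding" are placeholder names, not the real schema.
statements = build_migration_sql("documents", "embedding", config)
```

Validating the vector length before each write is a cheap guard during re-embedding: a stale 768-dim vector produced by the old model would otherwise only fail at insert time.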

Consequences

Positive

Measurable Dutch quality: MTEB-NL score of 60.0 replaces "unknown" baseline
Better cache discrimination: Higher-dimensional embeddings produce more distinctive vectors for structurally similar queries
Improved multilingual support: Superior cross-lingual retrieval quality
Future ColBERT option: BGE-M3 supports dense + sparse + ColBERT retrieval modes

Negative

33% more storage per vector (1024 vs 768 dimensions)
Full re-indexing required during migration (temporary downtime)
Slightly higher embedding latency (~15% increase for local inference)
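The storage figure can be checked with a short calculation. This sketch assumes the single-precision pgvector `vector` type, which stores 4 bytes per dimension plus 8 bytes of per-vector overhead; index size and row overhead are ignored, and the function name is illustrative.

```python
def vector_bytes(dimensions: int) -> int:
    """Approximate on-disk size of one pgvector `vector` value, in bytes."""
    return 4 * dimensions + 8


old_size = vector_bytes(768)   # nomic-embed-text
new_size = vector_bytes(1024)  # BGE-M3

# Relative growth per stored vector, as a fraction of the old size.
relative_increase = (new_size - old_size) / old_size
percent_increase = round(relative_increase * 100)
```

The fixed 8-byte overhead barely changes the result, so the increase stays at roughly one third, in line with the dimension ratio of 1024 to 768.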

References

Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., & Liu, Z. (2024). BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv preprint, arXiv:2402.03216. https://arxiv.org/abs/2402.03216
Muennighoff, N., et al. (2023). MTEB: Massive text embedding benchmark. Proceedings of EACL 2023, 2014–2037. https://huggingface.co/spaces/mteb/leaderboard