ADR-0033: BGE-M3 Embedding Migration
This decision record describes the adoption of BGE-M3 in February 2026. The embedding model has since been migrated to OpenAI text-embedding-3-large — see ADR-0048. The body below is preserved verbatim as the historical decision record; do not configure a new system from this page. For the current embedding stack, read the embedding-model decision and the storage architecture page.
Status: Superseded by ADR-0048 (2026-04-30) — original status: Accepted (February 2026)
Supersedes: ADR-0005
Context
The ZOL Intelligent Search system relied on nomic-embed-text (768-dim) for all embedding operations. While functional, this model had three significant limitations:
- Unknown Dutch quality: nomic-embed-text was not benchmarked on MTEB-NL (the Dutch embedding benchmark), making its Dutch retrieval quality unmeasured.
- Semantic cache contamination: The A/B experiment (see the experiment report) revealed that structurally similar Dutch medical queries (e.g., "Welke artsen werken bij Cardiologie?" vs "Welke artsen werken bij Orthopedie?", i.e., "Which doctors work in Cardiology/Orthopedics?") produced dangerously similar embeddings (cosine similarity > 0.97), causing cache false positives.
- Limited multilingual support: English-primary training data resulted in weak cross-lingual embedding similarity for non-Dutch queries, such as those from Turkish- and Arabic-speaking patients.
BGE-M3 (Chen et al., 2024) was identified as the strongest candidate based on:
- MTEB-NL retrieval score of 60.0, giving a measured baseline for Dutch quality
- 1024 dimensions (vs 768), offering richer representations
- Support for 100+ languages, built on the XLM-RoBERTa architecture
- Same 8,192-token context window as nomic-embed-text
- Available on Ollama for zero-cost local inference
An alternative candidate, multilingual-e5-large-instruct (MTEB-NL: 61.4), was rejected because of its 512-token context window. Our medical content chunks average ~350 tokens after contextual enrichment, so the limit leaves little headroom and longer-than-average chunks risk exceeding it and being truncated.
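An average below the limit does not guarantee that every chunk fits. The sketch below uses hypothetical chunk sizes (not measured data) whose mean is exactly 350 tokens and flags the chunks that exceed each model's context window; the helper name chunks_over_limit is illustrative.

```python
def chunks_over_limit(token_counts: list[int], context_limit: int) -> list[int]:
    """Return the indices of chunks longer than the model's context window."""
    return [i for i, n in enumerate(token_counts) if n > context_limit]


# Hypothetical token counts for five enriched chunks; their mean is 350.
chunk_tokens = [180, 220, 300, 400, 650]
assert sum(chunk_tokens) / len(chunk_tokens) == 350

print(chunks_over_limit(chunk_tokens, 512))   # → [4]  (multilingual-e5-large-instruct)
print(chunks_over_limit(chunk_tokens, 8192))  # → []   (BGE-M3)
```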
Decision
Migrate from nomic-embed-text (768-dim) to BGE-M3 (1024-dim) as the embedding model for all operations: document ingestion, query embedding, quality gate evaluation, and semantic cache.
Migration Steps Executed
- Updated config: embedding_model="bge-m3", embedding_dimensions=1024
- Database migration: altered the pgvector column from vector(768) to vector(1024)
- Re-embedded all documents with enriched text (contextual retrieval format)
- Rebuilt semantic cache entries
- Recalibrated similarity thresholds; both existing values were retained (quality gate: 0.40; semantic cache: 0.97)
- Ran the golden evaluation to validate retrieval quality
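Because a vector(1024) column rejects 768-dimensional values, every row needs a fresh BGE-M3 vector before it can be stored. The following is a minimal sketch of that re-embedding loop, assuming a caller-supplied embed_batch function (hypothetical; in this system it would presumably wrap the Ollama-hosted bge-m3 model, which is not shown) and pgvector's '[x,y,...]' text input format. The function names and the batch size of 32 are illustrative choices, not values from the original migration.

```python
from typing import Callable, Iterable, Iterator

EXPECTED_DIM = 1024  # BGE-M3 dense output size; matches the vector(1024) column

# Hypothetical embedder signature: list of texts in, one vector per text out.
EmbedBatch = Callable[[list[str]], list[list[float]]]


def reembed(
    docs: Iterable[tuple[str, str]],
    embed_batch: EmbedBatch,
    batch_size: int = 32,
) -> Iterator[tuple[str, list[float]]]:
    """Yield (doc_id, vector) for each (doc_id, enriched_text) pair, in batches.

    Raises ValueError if the embedder returns the wrong number of vectors or a
    vector whose length is not EXPECTED_DIM, so a misconfigured model (e.g. one
    still producing 768-dim output) is caught before its vectors are yielded.
    """
    pending = list(docs)
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        vectors = embed_batch([text for _, text in batch])
        if len(vectors) != len(batch):
            raise ValueError(f"expected {len(batch)} vectors, got {len(vectors)}")
        for (doc_id, _), vec in zip(batch, vectors):
            if len(vec) != EXPECTED_DIM:
                raise ValueError(f"{doc_id}: got {len(vec)} dims, expected {EXPECTED_DIM}")
            yield doc_id, vec


def to_pgvector_literal(vec: list[float]) -> str:
    """Format a vector in pgvector's text input syntax, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in embedder that returns zero vectors of the right size."""
    return [[0.0] * EXPECTED_DIM for _ in texts]


rows = list(reembed([("doc-1", "enriched text 1"), ("doc-2", "enriched text 2")], fake_embed))
print([doc_id for doc_id, _ in rows])  # → ['doc-1', 'doc-2']
```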
Consequences
Positive
- Measurable Dutch quality: an MTEB-NL score of 60.0 replaces the previously unknown baseline
- Better cache discrimination: Higher-dimensional embeddings produce more distinctive vectors for structurally similar queries
- Improved multilingual support: stronger cross-lingual retrieval than nomic-embed-text
- Future ColBERT option: BGE-M3 supports dense, sparse (lexical), and multi-vector (ColBERT-style) retrieval modes
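For context on the ColBERT option, BGE-M3's multi-vector mode scores a query against a passage by late interaction: each query token embedding is matched to its most similar passage token embedding, and those best matches are aggregated. A minimal pure-Python sketch, assuming the per-token vectors are already L2-normalized (so a dot product equals cosine similarity) and averaging over query tokens as the BGE-M3 paper does (the original ColBERT sums instead). The toy two-dimensional vectors are illustrative only.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def maxsim_score(query_vecs: list[list[float]], passage_vecs: list[list[float]]) -> float:
    """Late-interaction (MaxSim) score: best passage match per query token, averaged."""
    if not query_vecs or not passage_vecs:
        raise ValueError("need at least one query and one passage token vector")
    best_per_token = [max(dot(q, p) for p in passage_vecs) for q in query_vecs]
    return sum(best_per_token) / len(query_vecs)


query = [[1.0, 0.0], [0.0, 1.0]]
passage = [[1.0, 0.0], [0.6, 0.8]]
print(maxsim_score(query, passage))  # → 0.9
```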
Negative
- 33% more storage per vector (1024 vs 768 dimensions)
- Full re-indexing required during migration (temporary downtime)
- Slightly higher embedding latency (~15% increase for local inference)
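The 33% figure follows from the vector sizes. pgvector documents each vector value as taking 4 bytes per dimension plus 8 bytes, so a quick check is possible; the 100,000-chunk corpus below is a hypothetical size, not the real index, and index and row overhead are ignored.

```python
def pgvector_bytes(dimensions: int) -> int:
    """Storage for one pgvector `vector` value: a 4-byte float per dimension + 8 bytes."""
    return 4 * dimensions + 8


old, new = pgvector_bytes(768), pgvector_bytes(1024)
print(old, new)                # → 3080 4104
print(f"{new / old - 1:.1%}")  # → 33.2%

# Extra storage for a hypothetical corpus of 100,000 chunks (vector payload only).
extra_mib = 100_000 * (new - old) / 2**20
print(f"{extra_mib:.1f} MiB")  # → 97.7 MiB
```

The per-value increase is slightly below the raw 1024/768 ratio because the fixed 8-byte overhead is spread over more dimensions.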
References
- Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., & Liu, Z. (2024). BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv preprint arXiv:2402.03216. https://arxiv.org/abs/2402.03216
- Muennighoff, N., et al. (2023). MTEB: Massive text embedding benchmark. Proceedings of EACL 2023, pp. 2014–2037. https://huggingface.co/spaces/mteb/leaderboard