ADR-0005: Embedding Model Selection

Superseded — twice

This ADR has been superseded twice. The current embedding model is OpenAI text-embedding-3-large (1536 dimensions, hosted) — see ADR-0048: OpenAI Embeddings Migration (2026-04-30).

The intermediate decision was ADR-0033: BGE-M3 Embedding Migration (2026-02-18, also now superseded), which moved the system from nomic-embed-text (this ADR) to BGE-M3 (1024 dim, on-prem Ollama). Voice-channel latency and Ollama's CPU serialization tax then drove the further move to the hosted OpenAI model in ADR-0048.

The body of this record is preserved verbatim as the original 2025 decision; do not configure a new system from this page. For the current embedding stack, read ADR-0048 and the storage architecture page.

Context

The RAG system requires an embedding model to convert text into dense vectors for semantic search (Reimers & Gurevych, 2019). The model must satisfy three requirements:

  1. Dutch language quality: The ZOL content and user queries are in Dutch (Flemish). The model must produce high-quality embeddings for Dutch medical terminology.
  2. Operational cost: With ~50,000 document chunks to embed and ~25,000 monthly queries, embedding costs accumulate rapidly.
  3. Data privacy: Hospital content should ideally not leave the infrastructure for processing.

Three models were evaluated over the course of development.

Evaluation Journey

Model 1: OpenAI text-embedding-3-small

The initial implementation used OpenAI's hosted embedding API:

| Aspect | Assessment |
| --- | --- |
| Dimensions | 1,536 |
| Dutch quality | Good |
| Cost | $0.02 per million tokens |
| Privacy | All content sent to OpenAI API |
| Latency | ~200ms per batch (network dependent) |

The model was functionally adequate, but two concerns emerged: the ongoing API cost (projected at ~$15/month for ZOL's volume) and the privacy implications of sending hospital content to an external API.

Model 2: mxbai-embed-large (Eliminated)

To address API dependency, the team evaluated mxbai-embed-large, a high-quality open-source model:

| Aspect | Assessment |
| --- | --- |
| Dimensions | 1,024 |
| Dutch quality | Poor |
| Cost | Zero (local) |
| Privacy | Excellent (local) |

Testing with Dutch medical content revealed a critical failure: the model produced embeddings with poor semantic discrimination for Dutch text. Queries like "knieoperatie voorbereiding" (knee surgery preparation) and "hartfalen behandeling" (heart failure treatment) produced unacceptably similar embeddings, leading to irrelevant retrieval results.
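A minimal sketch of how such a discrimination check could be expressed, assuming cosine similarity as the comparison metric. The three-dimensional vectors and the 0.9 threshold are hypothetical stand-ins chosen for illustration; they are not the embeddings or the cut-off used in the original evaluation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def poorly_separated(emb_a, emb_b, threshold=0.9):
    """True when two unrelated queries embed too close together.

    The threshold is a hypothetical value, not one taken from the ADR.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy vectors standing in for real model output (illustrative only).
knee_query = [0.90, 0.10, 0.40]   # "knieoperatie voorbereiding"
heart_query = [0.88, 0.12, 0.42]  # "hartfalen behandeling"

print(poorly_separated(knee_query, heart_query))
```

With real embeddings, a model that discriminates well should place clinically unrelated queries such as these clearly below whatever threshold the team settles on.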

Model 3: nomic-embed-text (Selected)

| Aspect | Assessment |
| --- | --- |
| Dimensions | 768 |
| Dutch quality | Strong (~100 languages) |
| Cost | Zero (local via Ollama) |
| Privacy | Excellent (local) |
| Context window | 8,192 tokens |
| Inference speed | ~50ms per batch (local GPU) |

Decision

Adopt nomic-embed-text as the embedding model, running locally via Ollama.
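For historical illustration only (this record is superseded), the sketch below shows roughly what requesting an embedding from a local Ollama instance looks like. It assumes Ollama's default port 11434 and its `/api/embeddings` endpoint, which takes `model` and `prompt` fields and returns an `embedding` array; check the installed Ollama version before relying on this shape. The helper names `build_embed_request` and `embed` are hypothetical and do not come from the ZOL codebase.

```python
import json
import urllib.request

# Assumed default local endpoint; adjust to the actual deployment.
OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "nomic-embed-text"

def build_embed_request(text, url=OLLAMA_URL, model=MODEL):
    """Build the HTTP request for one embedding without sending it."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(text, timeout=30.0):
    """Send the request to the local Ollama server and return the vector."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]
```

Calling `embed("knieoperatie voorbereiding")` against a running instance with the model pulled should return a list of 768 floats, and no request leaves the local machine.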

Consequences

Positive

  • Zero API cost: Projected savings of ~$180/year vs. OpenAI
  • Data privacy: No hospital content leaves the infrastructure
  • Strong Dutch support: ~100 languages including Dutch, validated empirically
  • 50% storage reduction: 768 vs. 1,536 dimensions
  • Faster computation: 768-dim cosine similarity is faster than 1,536-dim
  • No network dependency: Local inference is unaffected by API outages
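The storage and cost figures in the list above are simple arithmetic, sketched below. The ~50,000 chunk count, the two dimension sizes, and the ~$15/month projection come from this record; storing vectors as 4-byte float32 values is an assumption, and index overhead is ignored.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32

def index_size_mb(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw vector payload in megabytes (10^6 bytes), excluding index overhead."""
    return num_vectors * dims * bytes_per_value / 1_000_000

chunks = 50_000                          # ~50,000 document chunks (from Context)
openai_mb = index_size_mb(chunks, 1536)  # text-embedding-3-small
nomic_mb = index_size_mb(chunks, 768)    # nomic-embed-text
reduction = 1 - nomic_mb / openai_mb     # fraction of storage saved

annual_savings_usd = 15 * 12             # ~$15/month projected API cost

print(f"{openai_mb:.1f} MB vs {nomic_mb:.1f} MB, reduction {reduction:.0%}")
print(f"Projected savings: ~${annual_savings_usd}/year")
```

The same halving applies to the number of multiply-add operations in each cosine similarity, which is the basis for the faster-computation point.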

Negative

  • Fewer dimensions: Some theoretical loss of embedding expressiveness (not observed empirically for the ZOL domain)
  • GPU requirement: Local inference requires a GPU-capable machine
  • Model updates: Must manually update the Ollama model (vs. automatic API updates)
  • Single point of failure: If Ollama crashes, embedding generation halts (mitigated by Docker auto-restart)

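Dimension Comparison

(Figure not preserved in this copy. Dimensions of the evaluated models, from the assessments above: text-embedding-3-small 1,536; mxbai-embed-large 1,024; nomic-embed-text 768.)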

Validation

The model was validated against a test set of 200 Dutch medical queries, comparing retrieval quality (Precision@5, Recall@10, and mean reciprocal rank, MRR) between OpenAI text-embedding-3-small and nomic-embed-text:

| Metric | OpenAI | nomic-embed-text | Delta |
| --- | --- | --- | --- |
| Precision@5 | 0.82 | 0.79 | -3.7% |
| Recall@10 | 0.91 | 0.88 | -3.3% |
| MRR | 0.85 | 0.83 | -2.4% |

The relative quality reduction of roughly 2-4% across the three metrics was deemed acceptable given the significant cost, privacy, and operational advantages. The system compensates for it through metadata boosting and hybrid search (combining vector results with knowledge graph results).
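As a sketch of how the reported metrics and deltas can be computed, the functions below use the standard definitions of Precision@k, Recall@k, and reciprocal rank, with the delta expressed relative to the OpenAI baseline. The evaluation harness used for this ADR is not shown on this page, so the function names and the sample document IDs are hypothetical.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def relative_delta_pct(baseline, candidate):
    """Change of candidate versus baseline, as a percentage of the baseline."""
    return (candidate - baseline) / baseline * 100

# Hypothetical single-query example (document IDs are made up).
retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d7", "d4", "d8"}
print(precision_at_k(retrieved, relevant, 5))  # 2 of 5 retrieved are relevant
print(reciprocal_rank(retrieved, relevant))    # first hit is at rank 2

# Reproducing the Delta column from the reported scores.
for name, base, cand in [("Precision@5", 0.82, 0.79),
                         ("Recall@10", 0.91, 0.88),
                         ("MRR", 0.85, 0.83)]:
    print(name, round(relative_delta_pct(base, cand), 1))
```

MRR is the mean of `reciprocal_rank` over all queries in the test set; the per-query metrics are averaged the same way.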

Successor: BGE-M3 (ADR-0033)

In February 2026, nomic-embed-text was replaced by bge-m3 (1024 dimensions) after evaluation revealed that nomic-embed-text had no measured Dutch benchmark score (MTEB-NL). BGE-M3 provides:

  • MTEB-NL score of 60.0 (measured Dutch quality vs. unknown)
  • 1024 dimensions (richer semantic representations)
  • 100+ languages with XLM-RoBERTa architecture
  • Same local Ollama deployment model

See ADR-0033: BGE-M3 Embedding Migration for the full decision and migration details.