ADR-0005: Embedding Model Selection

Superseded — twice

This ADR has been superseded twice. The current embedding model is OpenAI text-embedding-3-large (1536 dimensions, hosted) — see ADR-0048: OpenAI Embeddings Migration (2026-04-30).

The intermediate decision was ADR-0033: BGE-M3 Embedding Migration (2026-02-18, also now superseded), which moved the system from nomic-embed-text (this ADR) to BGE-M3 (1024 dim, on-prem Ollama). Voice-channel latency and Ollama's CPU serialization tax then drove the further move to the hosted OpenAI model in ADR-0048.

The body of this record is preserved verbatim as the original 2025 decision; do not configure a new system from this page. For the current embedding stack, read ADR-0048 and the storage architecture page.

Context

The RAG system requires an embedding model to convert text into dense vectors for semantic search (Reimers & Gurevych, 2019). The model must satisfy three requirements:

Dutch language quality: The ZOL content and user queries are in Dutch (Flemish). The model must produce high-quality embeddings for Dutch medical terminology.
Operational cost: With ~50,000 document chunks to embed and ~25,000 monthly queries, embedding costs accumulate rapidly.
Data privacy: Hospital content should ideally not leave the infrastructure for processing.

Three models were evaluated over the course of development.

Evaluation Journey

Model 1: OpenAI text-embedding-3-small

The initial implementation used OpenAI's hosted embedding API:

Aspect	Assessment
Dimensions	1,536
Dutch quality	Good
Cost	$0.02 per million tokens
Privacy	All content sent to OpenAI API
Latency	~200ms per batch (network dependent)

While functionally adequate, two concerns emerged: the ongoing API cost (projected at ~$15/month for ZOL's volume) and the data privacy implication of sending hospital content to an external API.

Model 2: mxbai-embed-large (Eliminated)

To address API dependency, the team evaluated mxbai-embed-large, a high-quality open-source model:

Aspect	Assessment
Dimensions	1,024
Dutch quality	Poor
Cost	Zero (local)
Privacy	Excellent (local)

Testing with Dutch medical content revealed a critical failure: the model produced embeddings with poor semantic discrimination for Dutch text. Queries like "knieoperatie voorbereiding" (knee surgery preparation) and "hartfalen behandeling" (heart failure treatment) produced unacceptably similar embeddings, leading to irrelevant retrieval results.
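The discrimination failure described above can be checked with a plain cosine-similarity comparison between queries that should be unrelated. The following is a minimal sketch, not the team's actual test harness: the `embed` callable, the 0.9 threshold, and the toy 3-dimensional vectors are all hypothetical stand-ins chosen only to make the example runnable.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def poorly_discriminated(pairs, embed, threshold=0.9):
    """Return unrelated query pairs whose similarity exceeds the threshold.

    `embed` is any callable mapping a query string to a vector.
    The default threshold is an illustrative assumption, not a measured value.
    """
    flagged = []
    for q1, q2 in pairs:
        sim = cosine_similarity(embed(q1), embed(q2))
        if sim > threshold:
            flagged.append((q1, q2, sim))
    return flagged


# Toy vectors standing in for real model output (hypothetical values).
TOY_EMBEDDINGS = {
    "knieoperatie voorbereiding": [0.9, 0.1, 0.4],
    "hartfalen behandeling": [0.8, 0.2, 0.5],
}

unrelated_pairs = [("knieoperatie voorbereiding", "hartfalen behandeling")]
flagged = poorly_discriminated(unrelated_pairs, TOY_EMBEDDINGS.__getitem__)
for q1, q2, sim in flagged:
    print(f"{q1!r} vs {q2!r}: similarity {sim:.3f} is too high")
```

In a real check, `embed` would call the model under evaluation, and the threshold would be calibrated against pairs known to be related.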

Model 3: nomic-embed-text (Selected)

Aspect	Assessment
Dimensions	768
Dutch quality	Strong (~100 languages)
Cost	Zero (local via Ollama)
Privacy	Excellent (local)
Context window	8,192 tokens
Inference speed	~50ms per batch (local GPU)

Decision

Adopt nomic-embed-text as the embedding model, running locally via Ollama.

Consequences

Positive

Zero API cost: Projected savings of ~$180/year vs. OpenAI
Data privacy: No hospital content leaves the infrastructure
Strong Dutch support: ~100 languages including Dutch, validated empirically
50% storage reduction: 768 vs. 1,536 dimensions
Faster computation: 768-dim cosine similarity is faster than 1,536-dim
No network dependency: Local inference is unaffected by API outages
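The storage figure in the list above follows directly from the vector dimensions. As a rough sketch, assuming vectors are stored as 4-byte float32 values and ignoring index and metadata overhead (both assumptions, since this record does not specify the storage format), the raw vector footprint for ~50,000 chunks works out as follows:

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def vector_storage_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of the stored vectors, excluding index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


NUM_CHUNKS = 50_000  # approximate chunk count from the Context section

nomic_bytes = vector_storage_bytes(NUM_CHUNKS, 768)
openai_bytes = vector_storage_bytes(NUM_CHUNKS, 1536)
reduction = 1 - nomic_bytes / openai_bytes

print(f"nomic-embed-text (768 dim):          {nomic_bytes / 1e6:.1f} MB")   # 153.6 MB
print(f"text-embedding-3-small (1,536 dim):  {openai_bytes / 1e6:.1f} MB")  # 307.2 MB
print(f"Reduction: {reduction:.0%}")                                        # 50%
```

The ratio is independent of the chunk count and the value width, so the 50% figure holds for any storage format that scales linearly with dimension.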

Negative

Fewer dimensions: Some theoretical loss of embedding expressiveness (not observed empirically for the ZOL domain)
GPU requirement: Local inference requires a GPU-capable machine
Model updates: Must manually update the Ollama model (vs. automatic API updates)
Single point of failure: If Ollama crashes, embedding generation halts (mitigated by Docker auto-restart)

Dimension Comparison

Model	Dimensions
OpenAI text-embedding-3-small	1,536
mxbai-embed-large	1,024
nomic-embed-text	768

Validation

The model was validated against a test set of 200 Dutch medical queries, comparing retrieval quality (Precision@5, Recall@10, and mean reciprocal rank, MRR) between OpenAI text-embedding-3-small and nomic-embed-text:

Metric	OpenAI	nomic-embed-text	Delta
Precision@5	0.82	0.79	-3.7%
Recall@10	0.91	0.88	-3.3%
MRR	0.85	0.83	-2.4%

The small quality reduction (roughly 2 to 4% relative, depending on the metric) was deemed acceptable given the significant cost, privacy, and operational advantages. The system compensates for it through metadata boosting and hybrid search (combining vector results with knowledge graph results).
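For reference, the metrics in the table can be computed per query and averaged over the test set. The sketch below uses the standard definitions of Precision@k, Recall@k, and reciprocal rank; the document IDs are hypothetical, and this is not the evaluation script used for the ADR. The `relative_delta` helper reproduces the Delta column from the reported scores.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


def relative_delta(baseline, candidate):
    """Relative change from baseline to candidate, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Hypothetical single-query example: ranked results and the relevant set.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(precision_at_k(retrieved, relevant, 5))  # 0.4
print(reciprocal_rank(retrieved, relevant))    # 0.5

# Reproducing the Delta column from the scores reported in the table.
print(round(relative_delta(0.82, 0.79), 1))  # -3.7 (Precision@5)
print(round(relative_delta(0.91, 0.88), 1))  # -3.3 (Recall@10)
print(round(relative_delta(0.85, 0.83), 1))  # -2.4 (MRR)
```

Note that the Delta column is a relative change; in absolute terms the differences are 0.02 to 0.03 points on each metric.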

Successor: BGE-M3 (ADR-0033)

In February 2026, nomic-embed-text was replaced by bge-m3 (1024 dimensions) after evaluation revealed that nomic-embed-text had no measured Dutch benchmark score (MTEB-NL). BGE-M3 provides:

MTEB-NL score of 60.0 (measured Dutch quality vs. unknown)
1024 dimensions (richer semantic representations)
100+ languages with XLM-RoBERTa architecture
Same local Ollama deployment model

See ADR-0033: BGE-M3 Embedding Migration for the full decision and migration details.
