Build vs. Buy: Why a Custom RAG System

A one-page decision brief for two readers: the business sponsor asking "couldn't we have just bought this?" and the technical evaluator asking "where, specifically, do the managed products fall short?" The executive verdict comes first; the capability matrix and the structural argument follow.

Executive verdict

Off-the-shelf enterprise-search and healthcare-chatbot products solve a different problem than ZOL's: they optimise for breadth (any corpus, any tenant, English-first) while ZOL needs depth in one narrow, safety-critical domain — Dutch-language hospital navigation where a wrong answer is a patient-safety and regulatory event, not a bad search result. The three requirements that forced a custom build — strict citation-grounded refusal of medical advice, Dutch medical terminology resolution via SNOMED CT, and EU AI Act / GDPR auditability designed in rather than bolted on — are precisely the ones no managed product offers as a configurable feature. The custom system is not more code for its own sake: it is the minimum surface needed to make those three properties structural instead of hoped-for, at a measured 99.0 % pass rate on a 302-question regulated-domain benchmark (thesis Ch. 4, Table 4.1).

Scope of this brief

"Off-the-shelf" here means the realistic buy alternatives a hospital integrator would shortlist: managed enterprise-search (Google Vertex AI Search, AWS Kendra), healthcare chatbot platforms (Azure Health Bot), managed RAG-on-FHIR templates (AWS HealthLake + Bedrock), and LLM-wrapper chatbots. It does not mean "use no third-party components" — ZOL itself buys embeddings (OpenAI), STT (Deepgram), and TTS (ElevenLabs). The build-vs-buy line is drawn at the cognition and safety layer, not the commodity infrastructure beneath it.

The decision matrix

Eight capabilities that the ZOL use case requires, scored against the buy options. Cells follow the project's cite-or-blank discipline: a claim is backed by a source or marked not documented. ✅ = first-class / configurable; ⚠️ = partial / requires custom work on top; ❌ = not offered.

#	Capability ZOL requires	Vertex AI Search	Azure Health Bot	AWS HealthLake + Bedrock	LLM-wrapper chatbot	ZOL custom
1	No-medical-advice refusal (multi-layer safety; zero-advice target)	❌ general relevance, no medical-advice gate	⚠️ Healthcare Safeguards, English-tuned	❌ retrieval only	❌	✅ triple-defense, 7-layer safety
2	Citation-grounded answers (every claim traces to a source)	⚠️ snippets, not enforced grounding	⚠️	⚠️ Bedrock KB citations, generic	❌	✅ citation pipeline, grounding-or-refuse
3	Dutch medical terminology (patient colloquial → clinical)	⚠️ general multilingual, no medical lexicon	⚠️	❌	❌	✅ SNOMED CT NL-BE
4	Structured hospital knowledge (doctor↔dept↔campus↔condition)	❌ flat documents	❌	⚠️ FHIR, not website taxonomy	❌	✅ taxonomy graph
5	Intent×category contamination control	❌	❌	❌	❌	✅ Value Framework
6	EU AI Act + GDPR auditability (Art. 50 / Art. 35 DPIA)	⚠️ platform certs, not app-level	⚠️	⚠️	❌	✅ DPIA + AI Act on file
7	Per-tenant onboarding, zero code	⚠️ per-project config	⚠️	❌ integration project	❌	✅ multi-tenant overlays
8	Operator-curated quality gate (draft→approve→publish)	❌	❌	❌	❌	✅ draft/publish

Reading the matrix: the buy options are strong on capabilities that are commodity (multilingual snippets, platform certifications, horizontal scale) and weak-to-absent on the capabilities that are load-bearing for a hospital. Rows 5 and 8 are ❌ across every buy option; rows 1 and 3 reach at most ⚠️ (row 1 only through Azure Health Bot's English-tuned safeguards). No single product clears more than a partial score on these four rows, which matter most for patient safety and Dutch-language quality.
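The reading above can be checked mechanically. The following is a minimal sketch that transcribes only the four buy-option columns of the matrix into numbers and tallies them; the names `BUY_OPTIONS`, `MATRIX`, `best_buy_score`, and `absent_everywhere` are illustrative and not part of the ZOL codebase.

```python
# Buy-option columns of the decision matrix, in table order.
BUY_OPTIONS = [
    "Vertex AI Search",
    "Azure Health Bot",
    "AWS HealthLake + Bedrock",
    "LLM-wrapper chatbot",
]

# Scores transcribed from the matrix cells: 0 = not offered, 1 = partial, 2 = first-class.
# Keys are the row numbers in the "#" column.
MATRIX = {
    1: [0, 1, 0, 0],  # no-medical-advice refusal
    2: [1, 1, 1, 0],  # citation-grounded answers
    3: [1, 1, 0, 0],  # Dutch medical terminology
    4: [0, 0, 1, 0],  # structured hospital knowledge
    5: [0, 0, 0, 0],  # intent x category contamination control
    6: [1, 1, 1, 0],  # EU AI Act + GDPR auditability
    7: [1, 1, 0, 0],  # per-tenant onboarding, zero code
    8: [0, 0, 0, 0],  # operator-curated quality gate
}


def best_buy_score(row: int) -> int:
    """Highest score any buy option reaches on the given row."""
    return max(MATRIX[row])


def absent_everywhere() -> list[int]:
    """Rows that no buy option offers, even partially."""
    return [row for row, cells in MATRIX.items() if max(cells) == 0]


if __name__ == "__main__":
    for row in MATRIX:
        print(f"row {row}: best buy score = {best_buy_score(row)}")
    print("rows absent from every buy option:", absent_everywhere())
```

Keeping the tally next to the table makes it easy to re-run whenever a cell is re-scored against new vendor documentation.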

Five reasons off-the-shelf structurally cannot fit

These are not gaps a vendor will close in the next release — they follow from the products' design centre.

Safety is a pipeline stage, not a relevance knob. Enterprise search ranks documents; it has no concept of "this query is asking for medical advice — refuse and disclaim." ZOL's triple-defense (regex pre-filter → LLM classifier → safety post-filter) is an architecture, not a setting. A buy product would need this bolted on outside it — at which point you are building the hard part anyway.
Grounding must be able to say "I don't know." Managed RAG returns the best-matching passage even when nothing matches well. A hospital system must refuse when the corpus doesn't ground the answer, because a confident wrong answer is the failure mode that the zero-medical-advice-incidents target forbids. Grounding-or-refuse is a control-flow property of the generation step, not a re-ranking parameter.
Dutch clinical terminology is not "multilingual support." Vertex/Azure multilingual means the UI and embeddings work in many languages. It does not mean "suikerziekte" resolves to Diabetes Mellitus and routes to Endocrinologie. That requires SNOMED CT NL-BE anchoring (touchpoints a/b/c) — a medical-ontology layer no general search product ships.
The durable asset is curated data, not the model. The competitive advantage is the operator-reviewed, SNOMED-mapped taxonomy — hundreds of validated relationships that improve with each hospital. A buy product gives you someone else's generic model and none of this data; you would still have to build the extraction → dedup → review → publish lifecycle that produces it.
EU AI Act compliance is structural, not a checkbox. Healthcare AI is high-scrutiny under Regulation 2024/1689; the system needs per-decision audit trails, human-in-the-loop approval of navigational relationships, and source explainability by construction. Platform-level certifications (a vendor's SOC 2, its data-residency promise) do not satisfy application-level obligations — see AI Act compliance and the DPIA. Retrofitting auditability onto a closed managed product is the expensive path the custom design avoids.
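Reasons 1 and 2 describe control flow rather than ranking, which a short sketch can make concrete. The code below is a minimal illustration under stated assumptions, not the ZOL implementation: the regex patterns, the refusal messages, the `min_score` threshold, and every function and class name are hypothetical, and the LLM classifier, retriever, and generator are injected as plain callables so the sketch stays model-agnostic.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical pre-filter pattern; a production list would be far broader.
ADVICE_QUERY = re.compile(r"\b(dosis|dosering|diagnose)\b", re.IGNORECASE)
# Hypothetical post-filter pattern: flag drafts that state a dosage.
ADVICE_OUTPUT = re.compile(r"\b\d+\s?mg\b", re.IGNORECASE)

# Assumed wording; the real disclaimers are not given in this brief.
REFUSE_ADVICE = "I cannot give medical advice. Please contact your doctor."
REFUSE_UNGROUNDED = "I could not find this in the hospital's published information."


@dataclass
class Passage:
    text: str
    source: str
    score: float  # retrieval similarity, assumed to lie in [0, 1]


@dataclass
class Answer:
    text: str
    refused: bool
    sources: list[str] = field(default_factory=list)


def answer(
    query: str,
    classify_advice: Callable[[str], bool],
    retrieve: Callable[[str], list[Passage]],
    generate: Callable[[str, list[Passage]], str],
    min_score: float = 0.75,  # assumed threshold, to be tuned on a benchmark
) -> Answer:
    # Stage 1: cheap regex pre-filter on the incoming query.
    if ADVICE_QUERY.search(query):
        return Answer(REFUSE_ADVICE, refused=True)

    # Stage 2: LLM classifier decides whether the query asks for medical advice.
    if classify_advice(query):
        return Answer(REFUSE_ADVICE, refused=True)

    # Grounding-or-refuse: drop weak matches, and refuse if nothing is left
    # instead of answering from the best-but-poor passage.
    grounded = [p for p in retrieve(query) if p.score >= min_score]
    if not grounded:
        return Answer(REFUSE_UNGROUNDED, refused=True)

    draft = generate(query, grounded)

    # Stage 3: safety post-filter on the generated text.
    if ADVICE_OUTPUT.search(draft):
        return Answer(REFUSE_ADVICE, refused=True)

    return Answer(draft, refused=False, sources=[p.source for p in grounded])
```

Each refusal is an early return, so no tuning of a ranker can route around it; that is the sense in which refusal is a pipeline stage. A successful answer always carries the sources of the passages that grounded it.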

Cost & risk

The build-vs-buy gap is not primarily about licence cost — at pilot scale the running cost is small either way (the cognition stack runs at roughly $8.70/month in LLM+embedding spend per the competitive analysis; self-hosting the SIP gateway saves an estimated $375–625/month vs. managed LiveKit Cloud, per the architecture one-pager). The decisive axis is risk and fit:

Dimension	Buy (managed/off-the-shelf)	Build (ZOL custom)
Patient-safety risk	High — no native medical-advice refusal; confident-wrong answers possible	Controlled — refusal + disclaimer are pipeline stages
Regulatory fit (EU AI Act / GDPR)	App-level obligations unmet by platform certs; retrofit cost later	DPIA + AI Act classification on file; audit trail by design
Dutch medical quality	Generic multilingual; terminology gap unaddressed	SNOMED-anchored; measured 99.0 % on 302-Q benchmark
Vendor lock-in	High — proprietary index, model, and data format	Low — Postgres + pgvector + open formats; portable
Time-to-second-hospital	Integration project each time	Overlay onboarding, zero source change
Differentiation	Same product every competitor can buy	The taxonomy moat compounds per hospital

The honest counter-argument: a buy product is faster to a first demo and removes the operational burden of running a stack. That is real — but it trades away exactly the four properties (rows 1, 2, 3, 6 of the matrix) that make the system safe and compliant for a hospital. For a breadth problem (search any intranet), buy. For this depth-and-safety problem, the custom build is the lower-risk choice, not the indulgent one.

Where to read more

Competitive Advantage & Business Case — the taxonomy moat and ROI pitch in full.
Competitive Analysis — vendor-by-vendor comparison incl. the Belgian-market survey.
SOTA Positioning Matrix — 18 vendors × 8 axes, cite-or-blank.
Core Concepts overview — the retrieval-steering subsystems that constitute the "custom" part.
Safety overview · AI Act compliance · DPIA — the structural-compliance argument behind reason 5.
