Deployment Overview
The ZOL RAG system deploys as two units on a single Linux server: infrastructure (6 containerized services) and application (one image rebuilt per release). Embedding inference is performed against the OpenAI hosted API (text-embedding-3-large, 1536 dimensions) — there is no on-premise embedding container in production.
Checklist
- Read this page to understand the architecture
- Verify you have the prerequisites
- Follow the guides in order: Server Setup → Infrastructure → Application → Data Seeding
Architecture
             Internet
                 │
                 ▼
┌──────────────────────────────────┐
│ nginx :80/:443                   │ ← Static files (React frontend)
│ uvicorn :8000 (x4 workers)       │ ← FastAPI backend + reranker model
│ supervisord                      │ ← Process manager
└──────────┬───────────────────────┘
           │ zol-network (Docker bridge)
  ┌────────┼────────────────────────────────────┐
  │        ▼                                    │
  │ ┌──────────┐  ┌───────┐  ┌────────────────┐ │
  │ │PostgreSQL│  │ Redis │  │    Keycloak    │ │
  │ │  :5432   │  │ :6379 │  │     :8080      │ │
  │ │ pgvector │  │ cache │  │    OIDC IdP    │ │
  │ └──────────┘  └───────┘  └────────────────┘ │
  │ ┌──────────┐  ┌────────────┐  ┌─────────┐   │
  │ │  MinIO   │  │ Prometheus │  │ Grafana │   │
  │ │  :9000   │  │   :9090    │  │  :3000  │   │
  │ │ S3 docs  │  └────────────┘  └─────────┘   │
  │ └──────────┘                                │
  └─────────────────────────────────────────────┘
         │                          │
         ▼                          ▼
OpenAI / OpenRouter API    OpenAI Embeddings API
   (LLM generation)        text-embedding-3-large
                            (1536 dim, ADR-0048)
Only ports 80 and 443 are exposed to the internet. All infrastructure ports (including Keycloak at 8080 and Grafana at 3000) are bound to 127.0.0.1 and accessible only via SSH tunnel.
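Because Grafana and Keycloak listen only on 127.0.0.1, reaching them from a workstation means forwarding their ports over SSH. The sketch below prints one possible `ssh -L` invocation. The `deploy` user and the `tunnel_cmd` helper are hypothetical, and `search.zol.be` is only the example domain used on this page.

```shell
#!/bin/sh
# Hypothetical helper: prints an ssh command that forwards loopback-bound
# infrastructure ports to the same port numbers on the local machine.
# Usage: tunnel_cmd USER@HOST PORT [PORT...]
tunnel_cmd() {
  target="$1"
  shift
  forwards=""
  for port in "$@"; do
    # -L local_port:127.0.0.1:remote_port targets the server's loopback interface
    forwards="$forwards -L ${port}:127.0.0.1:${port}"
  done
  # -N opens the tunnel without running a remote command
  echo "ssh -N${forwards} ${target}"
}

# Grafana (3000) and Keycloak (8080), as listed in the diagram above.
tunnel_cmd deploy@search.zol.be 3000 8080
```

Running the printed command keeps the tunnel open; Grafana would then be reachable at http://localhost:3000 and Keycloak at http://localhost:8080 for as long as the SSH session lasts.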
Three-File Compose Strategy
The deployment uses a layered Docker Compose pattern:
| File | What's Inside | When It Changes |
|---|---|---|
| docker/docker-compose.infra.yml | PostgreSQL, Redis, MinIO, Keycloak, Prometheus, Grafana | Rarely (version bumps) |
| docker/docker-compose.app.yml | FastAPI + React + nginx + reranker model | Every code release |
| docker/docker-compose.ssl.yml | Nginx SSL overlay (port 443, certificates) | Certificate renewal |
During a normal release, only the application image is rebuilt and restarted. Infrastructure services persist data on Docker volumes.
Deploy Command
cd /opt/zol-rag && docker compose --env-file .env.prod \
-f docker/docker-compose.infra.yml \
-f docker/docker-compose.app.yml \
-f docker/docker-compose.ssl.yml up -d
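For a routine release, one way to rebuild and restart only the application container is to pass `--build --no-deps` and a service name to `docker compose up`. This is a sketch, not the project's release script: the service name `app` and the `run` and `release_app` helpers are assumptions, and the command is only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumption: the application service in docker-compose.app.yml is named "app".
APP_SERVICE="${APP_SERVICE:-app}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

release_app() {
  # --build rebuilds the image; --no-deps leaves the other services untouched.
  run docker compose --env-file .env.prod \
    -f docker/docker-compose.infra.yml \
    -f docker/docker-compose.app.yml \
    -f docker/docker-compose.ssl.yml \
    up -d --build --no-deps "$APP_SERVICE"
}

release_app
```

Invoking it as `DRY_RUN=0 sh release.sh` from /opt/zol-rag (the script name is illustrative) would run the command instead of printing it.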
Prerequisites
Hardware
| Resource | Minimum | Recommended | Why |
|---|---|---|---|
| CPU | 4 cores | 8 cores | Local reranker inference + uvicorn workers |
| RAM | 16 GB | 32 GB | PostgreSQL + Keycloak + app are memory-hungry |
| Disk | 100 GB SSD | 250 GB NVMe | pgvector indexes, MinIO docs |
| Network | 100 Mbps | 1 Gbps | LLM API latency is the bottleneck |
| GPU | Not required | NVIDIA T4 | Speeds up embedding from 300ms to 50ms |
| OS | Ubuntu 22.04 / Debian 12 | Ubuntu 24.04 | Docker support required |
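The minimum column can be checked on the target server before installation. The following is a minimal preflight sketch for Linux, not one of the project's scripts: `check_min` and `preflight` are illustrative names, and it assumes the data volumes live on the root filesystem.

```shell
#!/bin/sh
# Minimums from the hardware table above: 4 cores, 16 GB RAM, 100 GB disk.

# check_min LABEL ACTUAL MINIMUM: prints a verdict, returns non-zero on failure.
check_min() {
  if [ "$2" -ge "$3" ] 2>/dev/null; then
    echo "ok: $1 $2 >= $3"
  else
    echo "FAIL: $1 ${2:-unknown} < $3"
    return 1
  fi
}

preflight() {
  status=0
  cores=$(nproc)
  # MemTotal is in kB; round to the nearest GB because a "16 GB" machine
  # usually reports slightly less than 16 GiB.
  mem_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 + 0.5 }' /proc/meminfo)
  # Total size (not free space) of the filesystem that holds /, in GB.
  disk_gb=$(df -Pk / | awk 'NR == 2 { printf "%d", $2 / 1048576 }')
  check_min "CPU cores" "$cores" 4 || status=1
  check_min "RAM (GB)" "$mem_gb" 16 || status=1
  check_min "Disk (GB)" "$disk_gb" 100 || status=1
  return "$status"
}

preflight || echo "Server is below the documented minimum."
```

Filesystem overhead can make a nominal 100 GB disk report a little under 100, so a marginal failure on the disk line deserves a manual look rather than an automatic rejection.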
What You Need Before Starting
- SSH access to the server (root or sudo)
- A domain name pointing to the server IP (e.g., search.zol.be)
- OpenRouter API key (from https://openrouter.ai)
- The git repository URL
- SNOMED CT Belgian Edition RF2 package (for terminology features)
External Dependencies
| Service | Purpose | Cost Estimate |
|---|---|---|
| OpenAI APIs (direct) | Generation (gpt-4.1-mini / gpt-4.1) + embeddings (text-embedding-3-large) | ~$0.01–0.05 per query (LLM dominant); embeddings ≈ $0.16/year at 25 K queries/mo (negligible) |
| OpenRouter (deprecated, optional override) | Legacy LLM fallback path; flag retained per ADR-0048 | Pay-as-you-go |
| Jina API (optional) | Reranker fallback (local reranker is default) | Free tier available |
Budget approximately $50-100/month for pilot traffic (~25,000 queries/month).
Memory Budget
| Service | Reserved | Max Limit | Notes |
|---|---|---|---|
| PostgreSQL | 2 GB | 4 GB | Embeddings, vector search, taxonomy, Keycloak DB |
| Keycloak | 512 MB | 1 GB | OIDC identity provider, realm management |
| Redis | 512 MB | 1 GB | Cache (bounded by LRU policy) |
| MinIO | 256 MB | 512 MB | Document storage |
| App (nginx + uvicorn) | 3 GB | 6 GB | 4 workers + reranker model (~400 MB) |
| Prometheus + Grafana | 384 MB | 768 MB | Metrics (30-day retention) |
| OS + Docker overhead | 2 GB | 4 GB | Kernel, daemon, logging |
| Total | ~9 GB | ~18 GB | 16 GB min, 32 GB comfortable |
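The Total row can be reproduced by summing the two columns. The sketch below does that with awk, assuming 1 GB = 1024 MB; `budget_mb` and `sum_budget` are illustrative names. The sums come to 8832 MB (about 8.6 GB) reserved and 17664 MB (17.25 GB) at the limits, which the table rounds up to ~9 GB and ~18 GB.

```shell
#!/bin/sh
# Columns: service, reserved MB, max-limit MB (values copied from the table above).
budget_mb() {
  cat <<'EOF'
PostgreSQL 2048 4096
Keycloak 512 1024
Redis 512 1024
MinIO 256 512
App 3072 6144
Prometheus+Grafana 384 768
OS+Docker 2048 4096
EOF
}

# Sum the reserved and max-limit columns read from stdin.
sum_budget() {
  awk '{ reserved += $2; max += $3 }
       END { printf "reserved=%d MB max=%d MB\n", reserved, max }'
}

budget_mb | sum_budget
```

Keeping the figures in one place like this makes it easy to re-check the headroom against the 16 GB minimum whenever a container limit changes.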
Deployment Order
Follow these guides in sequence:
- Server Setup — Docker, clone, secrets, firewall
- Infrastructure — Start 6 services, verify health
- Application — Build image, migrations, start app
- Data Seeding — SNOMED, crawl, taxonomy extraction
- SSL & DNS — TLS certificates, domain config
- User Management — Keycloak users, roles
- Monitoring — Grafana dashboards, health checks
For ongoing operations:
- Updates — Code releases, rollback
- Troubleshooting — Common issues, debug commands
- Scripts Reference — All deployment scripts
Architectural Evolution
The deployment architecture has changed substantially since the initial design:
- Neo4j was removed in March 2026 after all entity relationships were migrated to PostgreSQL taxonomy tables (taxonomy_entities and taxonomy_relationships) backed by @pgvector_docs, which reduced the service count and simplified operations; see ADR-0053 (master record).
- Keycloak was added as the OIDC identity provider (@openid_connect_core_1_0, @rfc6749_oauth2, @rfc7519_jwt), replacing the legacy cookie-based authentication to meet @gdpr_regulation Article 25 and @ai_act_regulation audit-trail requirements.
- The on-premise Ollama embedding container was retired entirely in April 2026 in favour of OpenAI's hosted text-embedding-3-large API (@openai2024embeddings, ADR-0048). Embedding latency dropped from 1.7–5.8 s (cold-start + serialization tax) to 150–211 ms per call, freeing ≈1.4 GB of RAM and removing two compose services.
- The single docker-compose.yml was refactored into a three-file overlay pattern for cleaner separation between infrastructure, application, and SSL concerns.
Multi-tenant separation across the deployment follows the SaaS patterns of @bezemer2010multitenant; operational SLO and tail-latency reporting follow @beyer2016sre; compliance baselines are anchored in @iso27001_2022 and @iso27018_2019.