Monitoring
Access Grafana dashboards, Prometheus metrics, and application logs to monitor system health and performance.
Checklist
- Access Grafana at http://YOUR_SERVER_IP:3000
- Verify dashboards are loading
- Understand the health endpoint
- Know how to view application logs
- Set up alert rules (optional)
Grafana Dashboards
Open http://YOUR_SERVER_IP:3000 in a browser.
Login credentials:
- Username: admin
- Password: the GRAFANA_ADMIN_PASSWORD from .env.prod
Available Dashboards
Eight dashboards ship with the deployment under Dashboards → Browse. The System Overview is an executive index with clickable links into the five specialist dashboards; the Safety & Compliance and SLO Status dashboards are the stakeholder views.
| Dashboard | UID | Purpose |
|---|---|---|
| ZOL RAG - System Overview | zol-rag-system-overview | Executive index — 8 stat panels (status, request rate, p95, error %, LLM spend today, ingest status, voice TTFT p95, refusal rate) with clickable links to the specialist dashboards |
| ZOL RAG - Pipeline Overview | zol-rag-pipeline-overview | RAG behavioural metrics — intent distribution, stage-latency breakdown, safety refusal rate, graph injections |
| ZOL RAG - Infrastructure Health | zol-rag-infrastructure-health | HTTP plumbing, process resources, vector search latency, Python GC |
| ZOL RAG - LLM & Cost Tracking | zol-rag-llm-cost-tracking | Authoritative Postgres-backed daily / weekly / monthly cost panels, plus Prometheus since-restart token + cost counters |
| ZOL RAG - Voice Channel | zol-rag-voice-channel | Voice TTFT p50 / p95 / p99, safety escalations by reason, LLM-judge per-dimension scores, speculative-STT hit rate + latency saved |
| ZOL RAG - Ingest Pipeline | zol-rag-ingest-pipeline | 100% Postgres-backed — latest-run status + history, crawl corpus state, failure-class distribution, failed-URL table |
| ZOL RAG - Safety & Compliance | zol-rag-safety-compliance | Stakeholder view tied to the ZERO medical-advice incidents SLO — refusals + voice escalations, refusal-rate %, citation-attached %, CRAG decisions |
| ZOL RAG - SLO Status | zol-rag-slo-dashboard | Six headline SLO stats (availability, 5xx rate, RAG p95, voice TTFT p95, LLM error rate, medical-advice incidents) with red/yellow/green thresholds + error-budget panels |
Dashboards are provisioned automatically from grafana/dashboards/.
Postgres-Backed Panels
Several panels on the LLM & Cost Tracking, Ingest Pipeline, and Safety & Compliance dashboards do not use Prometheus — they query the application database directly through the Grafana postgres datasource. This is intentional: Prometheus counters reset on container restart (you lose yesterday's cumulative spend), while Postgres tables like app.analytics_events, app.ingest_runs, and app.crawled_urls are restart-safe and authoritative for cost reporting, ingest history, and audit numbers.
The Postgres datasource is provisioned from grafana/datasources/prometheus.yml (yes, the filename is prometheus.yml but it declares both datasources). Key gotcha: the database: zol_rag key MUST live under jsonData:, not at the top level of the datasource block — see operations/telemetry-and-runbooks.md for the debug story.
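The key-placement rule above can be checked mechanically. The following is a minimal sketch, assuming the datasource block has already been parsed into a Python dict (for example with a YAML loader). The function name and both sample dicts are illustrative; only the `database` and `jsonData` keys and the `zol_rag` value come from the text above.

```python
from typing import Any


def has_database_under_json_data(datasource: dict[str, Any]) -> bool:
    """Return True when `database` is declared inside the `jsonData` mapping."""
    json_data = datasource.get("jsonData") or {}
    return "database" in json_data


# Hypothetical datasource blocks; every field except `database` and
# `jsonData` is an assumption made for illustration.
wrong = {"name": "postgres", "type": "postgres", "database": "zol_rag"}
right = {"name": "postgres", "type": "postgres", "jsonData": {"database": "zol_rag"}}

print(has_database_under_json_data(wrong))  # False
print(has_database_under_json_data(right))  # True
```

A check like this could run in CI against the provisioning file so the misplacement is caught before Grafana starts.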
Alerting
Six Prometheus alert rules ship in grafana/provisioning/alerting/zol-rag-alerts.yml under group zol-rag-core:
| Rule | Severity | Condition |
|---|---|---|
| BackendDown | critical | up == 0 for 1m |
| HighErrorRate | critical | 5xx ratio > 1% for 5m |
| LLMCostBurnRate | warning | burn > $5/hr for 10m |
| SafetyRefusalSpike | warning | 5x the 1h baseline |
| VoiceTTFTHigh | warning | voice TTFT p95 > 2000ms for 10m |
| LLMCircuitOpen | critical | LLM error rate > 20% for 5m (proxy; no dedicated circuit-state gauge exists yet) |
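To make two of the threshold conditions concrete, here is a rough Python sketch of the arithmetic behind HighErrorRate and SafetyRefusalSpike. It is not the provisioned rule expression (that lives in the YAML file), it ignores the "for 5m" hold duration, and the function names and sample numbers are hypothetical.

```python
def error_ratio(requests_by_status: dict[str, float]) -> float:
    """Fraction of requests whose HTTP status code is 5xx."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0  # no traffic means no error ratio to alert on
    errors = sum(n for status, n in requests_by_status.items() if status.startswith("5"))
    return errors / total


def high_error_rate(requests_by_status: dict[str, float], threshold: float = 0.01) -> bool:
    """HighErrorRate condition: 5xx ratio above 1%."""
    return error_ratio(requests_by_status) > threshold


def refusal_spike(current_rate: float, baseline_rate: float, factor: float = 5.0) -> bool:
    """SafetyRefusalSpike condition: current rate above 5x the baseline."""
    return current_rate > factor * baseline_rate


# Hypothetical request counts over one evaluation window.
counts = {"200": 980, "404": 5, "500": 10, "503": 5}
print(high_error_rate(counts))   # True (15 of 1000 requests are 5xx)
print(refusal_spike(0.6, 0.1))   # True (0.6 exceeds 5 * 0.1)
```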
Before pilot deploy: contact-points.yml and notification-policies.yml in the same directory ship as templates. Production ops MUST set the real email recipients and/or Slack webhook URL before any alert can route. The volume mounts in docker/docker-compose.infra.yml and docker/docker-compose.yml pick the files up on Grafana container restart.
Health Endpoints
The application exposes two health check endpoints:
Basic Health (/health)
curl -s http://localhost:80/health | python3 -m json.tool
{
  "status": "healthy",
  "version": "0.1.0",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "minio": "healthy"
  }
}
| Status | Meaning |
|---|---|
| healthy | All components operational |
| degraded | Some components have issues but service is available |
| unhealthy | Critical component failure |
The Docker health check runs curl -f http://localhost:80/health every 30 seconds.
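For scripted checks outside Docker, the status values in the table can be mapped to exit codes. This is a minimal sketch using only the Python standard library. The exit-code mapping, the function names, and the `"unhealthy"` component value in the second sample are assumptions; the documentation above only shows `"healthy"` component values.

```python
import json
import urllib.request

# Assumed mapping from the documented status values to exit codes.
EXIT_CODES = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def exit_code_for(payload: dict) -> int:
    """Map a /health response body to an exit code; unknown statuses count as failures."""
    return EXIT_CODES.get(payload.get("status"), 2)


def failing_components(payload: dict) -> list[str]:
    """Sorted names of components whose reported state is not "healthy"."""
    components = payload.get("components", {})
    return sorted(name for name, state in components.items() if state != "healthy")


def fetch_health(url: str = "http://localhost:80/health", timeout: float = 5.0) -> dict:
    """Fetch and decode the health document; raises on connection or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


ok = {"status": "healthy", "version": "0.1.0",
      "components": {"database": "healthy", "redis": "healthy", "minio": "healthy"}}
# Hypothetical degraded response, for illustration only.
degraded = {"status": "degraded", "version": "0.1.0",
            "components": {"database": "healthy", "redis": "unhealthy", "minio": "healthy"}}

print(exit_code_for(ok), failing_components(ok))              # 0 []
print(exit_code_for(degraded), failing_components(degraded))  # 1 ['redis']
```

On a running server, `exit_code_for(fetch_health())` would combine the two steps.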
Deep Readiness (/health/ready)
The deep health check includes LLM circuit breaker state, making it suitable for orchestrator readiness probes:
curl -s http://localhost:80/health/ready | python3 -m json.tool
{
  "status": "healthy",
  "version": "0.1.0",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "minio": "healthy",
    "llm_circuit": "closed"
  }
}
| LLM Circuit State | Meaning |
|---|---|
| closed | LLM API is reachable and functioning normally |
| open | LLM API is unreachable; requests are failing over to fallback |
| half-open | Circuit breaker is testing whether the LLM API has recovered |
Use /health/ready for Kubernetes/Docker readiness probes to detect LLM outages.
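A readiness decision based on this payload might look like the sketch below. The policy is an assumption: it treats only `closed` as ready by default, and whether `half-open` should also pass is a choice for the operator. The function name and the `allowed_circuit_states` parameter are hypothetical.

```python
def is_ready(payload: dict, allowed_circuit_states: tuple[str, ...] = ("closed",)) -> bool:
    """Ready when overall status is healthy and the LLM circuit is in an allowed state."""
    circuit = payload.get("components", {}).get("llm_circuit")
    return payload.get("status") == "healthy" and circuit in allowed_circuit_states


ready_payload = {"status": "healthy",
                 "components": {"database": "healthy", "llm_circuit": "closed"}}
# Hypothetical response during recovery, for illustration only.
recovering_payload = {"status": "healthy",
                      "components": {"database": "healthy", "llm_circuit": "half-open"}}

print(is_ready(ready_payload))                                     # True
print(is_ready(recovering_payload))                                # False
print(is_ready(recovering_payload, ("closed", "half-open")))       # True
```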
Prometheus Metrics
The backend exposes metrics at /metrics (Prometheus text format). Headline metrics:
| Metric | Type | Description |
|---|---|---|
| zol_rag_requests_total | Counter | Total HTTP requests by method, path, status |
| zol_rag_request_latency_seconds | Histogram | Request latency distribution |
| zol_rag_query_latency_seconds | Histogram | RAG query processing time, segmented by channel (web / voice_sip / voice_browser) |
| zol_rag_queries_total | Counter | Total RAG queries by intent and channel |
| zol_rag_llm_requests_total | Counter | LLM API calls by model, status |
| zol_rag_llm_cost_usd_total | Counter | Cumulative LLM spend per model (Prometheus side; resets on restart, see Postgres-Backed Panels above) |
| zol_rag_safety_refusals_total | Counter | Safety-blocked answers by reason |
| rag_query_ttft_ms | Histogram | Voice time-to-first-token (the headline voice SLO) |
| rag_voice_safety_escalations_total | Counter | Voice-channel safety escalations by reason |
Voice-channel metrics use the rag_* prefix; application metrics use the zol_rag_* prefix. Most counter and histogram metrics carry a channel label so dashboards can split web traffic from voice traffic.
For the full metric catalog including labels and semantics, see operations/telemetry-and-runbooks.md. Prometheus scrapes the backend every 15 seconds.
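For ad-hoc inspection without Grafana, the text returned by /metrics can be aggregated with a few lines of Python. The sketch below sums one counter by one label. The sample text, its values, and the `opening_hours` intent are made up, and the label names `intent` and `channel` are inferred from the descriptions above rather than confirmed against the real output. The parser handles only simple sample lines, not the full exposition format.

```python
import re
from collections import defaultdict

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(text: str, metric: str, label: str) -> dict[str, float]:
    """Sum every sample of `metric`, grouped by the value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# HELP zol_rag_queries_total Total RAG queries
# TYPE zol_rag_queries_total counter
zol_rag_queries_total{intent="doctor_lookup",channel="web"} 12
zol_rag_queries_total{intent="doctor_lookup",channel="voice_sip"} 3
zol_rag_queries_total{intent="opening_hours",channel="web"} 5
"""

print(sum_by_label(SAMPLE, "zol_rag_queries_total", "channel"))
# {'web': 17.0, 'voice_sip': 3.0}
```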
Structured Logging
The backend uses structlog for structured logging. Log format is environment-aware:
| Environment | Format | Example |
|---|---|---|
| Development | Colored console with key-value pairs | [info] query processed intent=doctor_lookup latency=1.2s |
| Production | JSON lines (one object per log entry) | {"event":"query processed","intent":"doctor_lookup","latency":1.2} |
JSON log output in production is compatible with standard log aggregation tools (ELK, Loki, CloudWatch).
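Because each production log entry is one JSON object per line, exported logs can be filtered with the standard library alone. The following sketch yields entries at or above a latency threshold and skips lines that are not JSON (the zol-app container also emits nginx output). The function name is hypothetical, and the second and third sample lines are invented for illustration.

```python
import json
from typing import Iterable, Iterator


def slow_events(lines: Iterable[str], min_latency: float) -> Iterator[dict]:
    """Yield parsed log entries whose numeric `latency` is at least `min_latency`."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line, for example an nginx access log entry
        if not isinstance(entry, dict):
            continue
        latency = entry.get("latency")
        if isinstance(latency, (int, float)) and latency >= min_latency:
            yield entry


sample_lines = [
    '{"event":"query processed","intent":"doctor_lookup","latency":1.2}',
    '{"event":"query processed","intent":"doctor_lookup","latency":0.3}',
    'GET /health HTTP/1.1 200',
]

for entry in slow_events(sample_lines, min_latency=1.0):
    print(entry["event"], entry["latency"])  # query processed 1.2
```

To run it against a real export, pass an open file object (for instance one opened on /tmp/app-logs.txt) as `lines`.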
Viewing Logs
# Application logs (backend + nginx)
docker logs zol-app --tail 100 -f
# All infrastructure logs
docker compose -f docker/docker-compose.infra.yml logs --tail 50
# Specific service logs
docker logs zol-postgres --tail 50
docker logs zol-keycloak --tail 50
docker logs zol-redis --tail 50
# Export logs for analysis
docker logs zol-app --since 2h > /tmp/app-logs.txt 2>&1
Log Rotation
All containers use Docker's json-file log driver with rotation:
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "5"
Maximum log storage per container: 50 MB.
Additional Infrastructure Alerts (Not Yet Provisioned)
The Alerting section above describes the six application-level rules that ship in grafana/provisioning/alerting/zol-rag-alerts.yml. The following infrastructure-level alerts are recommended but not yet provisioned — add them to the YAML in a follow-up pass when ops capacity allows.
| Alert | Condition | Severity | Why it's worth adding |
|---|---|---|---|
| DiskSpaceLow | Available disk on pilot < 10% | Warning | PostgreSQL + MinIO + Prometheus all fail noisily when the host volume fills |
| PostgresConnExhausted | Active connections > 80% of max_connections | Warning | pgvector is tolerant but the rest of the app deadlocks under exhaustion |
| EmbeddingLatencyHigh | histogram_quantile(0.95, rate(zol_rag_embedding_latency_seconds_bucket[5m])) > 2 | Warning | Embeddings now run against the OpenAI API (Ollama retired April 2026, ADR-0048); a sustained p95 spike signals OpenAI slowness/errors-with-retries and lags ingest + voice retrieval. (zol_rag_embedding_latency_seconds is the only exported embedding metric — there is no error counter.) |
| RedisOOM | Memory usage > 90% of maxmemory | Warning | Cache flapping causes a cascading LLM cost spike |
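Until these rules are provisioned, the DiskSpaceLow condition can be approximated with a host-side script. This is only a sketch using `shutil.disk_usage` from the Python standard library; the function names and the default path are assumptions, and the path should point at whichever volume holds the PostgreSQL, MinIO, and Prometheus data.

```python
import shutil


def free_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is still available."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_space_low(path: str = "/", threshold: float = 0.10) -> bool:
    """DiskSpaceLow condition: less than 10% of the volume is available."""
    return free_fraction(path) < threshold


print(f"free: {free_fraction('.'):.1%}, low: {disk_space_low('.')}")
```

A cron job could run this and send a notification when `disk_space_low` returns True.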
Container Health Overview
Quick check of all containers:
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"