Monitoring

Access Grafana dashboards, Prometheus metrics, and application logs to monitor system health and performance.

Checklist

Access Grafana at http://YOUR_SERVER_IP:3000
Verify dashboards are loading
Understand the health endpoint
Know how to view application logs
Set up alert rules (optional)

Grafana Dashboards

Open http://YOUR_SERVER_IP:3000 in a browser.

Login credentials:

Username: admin
Password: the GRAFANA_ADMIN_PASSWORD from .env.prod

Available Dashboards

Eight dashboards ship with the deployment under Dashboards → Browse. The System Overview is an executive index with clickable links into the five specialist dashboards; the Safety & Compliance and SLO Status dashboards are the stakeholder views.

Dashboard	UID	Purpose
ZOL RAG - System Overview	`zol-rag-system-overview`	Executive index — 8 stat panels (status, request rate, p95, error %, LLM spend today, ingest status, voice TTFT p95, refusal rate) with clickable links to the specialist dashboards
ZOL RAG - Pipeline Overview	`zol-rag-pipeline-overview`	RAG behavioural metrics — intent distribution, stage-latency breakdown, safety refusal rate, graph injections
ZOL RAG - Infrastructure Health	`zol-rag-infrastructure-health`	HTTP plumbing, process resources, vector search latency, Python GC
ZOL RAG - LLM & Cost Tracking	`zol-rag-llm-cost-tracking`	Authoritative Postgres-backed daily / weekly / monthly cost panels, plus Prometheus since-restart token + cost counters
ZOL RAG - Voice Channel	`zol-rag-voice-channel`	Voice TTFT p50 / p95 / p99, safety escalations by reason, LLM-judge per-dimension scores, speculative-STT hit rate + latency saved
ZOL RAG - Ingest Pipeline	`zol-rag-ingest-pipeline`	100% Postgres-backed — latest-run status + history, crawl corpus state, failure-class distribution, failed-URL table
ZOL RAG - Safety & Compliance	`zol-rag-safety-compliance`	Stakeholder view tied to the ZERO medical-advice incidents SLO — refusals + voice escalations, refusal-rate %, citation-attached %, CRAG decisions
ZOL RAG - SLO Status	`zol-rag-slo-dashboard`	Six headline SLO stats (availability, 5xx rate, RAG p95, voice TTFT p95, LLM error rate, medical-advice incidents) with red/yellow/green thresholds + error-budget panels

Dashboards are provisioned automatically from grafana/dashboards/.

Postgres-Backed Panels

Several panels on the LLM & Cost Tracking, Ingest Pipeline, and Safety & Compliance dashboards do not use Prometheus — they query the application database directly through the Grafana postgres datasource. This is intentional: Prometheus counters reset on container restart (you lose yesterday's cumulative spend), while Postgres tables like app.analytics_events, app.ingest_runs, and app.crawled_urls are restart-safe and authoritative for cost reporting, ingest history, and audit numbers.

The Postgres datasource is provisioned from grafana/datasources/prometheus.yml (yes, the filename is prometheus.yml but it declares both datasources). Key gotcha: the database: zol_rag key MUST live under jsonData:, not at the top level of the datasource block — see operations/telemetry-and-runbooks.md for the debug story.
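The placement rule for the `database` key can be checked mechanically. The sketch below is a minimal illustration, assuming the provisioning file has already been loaded into Python dictionaries (for example by a YAML loader); the function name, the datasource name `Postgres`, and the two accepted type strings are assumptions for illustration, not values taken from the real file.

```python
# Hypothetical helper: flags Postgres datasources whose `database` key is
# missing from `jsonData` (for example because it sits at the top level).
POSTGRES_TYPES = {"postgres", "grafana-postgresql-datasource"}  # assumed type ids


def misplaced_database_keys(datasources):
    """Return the names of Postgres datasources without jsonData.database."""
    bad = []
    for ds in datasources:
        if ds.get("type") not in POSTGRES_TYPES:
            continue  # Prometheus and other datasources are not affected
        json_data = ds.get("jsonData") or {}
        if "database" not in json_data:
            bad.append(ds.get("name", "<unnamed>"))
    return bad


# Illustrative inputs only: the wrong and the right placement of the key.
broken = [{"name": "Postgres", "type": "postgres", "database": "zol_rag"}]
fixed = [{"name": "Postgres", "type": "postgres",
          "jsonData": {"database": "zol_rag"}}]
```

With these inputs, `misplaced_database_keys(broken)` reports the datasource and `misplaced_database_keys(fixed)` returns an empty list.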

Alerting

Six Prometheus alert rules ship in grafana/provisioning/alerting/zol-rag-alerts.yml under group zol-rag-core:

Rule	Severity	Condition
`BackendDown`	critical	`up == 0` for 1m
`HighErrorRate`	critical	5xx ratio > 1% for 5m
`LLMCostBurnRate`	warning	burn > $5/hr for 10m
`SafetyRefusalSpike`	warning	5x the 1h baseline
`VoiceTTFTHigh`	warning	voice TTFT p95 > 2000ms for 10m
`LLMCircuitOpen`	critical	LLM error rate > 20% for 5m (proxy — no dedicated circuit-state gauge exists yet)
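To make two of the thresholds in the table concrete, the following sketch reproduces the arithmetic behind `HighErrorRate` and `LLMCostBurnRate` on plain numbers. It is an illustration only: the function names are hypothetical, the real rule expressions live in zol-rag-alerts.yml and may be written differently, and the "for 5m" / "for 10m" hold durations are not modelled here.

```python
def error_ratio(count_5xx, count_total):
    """Share of requests that returned a 5xx status in the window."""
    if count_total == 0:
        return 0.0  # no traffic means no error ratio to alert on
    return count_5xx / count_total


def cost_burn_per_hour(cost_start_usd, cost_end_usd, window_seconds):
    """Extrapolate the spend observed in a window to dollars per hour."""
    return (cost_end_usd - cost_start_usd) * 3600.0 / window_seconds


# 12 failures out of 1000 requests is 1.2%, above the 1% threshold.
high_error_rate = error_ratio(12, 1000) > 0.01

# $1.00 spent in 10 minutes extrapolates to $6/hr, above the $5/hr threshold.
cost_burning = cost_burn_per_hour(1.00, 2.00, 600) > 5.0
```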

Before pilot deploy: contact-points.yml and notification-policies.yml in the same directory ship as templates. Production ops MUST set the real email recipients and/or Slack webhook URL before any alert can route. The volume mounts in docker/docker-compose.infra.yml and docker/docker-compose.yml pick the files up on Grafana container restart.

Health Endpoints

The application exposes two health check endpoints:

Basic Health (`/health`)

curl -s http://localhost:80/health | python3 -m json.tool

{
  "status": "healthy",
  "version": "0.1.0",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "minio": "healthy"
  }
}

Status	Meaning
`healthy`	All components operational
`degraded`	Some components have issues but service is available
`unhealthy`	Critical component failure

The Docker health check runs curl -f http://localhost:80/health every 30 seconds.
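For scripted checks, the response body can be reduced to the overall status plus the list of components that are not healthy. This is a minimal sketch, assuming the payload shape shown above; the function name is hypothetical, and the component value `unhealthy` in the second sample is an assumed example rather than documented output. It targets `/health` only, since `/health/ready` adds a field with different values.

```python
import json


def summarize_health(body):
    """Return (overall status, sorted names of components not 'healthy')."""
    payload = json.loads(body)
    components = payload.get("components", {})
    failing = sorted(name for name, state in components.items()
                     if state != "healthy")
    return payload.get("status", "unknown"), failing


# First sample mirrors the response shown above; the second is hypothetical.
healthy_body = ('{"status": "healthy", "version": "0.1.0", "components": '
                '{"database": "healthy", "redis": "healthy", "minio": "healthy"}}')
degraded_body = ('{"status": "degraded", "version": "0.1.0", "components": '
                 '{"database": "healthy", "redis": "unhealthy", "minio": "healthy"}}')
```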

Deep Readiness (`/health/ready`)

The deep health check also reports the LLM circuit breaker state, making it suitable for orchestrator readiness probes:

curl -s http://localhost:80/health/ready | python3 -m json.tool

{
  "status": "healthy",
  "version": "0.1.0",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "minio": "healthy",
    "llm_circuit": "closed"
  }
}

LLM Circuit State	Meaning
`closed`	LLM API is reachable and functioning normally
`open`	LLM API is unreachable; requests are failing over to the fallback
`half-open`	Circuit breaker is testing whether the LLM API has recovered

Use /health/ready for Kubernetes/Docker readiness probes to detect LLM outages.
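A readiness decision based on this payload might look like the sketch below. The policy is an assumption, not documented behaviour: it treats `closed` and `half-open` as ready and `open` (or a missing field) as not ready. The function name and the `ready_states` parameter are hypothetical.

```python
def is_ready(payload, ready_states=("closed", "half-open")):
    """Decide readiness from a parsed /health/ready response.

    Assumed policy: the service is ready only when the overall status is
    'healthy' and the LLM circuit is in one of `ready_states`.
    """
    if payload.get("status") != "healthy":
        return False
    circuit = payload.get("components", {}).get("llm_circuit")
    return circuit in ready_states


# Payloads shaped like the /health/ready example above.
closed = {"status": "healthy", "components": {"llm_circuit": "closed"}}
opened = {"status": "healthy", "components": {"llm_circuit": "open"}}
```

Passing `ready_states=("closed",)` gives a stricter probe that also holds traffic back while the breaker is still testing recovery.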

Prometheus Metrics

The backend exposes metrics at /metrics (Prometheus text format). Headline metrics:

Metric	Type	Description
`zol_rag_requests_total`	Counter	Total HTTP requests by method, path, status
`zol_rag_request_latency_seconds`	Histogram	Request latency distribution
`zol_rag_query_latency_seconds`	Histogram	RAG query processing time, segmented by `channel` (web / voice_sip / voice_browser)
`zol_rag_queries_total`	Counter	Total RAG queries by intent and channel
`zol_rag_llm_requests_total`	Counter	LLM API calls by model, status
`zol_rag_llm_cost_usd_total`	Counter	Cumulative LLM spend per model (Prometheus side; reset on restart — see Postgres-backed panels above)
`zol_rag_safety_refusals_total`	Counter	Safety-blocked answers by reason
`rag_query_ttft_ms`	Histogram	Voice time-to-first-token (the headline voice SLO)
`rag_voice_safety_escalations_total`	Counter	Voice-channel safety escalations by reason

Voice-channel metrics use the rag_* prefix; application metrics use the zol_rag_* prefix. Most counter and histogram metrics carry a channel label so dashboards can split web traffic from voice traffic.

For the full metric catalog including labels and semantics, see operations/telemetry-and-runbooks.md. Prometheus scrapes the backend every 15 seconds.
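As an illustration of how the `channel` label lets web and voice traffic be separated, the sketch below sums one counter per label value from a `/metrics` scrape. The sample text is invented for the example (the `opening_hours` intent and all values are hypothetical), and the parser is deliberately simplified: it handles only labelled samples without timestamps or escaped quotes.

```python
import re
from collections import defaultdict

# Hypothetical scrape output in Prometheus text format.
SAMPLE = """\
# HELP zol_rag_queries_total Total RAG queries
# TYPE zol_rag_queries_total counter
zol_rag_queries_total{intent="doctor_lookup",channel="web"} 40
zol_rag_queries_total{intent="doctor_lookup",channel="voice_sip"} 7
zol_rag_queries_total{intent="opening_hours",channel="web"} 13
"""

SAMPLE_LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text, metric, label):
    """Sum the samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_PAIR.findall(match.group(2)))
        totals[labels.get(label, "")] += float(match.group(3))
    return dict(totals)


per_channel = sum_by_label(SAMPLE, "zol_rag_queries_total", "channel")
```

For anything beyond a quick check, a proper Prometheus client or a PromQL query against the Prometheus server is the more robust route.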

Structured Logging

The backend uses structlog for structured logging. Log format is environment-aware:

Environment	Format	Example
Development	Colored console with key-value pairs	`[info] query processed intent=doctor_lookup latency=1.2s`
Production	JSON lines (one object per log entry)	`{"event":"query processed","intent":"doctor_lookup","latency":1.2}`

JSON log output in production is compatible with standard log aggregation tools (ELK, Loki, CloudWatch).
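Because each production log entry is a single JSON object, ad hoc analysis needs nothing beyond the standard library. The sketch below filters for slow queries, assuming the `event`, `intent`, and `latency` fields from the example above; the function name is hypothetical, the second and third sample lines are invented, and non-JSON lines (for example nginx output in the same stream) are skipped.

```python
import json


def slow_events(lines, threshold_seconds):
    """Return parsed log records whose `latency` exceeds the threshold."""
    slow = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structlog JSON line
        if not isinstance(record, dict):
            continue
        latency = record.get("latency")
        if isinstance(latency, (int, float)) and latency > threshold_seconds:
            slow.append(record)
    return slow


# First line is the production example above; the others are hypothetical.
sample_lines = [
    '{"event":"query processed","intent":"doctor_lookup","latency":1.2}',
    '{"event":"query processed","intent":"doctor_lookup","latency":0.3}',
    '127.0.0.1 - - "GET /health HTTP/1.1" 200',
]
```

Calling `slow_events(sample_lines, 1.0)` keeps only the 1.2-second entry.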

Viewing Logs

# Application logs (backend + nginx)
docker logs zol-app --tail 100 -f

# All infrastructure logs
docker compose -f docker/docker-compose.infra.yml logs --tail 50

# Specific service logs
docker logs zol-postgres --tail 50
docker logs zol-keycloak --tail 50
docker logs zol-redis --tail 50

# Export logs for analysis
docker logs zol-app --since 2h > /tmp/app-logs.txt 2>&1

Log Rotation

All containers use Docker's json-file log driver with rotation:

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "5"

Maximum log storage per container: 50 MB.

Additional Infrastructure Alerts (Not Yet Provisioned)

The Alerting section above describes the six application-level rules that ship in grafana/provisioning/alerting/zol-rag-alerts.yml. The following infrastructure-level alerts are recommended but not yet provisioned — add them to the YAML in a follow-up pass when ops capacity allows.

Alert	Condition	Severity	Why it's worth adding
DiskSpaceLow	Available disk on pilot < 10%	Warning	PostgreSQL + MinIO + Prometheus all fail noisily when the host volume fills
PostgresConnExhausted	Active connections > 80% of `max_connections`	Warning	pgvector is tolerant but the rest of the app deadlocks under exhaustion
EmbeddingLatencyHigh	`histogram_quantile(0.95, rate(zol_rag_embedding_latency_seconds_bucket[5m])) > 2`	Warning	Embeddings now run against the OpenAI API (Ollama retired April 2026, ADR-0048); a sustained p95 spike signals OpenAI slowness or errors masked by retries, and it delays both ingest and voice retrieval. (`zol_rag_embedding_latency_seconds` is the only exported embedding metric; there is no error counter.)
RedisOOM	Memory usage > 90% of `maxmemory`	Warning	Cache flapping causes a cascading LLM cost spike
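Until these rules are provisioned, the ratio-based conditions in the table can be approximated in a script. The sketch below is an assumption-laden illustration: the function names are hypothetical, the thresholds are copied from the table, and only the disk check reads a live value (via `shutil.disk_usage`); connection and memory figures would have to come from PostgreSQL and Redis themselves.

```python
import shutil


def disk_space_low(total_bytes, free_bytes, min_free_ratio=0.10):
    """True when the free share of the volume drops below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_ratio


def over_ratio(used, limit, max_ratio):
    """True when used/limit exceeds max_ratio (connections, memory)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used / limit > max_ratio


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # named tuple: total, used, free
    print("DiskSpaceLow:", disk_space_low(usage.total, usage.free))
    # Hypothetical readings: 85 of 100 connections, 950 MB of 1000 MB.
    print("PostgresConnExhausted:", over_ratio(85, 100, 0.80))
    print("RedisOOM:", over_ratio(950, 1000, 0.90))
```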

Container Health Overview

Quick check of all containers:

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
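For automated checks, `docker ps --format '{{json .}}'` emits one JSON object per container, which is easier to parse than the table layout. The sketch below flags containers that are not running or that report an unhealthy state; the function name is hypothetical and the sample lines are invented, using container names that appear elsewhere on this page.

```python
import json


def problem_containers(json_lines):
    """Return (name, status) pairs for containers that need attention."""
    problems = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        info = json.loads(line)
        status = info.get("Status", "")
        # Running containers report "Up ..."; a failing health check
        # appends "(unhealthy)" to the status text.
        if not status.startswith("Up") or "(unhealthy)" in status:
            problems.append((info.get("Names", "<unknown>"), status))
    return problems


# Hypothetical output of: docker ps -a --format '{{json .}}'
sample_ps = [
    '{"Names": "zol-app", "Status": "Up 2 hours (healthy)"}',
    '{"Names": "zol-redis", "Status": "Up 3 minutes (unhealthy)"}',
    '{"Names": "zol-keycloak", "Status": "Exited (1) 5 minutes ago"}',
]
```

Note that stopped containers only appear when `-a` is passed to `docker ps`.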

