
Release Notes: Feb 28 – Mar 6, 2026

~205 commits | First pilot deployment live with SSL, Keycloak, and anonymous chat

2-Minute Summary

This week took the system from a local development setup to a live pilot server with production infrastructure. A full deployment toolkit was built (deploy, cleanup, user management scripts), Keycloak OIDC replaced the legacy auth system while maintaining backward compatibility, universal medical knowledge was consolidated out of hospital-specific config, and anonymous public queries were enabled for unauthenticated visitors. The server was hardened, SSL configured, and WebSocket reliability issues were resolved through multiple iteration rounds.


Pilot Server Deployment

The deployment pipeline was built from scratch to support one-command deployments to pilot and production environments.

What changed:

  • deploy.sh: Single-command deployment script supporting both pilot and production targets with environment-specific configuration
  • cleanup.sh: Automated removal of old Docker images and build cache to reclaim disk space on the server
  • Two-image Docker split: Separated into a base image (dependencies, rarely rebuilt) and an app image (code, fast rebuilds), so routine deploys rebuild only the app image and finish faster
  • Self-signed SSL + Let's Encrypt: Initial self-signed certificates for immediate HTTPS, with Let's Encrypt support for production
  • Server hardening script: Binds Grafana to localhost, restricts exposed ports, applies firewall rules
  • manage-users.sh: CLI for user lifecycle — create, list, delete, activate, deactivate, and set-role operations

Keycloak Dual-Mode Auth

The authentication system was migrated from a simple JWT-based flow to Keycloak OIDC while preserving backward compatibility for existing sessions.

What changed:

  • Keycloak OIDC config: Full realm configuration with client credentials, redirect URIs, and role mappings
  • JWKS validator: RS256 token validation using Keycloak's JSON Web Key Set endpoint
  • Dual-mode middleware: Requests are first validated against Keycloak (RS256); if that fails, the legacy HS256 secret is tried — enabling a seamless migration window
  • OIDC endpoints: Login, callback, logout, and refresh token endpoints integrated into the FastAPI auth router
  • ZOL realm: Configured with three roles (admin, manager, user) and a dedicated client for the RAG application
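
The dual-mode fallback described above can be sketched as follows. This is a minimal illustration rather than the project's middleware: the names `authenticate`, `validate_hs256`, `sign_hs256`, and `InvalidToken` are hypothetical, the Keycloak RS256/JWKS check is passed in as a callable instead of being implemented, and expiry and audience checks are omitted. It uses only the standard library so it stays self-contained.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Callable, Dict, Tuple

Claims = Dict[str, Any]


class InvalidToken(Exception):
    """Raised when a token fails validation (hypothetical name)."""


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded without padding; restore it first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: Claims, secret: bytes) -> str:
    """Build an HS256 token. Only needed to make the example runnable."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def validate_hs256(token: str, secret: bytes) -> Claims:
    """Verify a legacy HS256 signature and return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    except ValueError as exc:
        raise InvalidToken("malformed token") from exc
    # Pin the algorithm so a token cannot choose its own verification method.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise InvalidToken("signature mismatch")
    try:
        return json.loads(_b64url_decode(payload_b64))
    except ValueError as exc:
        raise InvalidToken("malformed payload") from exc


def authenticate(
    token: str,
    validate_keycloak: Callable[[str], Claims],
    legacy_secret: bytes,
) -> Tuple[str, Claims]:
    """Try Keycloak (RS256) first, then fall back to the legacy HS256 secret."""
    try:
        return "keycloak", validate_keycloak(token)
    except InvalidToken:
        # Raises InvalidToken again if the legacy check also fails.
        return "legacy", validate_hs256(token, legacy_secret)
```

Returning the mode alongside the claims makes it possible to log how many requests still rely on the legacy path, which is one way to decide when the migration window can close.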

Universal Medical Knowledge

Medical safety knowledge was decoupled from hospital-specific configuration to support multi-tenancy.

What changed:

  • Consolidated universal knowledge: Medical conditions, safety rules, and disclaimer templates extracted from hardcoded ZOL config into shared, hospital-agnostic modules
  • Seeded universal conditions as graph nodes: Common medical conditions added to the knowledge graph with universal provenance markers
  • Provenance-aware graph context: Graph retrieval now tags results with provenance (universal vs. hospital-specific) and attaches a soft disclaimer for universal content
  • PromptContext from tenant config: All LLM prompts receive hospital identity from a single PromptContext dataclass — the single source of truth for prompt parameterization
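
As a rough illustration of the last two items, the sketch below shows a `PromptContext` dataclass built from tenant config and a renderer that tags graph results by provenance. Only the `PromptContext` name comes from these notes. The field names, the `from_tenant_config` constructor, the `GraphResult` type, and the disclaimer wording are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

UNIVERSAL = "universal"
HOSPITAL = "hospital-specific"

# Placeholder wording; the actual disclaimer templates are not shown in these notes.
SOFT_DISCLAIMER = "general medical information, not specific to {hospital_name}"


@dataclass(frozen=True)
class PromptContext:
    """Hospital identity passed to every prompt (field names are assumed)."""

    hospital_name: str
    language: str = "en"

    @classmethod
    def from_tenant_config(cls, config: Dict[str, str]) -> "PromptContext":
        # Hypothetical config keys; a missing hospital_name raises KeyError.
        return cls(
            hospital_name=config["hospital_name"],
            language=config.get("language", "en"),
        )


@dataclass(frozen=True)
class GraphResult:
    """One retrieved graph fact plus where it came from (assumed shape)."""

    text: str
    provenance: str


def render_graph_context(results: List[GraphResult], ctx: PromptContext) -> str:
    """Tag each result with its provenance; add a soft disclaimer to universal ones."""
    lines = []
    for result in results:
        line = f"[{result.provenance}] {result.text}"
        if result.provenance == UNIVERSAL:
            disclaimer = SOFT_DISCLAIMER.format(hospital_name=ctx.hospital_name)
            line += f" ({disclaimer})"
        lines.append(line)
    return "\n".join(lines)
```

Because the hospital name only enters through `PromptContext`, swapping tenants in this sketch means changing one config dictionary rather than editing prompt strings.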

Database Doctor Remediation

Data quality issues discovered during initial pilot testing were systematically resolved.

What changed:

  • CMS artifact stripping: Removed Drupal markup, inline styles, and broken HTML entities from ingested content
  • Condition deduplication: Merged duplicate condition entities created by overlapping extraction runs
  • Hernia misrouting fix: Corrected taxonomy relationships that were routing hernia queries to incorrect departments
  • Golden eval ground truth corrections: Updated expected answers in the evaluation benchmark to reflect verified correct responses
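
The CMS artifact stripping step might look roughly like the sketch below. It is not the project's cleaner: the function name `strip_cms_artifacts` is hypothetical, and it assumes that "broken HTML entities" includes double-encoded ones such as `&amp;nbsp;`. It relies only on the standard library's `html.parser` and `html.unescape`.

```python
import html
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and drops tags, attributes, and style/script bodies."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        return "".join(self._chunks)


def strip_cms_artifacts(raw: str) -> str:
    """Return plain text from CMS-exported HTML."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # The parser decodes entities once; a second pass repairs double-encoded
    # entities (for example "&amp;nbsp;") without re-introducing markup.
    text = html.unescape(parser.text())
    # \s also matches non-breaking spaces, so they collapse with the rest.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `strip_cms_artifacts('<p style="color:red">Hernia&amp;nbsp;repair</p>')` yields `Hernia repair`: the inline style disappears with the tag, and the double-encoded entity becomes an ordinary space.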

Deployment Fixes

Multiple rounds of iteration were needed to achieve reliable operation on the pilot server.

What changed:

  • Nginx WebSocket fix: Removed the http2 directive from the Nginx configuration; the WebSocket handshake relies on the HTTP/1.1 Upgrade mechanism, which HTTP/2 does not support
  • Login loop resolution: Fixed redirect URI mismatch between Keycloak callback and frontend routing that caused infinite login loops
  • WebSocket reliability on SSL: Corrected wss:// proxy configuration and connection timeout handling
  • CSRF token header integration: Added CSRF token propagation through the Keycloak OIDC flow
  • supervisord privilege drop: Ensured worker processes run as non-root after container startup
  • Migration stamping: Stamped existing tables on the pilot database so Alembic could track them without attempting recreation

Anonymous Public Queries

Unauthenticated visitors can now use the search without logging in, enabling the primary use case: public hospital website search.

What changed:

  • Nullable user_id: Conversation.user_id column made nullable (migration) to support anonymous conversations
  • Session-based tracking: Anonymous conversations are tracked via a session_id instead of a user foreign key
  • Public WebSocket handler: The WS query endpoint passes user_id=None for unauthenticated connections, allowing full RAG pipeline access without auth
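
The data-model change can be illustrated with a small sketch. It uses an in-memory SQLite table in place of the real `Conversation` model and its migration, so the column types, the helper names, and the choice to generate session IDs with `uuid4` are assumptions. Only the nullable `user_id` and the `session_id` tracking come from the notes above.

```python
import sqlite3
import uuid
from typing import List, Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE conversation (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NULL,  -- NULL marks an anonymous conversation
            session_id TEXT        -- tracks anonymous visitors across messages
        )
        """
    )


def start_conversation(
    conn: sqlite3.Connection,
    session_id: str,
    user_id: Optional[int] = None,
) -> int:
    """Insert a conversation; unauthenticated callers pass user_id=None."""
    cur = conn.execute(
        "INSERT INTO conversation (user_id, session_id) VALUES (?, ?)",
        (user_id, session_id),
    )
    conn.commit()
    return cur.lastrowid


def anonymous_conversations(conn: sqlite3.Connection, session_id: str) -> List[int]:
    """Look up a visitor's anonymous conversations by session, not by user."""
    rows = conn.execute(
        "SELECT id FROM conversation "
        "WHERE session_id = ? AND user_id IS NULL ORDER BY id",
        (session_id,),
    ).fetchall()
    return [row[0] for row in rows]


def new_session_id() -> str:
    # Assumed ID scheme; the notes do not say how session IDs are generated.
    return uuid.uuid4().hex
```

Note the `user_id IS NULL` filter: a plain `user_id = NULL` comparison never matches in SQL, which is an easy mistake to make once a foreign key becomes nullable.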

Think Harder

An escalation mode for complex queries that require deeper reasoning.

What changed:

  • Escalated intent classification: Queries classified as requiring multi-step reasoning are automatically flagged for escalation
  • GPT-5.2 direct routing: Think Harder mode bypasses the standard model and routes directly to OpenAI GPT-5.2 for higher-quality responses
  • Rate limiting: Capped at 2 Think Harder invocations per hour per user to manage API costs
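
A per-user cap of 2 invocations per hour could be enforced with a sliding-window limiter like the sketch below. The notes do not say whether the real limiter uses a sliding or fixed window, or where its state lives, so the in-memory storage, the class name `ThinkHarderLimiter`, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class ThinkHarderLimiter:
    """Sliding-window limiter: at most max_calls per window per user."""

    def __init__(
        self,
        max_calls: int = 2,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock  # injectable so tests do not have to sleep
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record and permit the call if the user is under the cap."""
        now = self._clock()
        calls = self._calls[user_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True
```

State held in process memory like this would reset on restart and would not be shared across worker processes, so a deployment with several workers would likely need a shared store instead.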

System State at End of Sprint

Component                  Value
Pilot server               Live at 88.99.184.57 with SSL
Auth                       Keycloak OIDC + legacy dual-mode
Anonymous access           Enabled
Docker images              2-image split (base + app)
Deployment                 One-command via deploy.sh
Think Harder               GPT-5.2, rate-limited 2/hr
Medical advice incidents   ZERO