ADR-0053: Remove Neo4j — Consolidate Graph Context onto PostgreSQL Taxonomy

Date: 2026-03-07 (primary removal, commit d82b1592) — architectural cleanup completed 2026-05-02 (commit 158d793). Documented retroactively on: 2026-05-09 during the Pilot-Review Readiness Phase 2.A cascade, in response to drift register findings (audits/2026-05-09-adr-register).

Status: Accepted (retroactively documented) | Supersedes: ADR-006 (Knowledge Graph Enhancement), ADR-0029 Remove Graphiti — Direct Neo4j Driver | Amends: ADR-0017 Context Retrieval Architecture (Stage 2c deprecated), ADR-0028 Golden-Page Taxonomy (Neo4j seeding becomes a no-op), ADR-0030 LLM Entity Extraction (routes to PostgreSQL, not Neo4j Cypher).

Context

Why Neo4j was originally adopted

The hospital domain is naturally relational: a Doctor works in a Department, a Department handles Conditions, a Treatment is offered for a Condition. Three early ADRs leaned into this:

ADR-006 (2026-02-03) introduced typed Neo4j nodes (:Doctor, :Department, :Condition, :Treatment, :Examination, :Service, :Campus, :Hospital) with explicit relationship edges (WORKS_IN, HANDLES, OFFERS, TREATS).
ADR-0017 (2026-02-09) promoted Neo4j to "Stage 2c — Graph Search" of the hybrid retrieval pipeline, with Tier 1 typed-node Cypher queries and Tier 2 Graphiti semantic search.
ADR-0029 (2026-02-13) removed Graphiti the library but explicitly kept Neo4j the database, replacing Graphiti with a thin Neo4jService wrapping neo4j.AsyncDriver.

By February 2026, the production stack included PostgreSQL + pgvector (chunks, embeddings, BM25, taxonomy, sessions, analytics), Neo4j (typed-node knowledge graph), Redis (ephemeral state), and MinIO (binaries).

What changed between February and May 2026

Three independent forces eroded Neo4j's value:

The PostgreSQL taxonomy schema grew rich. Migrations 020-040 added the taxonomy_entity, taxonomy_relationship, taxonomy_alias, and medical_taxonomy tables — entity traversal that was originally Cypher-shaped now ran as JOIN queries against pgvector-indexed metadata. The "graph" was reproducible in SQL.
Graph contribution to ranking proved marginal. Conditional evals (graph on vs. graph off; see backend/tests/evaluation/graph_value_assessment.py) showed Stage 2c contributed ≤1% of the final ranking signal across the 299-question Golden Eval. The pipeline cost (an additional async driver, a connection pool, and 80-120 ms of p50 latency per turn) did not pay for itself.
Operational cost was structural, not transient. Two databases meant two backup schedules, two replication topologies, two failure modes, and a hard cross-store consistency problem: an entity created in PostgreSQL had to be reflected in Neo4j, and vice versa, with no transactional guarantee.

Decision

Remove Neo4j entirely. Consolidate all graph context onto PostgreSQL. Specifically:

Drop the Neo4jService and all callers.
Drop the Neo4j docker-compose service.
Move the typed-entity logic (:Doctor, :Department, :Condition, etc.) into the PostgreSQL taxonomy tables, which already existed in parallel.
Replace Stage 2c (graph search) in the hybrid retrieval pipeline with a no-op pass-through.
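The no-op replacement for Stage 2c can be pictured with a minimal sketch. It assumes a pipeline in which every stage receives the query plus the current candidate list and returns a candidate list; Candidate, graph_search_stage, and run_stages are hypothetical names chosen for illustration, not identifiers from the codebase.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical retrieval candidate handed from one stage to the next."""

    chunk_id: str
    score: float


async def graph_search_stage(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Stage 2c as a no-op: no graph lookup, no re-scoring, no I/O.

    The stage keeps its slot and signature so the surrounding pipeline
    does not have to change, but it returns its input untouched.
    """
    return candidates


async def run_stages(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Run the (hypothetical) stage list in order."""
    for stage in (graph_search_stage,):
        candidates = await stage(query, candidates)
    return candidates


if __name__ == "__main__":
    demo = [Candidate("chunk-1", 0.9), Candidate("chunk-2", 0.4)]
    print(asyncio.run(run_stages("cardiology opening hours", demo)))
```

Keeping the stage in place as a pass-through, rather than deleting the slot, is one way to leave callers and stage numbering stable while the residual flags are cleaned up separately.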

The primary removal landed in commit d82b1592 (March 7, ~16 000 LOC deleted). The architectural cleanup — removing the residual feature flags, the use_graph_rag setting, the deprecated tests, the orphan migrations — completed in commit 158d793 (May 2).

The architectural distinction

The relational hospital domain is graph-shaped, but graph-shaped data does not require a graph database. PostgreSQL's recursive CTEs handle taxonomy traversal with well-understood query plans. The advantage of a dedicated graph database such as Neo4j comes from variable-depth traversal at scale (e.g., social-network friend-of-friend at depth 6+); the hospital taxonomy is naturally bounded at depth 2-3 (Doctor → Department → Condition), a depth PostgreSQL handles with a handful of JOINs or a depth-bounded recursive CTE.
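To make the bounded-depth argument concrete, here is a minimal sketch of a depth-limited recursive CTE. It runs against an in-memory SQLite database purely so that it is self-contained; the WITH RECURSIVE query itself uses syntax PostgreSQL also accepts, apart from the parameter placeholders. The table names come from this ADR, but the column names (id, entity_type, name, source_id, target_id, relation_type), the sample rows, and the helper names are assumptions made for illustration.

```python
import sqlite3

TRAVERSAL_SQL = """
WITH RECURSIVE reachable(entity_id, depth) AS (
    -- Anchor row: the starting entity at depth 0.
    SELECT id, 0 FROM taxonomy_entity WHERE name = :start
    UNION ALL
    -- Recursive step: follow outgoing edges until max_depth is reached.
    SELECT r.target_id, reachable.depth + 1
    FROM reachable
    JOIN taxonomy_relationship AS r ON r.source_id = reachable.entity_id
    WHERE reachable.depth < :max_depth
)
SELECT e.entity_type, e.name, reachable.depth
FROM reachable
JOIN taxonomy_entity AS e ON e.id = reachable.entity_id
ORDER BY reachable.depth, e.name
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, assumed taxonomy: Doctor -> Department -> Condition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxonomy_entity (
            id INTEGER PRIMARY KEY,
            entity_type TEXT NOT NULL,
            name TEXT NOT NULL
        );
        CREATE TABLE taxonomy_relationship (
            source_id INTEGER NOT NULL REFERENCES taxonomy_entity(id),
            target_id INTEGER NOT NULL REFERENCES taxonomy_entity(id),
            relation_type TEXT NOT NULL
        );
        INSERT INTO taxonomy_entity (id, entity_type, name) VALUES
            (1, 'Doctor', 'Dr. Example'),
            (2, 'Department', 'Cardiology'),
            (3, 'Condition', 'Arrhythmia'),
            (4, 'Condition', 'Heart failure');
        INSERT INTO taxonomy_relationship (source_id, target_id, relation_type) VALUES
            (1, 2, 'WORKS_IN'),
            (2, 3, 'HANDLES'),
            (2, 4, 'HANDLES');
        """
    )
    return conn


def traverse(conn: sqlite3.Connection, start: str, max_depth: int = 3) -> list[tuple]:
    """Return (entity_type, name, depth) for everything reachable from start."""
    params = {"start": start, "max_depth": max_depth}
    return conn.execute(TRAVERSAL_SQL, params).fetchall()


if __name__ == "__main__":
    for row in traverse(build_demo_db(), "Dr. Example"):
        print(row)
```

The explicit depth guard in the recursive step is what keeps the query plan predictable: with a taxonomy bounded at depth 2-3, the recursion terminates after at most a few iterations regardless of cycles in the data.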

Consequences

Positive

Operational simplicity. One database instead of two. One backup. One replica topology. One failure mode.
Transactional consistency. Entity creation and taxonomy updates now happen in the same transaction. The PG ↔ Neo4j divergence class of bugs, which could only be detected via differential audit, is gone.
Lower per-turn latency. Removing Stage 2c eliminates its full p50 cost of 80-120 ms per turn.
Tractable mental model for new contributors. PostgreSQL is universally understood; Neo4j skills are scarce and the cost of onboarding to Cypher was real.
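The transactional-consistency benefit can be illustrated with a short sketch: when the entity and its relationship live in the same database, a failed relationship write rolls the entity write back with it. SQLite from the Python standard library stands in for PostgreSQL here so the example runs on its own; the column names and the helpers open_db, add_entity, and add_entity_with_link are assumptions, not the project's actual schema or API.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Open an in-memory database with an assumed two-table taxonomy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.executescript(
        """
        CREATE TABLE taxonomy_entity (
            id INTEGER PRIMARY KEY,
            entity_type TEXT NOT NULL,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE taxonomy_relationship (
            source_id INTEGER NOT NULL REFERENCES taxonomy_entity(id),
            target_id INTEGER NOT NULL REFERENCES taxonomy_entity(id),
            relation_type TEXT NOT NULL
        );
        """
    )
    return conn


def add_entity(conn: sqlite3.Connection, entity_type: str, name: str) -> int:
    """Insert a single entity in its own transaction and return its id."""
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute(
            "INSERT INTO taxonomy_entity (entity_type, name) VALUES (?, ?)",
            (entity_type, name),
        )
    return cur.lastrowid


def add_entity_with_link(
    conn: sqlite3.Connection,
    entity_type: str,
    name: str,
    target_id: int,
    relation_type: str,
) -> int:
    """Insert an entity and its outgoing edge atomically.

    If the edge insert fails (for example, the target does not exist),
    the entity insert is rolled back as well, so no orphan is left behind.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO taxonomy_entity (entity_type, name) VALUES (?, ?)",
            (entity_type, name),
        )
        conn.execute(
            "INSERT INTO taxonomy_relationship (source_id, target_id, relation_type) "
            "VALUES (?, ?, ?)",
            (cur.lastrowid, target_id, relation_type),
        )
    return cur.lastrowid
```

With two stores, the equivalent of the second helper would have needed a compensating write or a later reconciliation pass; with one store, the database's own rollback covers it.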

Negative / trade-offs

Variable-depth traversal capability removed. If the project ever needs depth-6+ entity traversal (currently no use case), a graph layer would have to be reintroduced. The May audit confirmed that no observed query exceeds depth 3.
Three ADRs required amendment. ADR-0017 (Stage 2c), ADR-0028 (golden-page Neo4j seeding), and ADR-0030 (entity-extraction routing) all needed amendment blocks, which were added as part of the Phase 2.A cascade.

Rejected alternatives

Keep Neo4j but only for entity-extraction routing. Considered. Rejected because the routing logic consists of entity_name → entity_id lookups, exactly the case PostgreSQL handles with a single indexed equality predicate.
Migrate to a graph-on-relational extension (Apache AGE). Considered. Rejected because the value proposition is identical to running PG with taxonomy JOINs; the extension adds a Cypher dialect that nobody on the team writes natively.
Defer the decision and let Neo4j atrophy. Rejected because every month it sat there was a month of operational tax (backups, monitoring, the Graphiti-residue questions) and increasing cross-store drift risk.
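The entity_name → entity_id lookup cited in the first rejected alternative can be sketched in a few lines. SQLite stands in for PostgreSQL so the example is self-contained; the column names, the index name idx_taxonomy_entity_name, and the helper functions are assumptions for illustration. The plan output shown by lookup_plan is SQLite's; on PostgreSQL the equivalent check would use EXPLAIN.

```python
import sqlite3
from typing import Optional


def build_lookup_db() -> sqlite3.Connection:
    """Create an assumed taxonomy_entity table with an index on name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxonomy_entity (
            id INTEGER PRIMARY KEY,
            entity_type TEXT NOT NULL,
            name TEXT NOT NULL
        );
        CREATE INDEX idx_taxonomy_entity_name ON taxonomy_entity (name);
        INSERT INTO taxonomy_entity (id, entity_type, name) VALUES
            (1, 'Department', 'Cardiology'),
            (2, 'Condition', 'Arrhythmia');
        """
    )
    return conn


def resolve_entity_id(conn: sqlite3.Connection, name: str) -> Optional[int]:
    """Resolve an entity name to its id with one equality predicate."""
    row = conn.execute(
        "SELECT id FROM taxonomy_entity WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's textual query plan for the lookup above."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM taxonomy_entity WHERE name = ?", ("x",)
    ).fetchall()
    # Each plan row is (id, parent, notused, detail); keep the detail text.
    return " | ".join(row[3] for row in rows)


if __name__ == "__main__":
    db = build_lookup_db()
    print(resolve_entity_id(db, "Cardiology"))
    print(lookup_plan(db))
```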

References

Commit d82b1592 — primary removal (March 7, ~16 000 LOC deleted)
Commit 158d793 — architectural cleanup (May 2)
ADR-0029 (Remove Graphiti) — the precursor that removed the library but kept the database
ADR-0017 (Context Retrieval Architecture) — amended to mark Stage 2c as deprecated
Master ADR (full text): docs/ADR/0053-neo4j-removal-pgvector-consolidation.md
