
Graph-Enhanced RAG

Graph-Enhanced RAG extends the canonical Retrieval-Augmented Generation pattern (Lewis et al., 2020) by integrating knowledge graph queries into the retrieval phase. This integration, formalised as HybridRAG by Sarmah et al. (2024), enables the system to answer questions that require both structured entity knowledge and unstructured textual context, a combination that neither vector search nor graph queries can achieve on their own. Peng et al. (2025) provide a comprehensive survey of Graph-Based Indexing, Graph-Guided Retrieval, and Graph-Enhanced Generation approaches.

The Integration Architecture

Taxonomy Entity Queries

For entity-specific queries (doctor lookups, department information), the system generates SQL queries against the PostgreSQL taxonomy tables. These queries are deterministic and fast (roughly 10-50 ms). The lookup steps depend on the entity type.

Doctor Queries

When a doctor name is detected in the query, the system:

  1. Fuzzy-matches the name against Doctor nodes (SequenceMatcher similarity)
  2. Traverses WORKS_IN edges to find departments (with optional schedule properties)
  3. Derives campus presence from department LOCATED_AT relationships
  4. Reads specialty from the Doctor node's specialty property

The result is a structured profile: "Dr. Van den Berg is orthopedisch chirurg in de afdeling Orthopedie. Hij consulteert op campus Sint-Jan (maandag, woensdag) en campus André Dumont (dinsdag)." Consultation schedule data (days, status, contacts) is stored as properties on the WORKS_IN relationship.
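The fuzzy-matching step can be sketched with Python's standard `difflib.SequenceMatcher`. This is a minimal illustration, not the system's actual code: the candidate list, the `match_doctor` helper, and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical doctor names; the real list would come from the Doctor nodes.
DOCTORS = ["Van den Berg", "Peeters", "Janssens"]

def match_doctor(name, candidates=DOCTORS, threshold=0.8):
    """Return (best_candidate, score), or (None, score) below the threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        # ratio() is a similarity in [0, 1]; compare case-insensitively.
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:  # assumed cut-off, tune on real queries
        return None, best_score
    return best, best_score
```

A misspelling such as "Peters" still resolves to "Peeters", while an unrelated string falls below the threshold and returns `None`, so the caller can skip the doctor lookup.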

Department Queries

When a department is identified:

  1. Match the Department node
  2. Traverse LOCATED_AT edges for campus information
  3. Traverse inverse WORKS_IN edges for associated doctors
  4. Traverse HANDLES edges for conditions handled
  5. Traverse OFFERS edges for available treatments
  6. Traverse PERFORMS edges for examinations performed
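
The department traversal above can be sketched over an in-memory edge list. The relationship names come from the text; the triple representation, the `department_profile` helper, and the sample condition, treatment, and examination names are made up for illustration (the described system runs SQL against taxonomy tables instead).

```python
from collections import defaultdict

# Hypothetical edges as (source, relationship, target) triples.
EDGES = [
    ("Dr. Van den Berg", "WORKS_IN", "Orthopedie"),
    ("Orthopedie", "LOCATED_AT", "Sint-Jan"),
    ("Orthopedie", "HANDLES", "Heupartrose"),
    ("Orthopedie", "OFFERS", "Heupprothese"),
    ("Orthopedie", "PERFORMS", "RX heup"),
]

OUTGOING = {"LOCATED_AT", "HANDLES", "OFFERS", "PERFORMS"}

def department_profile(department, edges=EDGES):
    """Collect outgoing edges by relationship and inverse WORKS_IN as doctors."""
    profile = defaultdict(list)
    for source, rel, target in edges:
        if source == department and rel in OUTGOING:
            profile[rel].append(target)
        elif target == department and rel == "WORKS_IN":
            profile["doctors"].append(source)  # inverse traversal
    return dict(profile)
```

An unknown department yields an empty profile, which a caller could treat as "no graph result".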

Condition Queries

When a condition is identified (e.g., "Welke onderzoeken voor hartfalen?"):

  1. Match the Condition node via taxonomy alias resolution
  2. Traverse inverse HANDLES edges to find departments that handle this condition
  3. Traverse DIAGNOSES edges to find examinations used to diagnose this condition
  4. Traverse inverse TREATS edges to find treatments for this condition
  5. Expand into associated doctors and campus locations

The DIAGNOSES and TREATS relationships are powered by the Medical Knowledge Architecture: universal medical knowledge generated by LLM classification and merged with hospital-specific hub page data.
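
A condition lookup with alias resolution might look like the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table names (`condition_alias`, `edge`), their columns, and the sample rows are assumptions. It also assumes DIAGNOSES edges point from the examination to the condition; the text does not state the stored direction.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with a hypothetical taxonomy schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE condition_alias (alias TEXT PRIMARY KEY, condition TEXT);
        CREATE TABLE edge (source TEXT, rel TEXT, target TEXT);
        INSERT INTO condition_alias VALUES
            ('hartfalen', 'Hartfalen'),
            ('hartproblemen', 'Hartfalen');
        INSERT INTO edge VALUES
            ('Cardiologie', 'HANDLES', 'Hartfalen'),
            ('Echocardiografie', 'DIAGNOSES', 'Hartfalen');
    """)
    return db

# Which result bucket each incoming relationship fills.
KEY_FOR = {"HANDLES": "departments", "DIAGNOSES": "examinations", "TREATS": "treatments"}

def condition_lookup(db, text):
    """Resolve an alias to a canonical condition, then read its incoming edges."""
    row = db.execute(
        "SELECT condition FROM condition_alias WHERE alias = ?", (text.lower(),)
    ).fetchone()
    if row is None:
        return None
    result = {"condition": row[0], "departments": [], "examinations": [], "treatments": []}
    query = "SELECT source, rel FROM edge WHERE target = ? ORDER BY source"
    for source, rel in db.execute(query, (row[0],)):
        if rel in KEY_FOR:
            result[KEY_FOR[rel]].append(source)
    return result
```

Parameterised queries keep the lookup deterministic and avoid interpolating user text into SQL.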

Treatment Queries

When a treatment is identified:

  1. Match the Treatment node via taxonomy alias resolution
  2. Traverse inverse OFFERS edges to find departments that offer this treatment
  3. Traverse TREATS edges to find conditions this treatment addresses
  4. Traverse DIAGNOSES edges to find diagnostic examinations for those conditions
  5. Expand into doctors and campus locations
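
The treatment lookup chains two hops: treatment to conditions, then conditions to examinations. A rough sketch, assuming edges are held as triples and that DIAGNOSES points from examination to condition; the helper names and sample rows are illustrative only, not medical or hospital data.

```python
# Hypothetical edges as (source, relationship, target) triples.
EDGES = [
    ("Cardiologie", "OFFERS", "Pacemakerimplantatie"),
    ("Pacemakerimplantatie", "TREATS", "Hartfalen"),
    ("Echocardiografie", "DIAGNOSES", "Hartfalen"),
]

def targets(node, rel, edges=EDGES):
    """Follow `rel` edges forward from `node`."""
    return [t for s, r, t in edges if s == node and r == rel]

def sources(node, rel, edges=EDGES):
    """Follow `rel` edges backward into `node` (inverse traversal)."""
    return [s for s, r, t in edges if t == node and r == rel]

def treatment_profile(treatment, edges=EDGES):
    conditions = targets(treatment, "TREATS", edges)
    examinations = []
    for condition in conditions:  # second hop, deduplicated in order
        for exam in sources(condition, "DIAGNOSES", edges):
            if exam not in examinations:
                examinations.append(exam)
    return {
        "departments": sources(treatment, "OFFERS", edges),
        "conditions": conditions,
        "examinations": examinations,
    }
```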

Campus Queries

Campus-specific queries traverse from the Campus node outward:

  1. Match the Campus node (one of four: Sint-Jan, André Dumont, Sint-Barbara, Maas en Kempen)
  2. Traverse inverse LOCATED_AT edges for departments at this campus
  3. Traverse inverse WORKS_AT_CAMPUS edges for doctors present at this campus

LLM Entity Extraction for Graph Routing

When a user query uses colloquial or indirect language rather than naming entities explicitly, the LLM intent classifier extracts structured medical entities alongside the intent classification. This eliminates the need for a separate semantic search tier.

For example, a patient may describe "hartproblemen" without mentioning "cardiologie" or any specific doctor. The LLM extracts "hartproblemen" as a condition, the taxonomy resolves it to "Hartfalen", and a HANDLES relationship query identifies that Cardiologie handles this condition. The system then uses SQL queries against the taxonomy tables to expand into doctors, treatments, and campus locations. See ADR-0030 for the rationale.
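
The routing step can be sketched as follows, assuming the classifier returns JSON with an `intent` field and an `entities` list. That output shape, the alias table, and the `route_from_classifier` helper are hypothetical; only the "hartproblemen" to "Hartfalen" to Cardiologie chain comes from the text.

```python
import json

# Hypothetical alias table; in the described system this lives in the taxonomy.
CONDITION_ALIASES = {"hartproblemen": "Hartfalen", "hartfalen": "Hartfalen"}
# Hypothetical HANDLES lookup: canonical condition -> departments.
HANDLES = {"Hartfalen": ["Cardiologie"]}

def route_from_classifier(raw_json):
    """Turn the classifier's (assumed) JSON output into graph lookup targets."""
    parsed = json.loads(raw_json)
    routes = []
    for entity in parsed.get("entities", []):
        if entity.get("type") != "condition":
            continue
        canonical = CONDITION_ALIASES.get(entity.get("text", "").lower())
        if canonical is None:
            continue  # unresolved entities are simply skipped in this sketch
        routes.append({"condition": canonical, "departments": HANDLES.get(canonical, [])})
    return {"intent": parsed.get("intent"), "routes": routes}
```

Because the entities arrive with the intent in one classifier call, no second semantic lookup is needed to find the department.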

Result Merging Strategy

When both graph and vector results are available (HYBRID mode), the merger follows a priority-based approach:

Merge Rules

  1. Graph results are placed first in the merged result set to prioritize structured entity data (no numeric score boost is applied)
  2. Deduplication by source: If both sources reference the same page, keep the higher-scored entry and merge metadata
  3. Graph entities are formatted as context: Doctor profiles, department descriptions, and consultation schedules are rendered as natural language for the LLM's context window
  4. Vector chunks provide supporting detail: Textual content from brochures and web pages supplements the structured entity data
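
Rules 1 and 2 can be sketched as a single pass over both result lists. The field names (`source`, `score`, `metadata`) are assumptions, and so is the choice that a higher-scored duplicate keeps the position of the entry it replaces.

```python
def merge_results(graph_results, vector_results):
    """Graph entries first, then vector chunks; one entry per source page."""
    merged = {}  # dicts preserve insertion order, so graph entries stay first
    for item in list(graph_results) + list(vector_results):
        key = item["source"]
        existing = merged.get(key)
        if existing is None:
            merged[key] = dict(item)
            continue
        # Duplicate page: keep the higher-scored entry, merge metadata from both.
        if item["score"] > existing["score"]:
            winner, loser = item, existing
        else:
            winner, loser = existing, item
        combined = dict(winner)
        combined["metadata"] = {**loser.get("metadata", {}), **winner.get("metadata", {})}
        merged[key] = combined  # reassigning a key keeps its original position
    return list(merged.values())
```

No score boost is applied anywhere; ordering alone expresses the graph-first priority.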

Campus-Specific Data

The four ZOL campuses have distinct service profiles, making campus-aware retrieval essential:

| Campus | Key Services | Notable Departments |
|---|---|---|
| Sint-Jan (Genk) | Main campus, most specialties | Emergency, Surgery, Cardiology |
| André Dumont (Genk) | Specialized care | Oncology, Rehabilitation |
| Sint-Barbara (Lanaken) | Regional care | General Medicine, Geriatrics |
| Maas en Kempen (Maaseik) | Regional care | Outpatient Clinics |

Consultation schedules are stored as properties on the Doctor -> WORKS_IN -> Department relationship (schedule, status, contacts), enabling queries like "When does Dr. Peeters consult at Sint-Jan?" to be answered with precise schedule information derived from the department's campus location.
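
A schedule lookup of this kind might be sketched as below. The property names (`schedule`, `status`, `contacts`) follow the text, but their value shapes, the sample rows, and the `consultation_days` helper are assumptions.

```python
# Hypothetical WORKS_IN relationship records with their properties.
WORKS_IN = [
    {"doctor": "Dr. Peeters", "department": "Cardiologie",
     "schedule": ["maandag", "woensdag"], "status": "active", "contacts": []},
]
# Hypothetical LOCATED_AT lookup: department -> campuses.
LOCATED_AT = {"Cardiologie": ["Sint-Jan"]}

def consultation_days(doctor, campus):
    """Days a doctor consults at a campus, derived via the department's location."""
    days = []
    for rel in WORKS_IN:
        if rel["doctor"] != doctor:
            continue
        if campus in LOCATED_AT.get(rel["department"], []):
            days.extend(rel["schedule"])
    return days
```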

Context Augmentation

The final step before response generation is context augmentation -- formatting the merged results into a structured prompt for the LLM:

  1. Entity section: Structured data from graph queries (doctor profiles, department info)
  2. Content section: Relevant text chunks from vector search
  3. Source section: URLs for citation in the response
  4. Instructions: Grounding rules, safety constraints, language requirements (Dutch)

This structured context lets the Tier 2 (standard) or Tier 3 (full mode) model combine structured entity information from the graph with explanatory content from vector search, so that responses are both precise and informative.
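
The four-part context can be assembled with plain string formatting. In this sketch the section headings, the chunk fields (`text`, `url`), and the `build_context` helper are assumptions; the actual prompt template is not shown in this document.

```python
def build_context(entities, chunks, instructions):
    """Assemble entity, content, source, and instruction sections in order."""
    sources = []
    for chunk in chunks:
        if chunk["url"] not in sources:  # cite each URL once
            sources.append(chunk["url"])
    sections = [
        "## Entities\n" + "\n".join(f"- {e}" for e in entities),
        "## Content\n" + "\n\n".join(c["text"] for c in chunks),
        "## Sources\n" + "\n".join(f"[{i}] {url}" for i, url in enumerate(sources, 1)),
        "## Instructions\n" + "\n".join(f"- {rule}" for rule in instructions),
    ]
    return "\n\n".join(sections)
```

Keeping the entity section ahead of the content section mirrors the graph-first ordering used when merging results.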

References