
Data Protection Impact Assessment (DPIA)

Regulatory basis

This DPIA is conducted under GDPR Art. 35 (Regulation (EU) 2016/679), which requires an assessment where processing is "likely to result in a high risk to the rights and freedoms of natural persons." The ZOL Intelligent Search system processes publicly available hospital information rather than patient records, but two factors motivate a proactive DPIA in line with the WP29 Guidelines on DPIA (WP 248 rev.01): (a) the healthcare context, in which user queries may contain data concerning health, a special category under GDPR Art. 9, even though the indexed corpus describes the hospital rather than patients, and (b) the system's use of automated decision-making (intent classification) that influences how a user's query is handled. A proactive DPIA is the prudent posture and is encouraged by the Belgian Data Protection Authority's (GBA / APD) sectoral guidance.

1. System description

1.1 Purpose

The ZOL Intelligent Search system replaces keyword-based search on the hospital website with a natural-language interface. Users ask questions in Dutch (or other supported languages — see §1.5) and receive grounded, cited responses about hospital departments, doctors, conditions, treatments, and practical information.

1.2 Data controller

Ziekenhuis Oost-Limburg (ZOL), Schiepse Bos 6, 3600 Genk, Belgium. Data Protection Officer details are published on the hospital's website.

1.3 Processing scope

| Data category | Source | Contains personal data? | Storage |
|---|---|---|---|
| User queries | Website visitors / phone callers | Possible (users may include names, phone numbers, symptom descriptions) | PostgreSQL (app.conversations, app.conversation_messages); voice-side redaction strips inline PII before structured-log emission |
| Hospital content | ZOL public website + published brochures | Yes (doctor names, department phones, emails; all publicly available) | pgvector (app.document_chunks), entity taxonomy (app.taxonomy_entities, app.taxonomy_relationships), object storage (MinIO) |
| Response data | System-generated | No patient PII | Conversation history |
| Session metadata | Browser session / SIP call | Yes (IP address, timestamps, language code, caller ID on voice calls) | PostgreSQL, Redis (ephemeral cache) |
| Analytics | Aggregated query patterns | No (aggregated, anonymised) | PostgreSQL |
| Audit logs | Security-relevant events (auth, safety blocks, PII detections, GDPR deletions) | Yes (user_id, IP) | PostgreSQL (audit.logs, audit.data_access_logs) |

1.4 Data flow

1.5 Languages and demographics

The system supports Dutch (primary), English, French, German, and additional patient-language fallbacks. ZOL serves a diverse population in Belgian Limburg, including significant Turkish, Romanian, Italian, Greek, and Polish communities; multilingual capability is itself a fairness-driven design decision (see the multilingual coverage section of Adversarial Hardening for the language list).

2. Necessity and proportionality (GDPR Art. 5)

2.1 Lawful basis

| Basis | Article | Justification |
|---|---|---|
| Public interest | Art. 6(1)(e) | Healthcare institutions have a public-service duty to make their services accessible. The system is a wayfinding tool that supports that duty. |
| Legitimate interest | Art. 6(1)(f) | Improving hospital information accessibility for patients and visitors; balanced against the negligible processing of personal data, the publicly available nature of the content, and voluntary user interaction. The interest passes the three-part test (purpose, necessity, balancing) per WP29 Opinion 06/2014. |

The system does not rely on consent (Art. 6(1)(a)) because the processing is incidental to the user's voluntary act of using the search function; explicit consent would add friction without protection benefit and would not be the most appropriate basis under EDPB Guidelines 05/2020 on consent.

The system does not deliberately process special-category data (Art. 9): the corpus contains general-public hospital information, not patient health records. Inadvertent special-category content in user queries (a caller mentioning their symptoms) is processed under Art. 9(2)(h) (provision of healthcare) and Art. 9(2)(i) (public health) jointly, because the user is interacting with a hospital channel for a healthcare-adjacent purpose.

2.2 Necessity

The processing is necessary because:

  • ~25 000 monthly search queries demonstrate sustained user demand for better information access;
  • Keyword search demonstrably fails on natural-language questions (such as "Ik heb pijn in mijn borst, naar welke afdeling moet ik?");
  • The alternative (no AI assistance, keyword-only search) results in increased helpdesk load and frustrated patients — the operational rationale for the project;
  • No less intrusive means achieves the same accessibility improvement.

2.3 Proportionality

| Principle | Article | Implementation |
|---|---|---|
| Data minimisation | Art. 5(1)(c) | Only publicly available hospital content is indexed. No patient records, medical histories, or insurance data enter the system. Voice transcripts are PII-redacted before reaching structured logs (backend/app/services/voice/voice_pii_redaction.py). |
| Purpose limitation | Art. 5(1)(b) | Queries are used exclusively for response generation. No secondary use for marketing, profiling, or research. |
| Storage limitation | Art. 5(1)(e) | Per-data-class retention documented in the Data Retention Policy. Audit logs auto-expire; Redis cache is ephemeral. |
| Accuracy | Art. 5(1)(d) | Source citations enable users to verify every claim against the original hospital content. RAG grounding constrains generation to retrieved evidence. |

3. Risk assessment

3.1 Identified risks

| Risk | Likelihood | Severity | Inherent risk | Mitigation | Residual risk |
|---|---|---|---|---|---|
| R1: System provides medical advice | Low | Critical | HIGH | Five-layer safety architecture (see Safety Architecture); zero-incident KPI; automated monitoring | LOW |
| R2: User PII in queries forwarded to LLM provider | Medium | Medium | MEDIUM | PII detection + audit logging; PII-containing queries excluded from semantic cache; OpenAI DPA in force | LOW |
| R3: LLM hallucination produces incorrect hospital info | Medium | Medium | MEDIUM | Source grounding (RAG); citation verification; quality gate at calibrated similarity threshold (per ADR-0048) | LOW |
| R4: Query-data breach (conversation history) | Low | Medium | LOW | TLS 1.2+ in transit (RFC 5246 / RFC 8446); PostgreSQL volume encryption at rest; Keycloak-mediated access control | LOW |
| R5: Re-identification from aggregated analytics | Very low | Low | LOW | Analytics pre-aggregated; no individual query tracking | VERY LOW |
| R6: Discrimination via language-based quality gaps | Low | Medium | MEDIUM | Multilingual support with translated safety messages; ongoing per-language evaluation | LOW |
| R7: Adversarial-input safety bypass (GCG suffix attacks) | Low | Critical | MEDIUM | Perplexity-based anomaly detector (ADR-0036); LLM-as-judge defence-in-depth; see Adversarial Hardening | LOW |
| R8: Voice-side PII surfaces in logs | Medium | Medium | MEDIUM | voice_pii_redaction.py strips Belgian phone numbers, names, and DOB patterns before log emission; SHA-256 hashing for audit-trail correlation without plaintext retention | LOW |
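
The R8 mitigation (redaction before log emission, plus SHA-256 hashing for correlation) can be illustrated with a minimal sketch. It is not the content of voice_pii_redaction.py: the regular expressions, the function names, and the `salt` parameter are assumptions made for illustration. The salt is added because an unsalted digest of a phone number can be reversed by enumerating the number space. Name redaction is omitted because it typically needs more than a regular expression.

```python
import hashlib
import re

# Hypothetical patterns; the real voice_pii_redaction.py may differ.
# Belgian phone: +32, 0032 or a leading 0, followed by 8 or 9 digits
# that may be separated by spaces, dots, slashes or hyphens.
BE_PHONE = re.compile(r"(?<!\d)(?:\+32|0032|0)[\s./-]?\d(?:[\s./-]?\d){7,8}(?!\d)")
# Date of birth written as DD/MM/YYYY, DD-MM-YYYY or DD.MM.YYYY.
DOB = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{4}\b")


def correlation_token(phone: str, salt: str) -> str:
    """Return a truncated salted SHA-256 digest of a phone number."""
    digits = re.sub(r"\D", "", phone)
    # Normalise international prefixes so +32 471... and 0471... correlate.
    if digits.startswith("0032"):
        digits = "0" + digits[4:]
    elif digits.startswith("32"):
        digits = "0" + digits[2:]
    return hashlib.sha256((salt + digits).encode("utf-8")).hexdigest()[:16]


def redact(text: str, salt: str) -> str:
    """Replace phone numbers with a hashed token and dates with a placeholder."""
    text = BE_PHONE.sub(
        lambda m: f"[PHONE:{correlation_token(m.group(), salt)}]", text
    )
    return DOB.sub("[DOB]", text)
```

Two log lines that mention the same caller number then carry the same token, so an auditor can correlate them without the plaintext number being retained.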

3.2 Risk details

R1: Medical advice (critical)

The most significant risk: a response that could be interpreted as medical advice could cause patient harm. The system implements five independent safety layers (see Safety Architecture):

  1. Intent classification blocks medical-advice queries before retrieval
  2. Post-generation regex scans for Dutch medical-advice patterns
  3. LLM-as-judge validates response safety
  4. Quality gate blocks low-confidence responses
  5. Mandatory disclaimer appended to every response

Evaluation evidence: 100 % safety-refusal accuracy across the golden evaluation question set (see Quality Evaluation for the current methodology and results), including dedicated safety-refusal tests and adversarial prompt-injection attempts (see Adversarial Hardening).
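
The ordering of the five layers can be sketched as a single function. This is an illustrative outline only: the callables `classify_intent`, `generate`, and `judge_is_safe`, the regex patterns, the message texts, and the `min_confidence` default are all hypothetical stand-ins, not the system's actual components or thresholds.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical texts and patterns, for illustration only.
DISCLAIMER = "Dit is geen medisch advies."
REFUSAL = "Voor medische vragen kunt u terecht bij uw arts of de helpdesk."
ADVICE_PATTERNS = re.compile(
    r"\b(?:u moet|neem \d+ ?mg|ik raad u aan|stop met uw medicatie)\b",
    re.IGNORECASE,
)


@dataclass
class Verdict:
    allowed: bool
    text: str
    blocked_by: Optional[str] = None


def _finish(allowed: bool, text: str, blocked_by: Optional[str] = None) -> Verdict:
    # Layer 5: the disclaimer is appended to every response, including refusals.
    return Verdict(allowed, f"{text}\n\n{DISCLAIMER}", blocked_by)


def answer(
    query: str,
    classify_intent: Callable[[str], str],
    generate: Callable[[str], Tuple[str, float]],
    judge_is_safe: Callable[[str], bool],
    min_confidence: float = 0.5,
) -> Verdict:
    # Layer 1: block medical-advice intent before any retrieval happens.
    if classify_intent(query) == "medical_advice":
        return _finish(False, REFUSAL, "intent_classification")
    text, confidence = generate(query)
    # Layer 2: regex scan of the generated text.
    if ADVICE_PATTERNS.search(text):
        return _finish(False, REFUSAL, "regex_scan")
    # Layer 3: LLM-as-judge verdict on the generated text.
    if not judge_is_safe(text):
        return _finish(False, REFUSAL, "llm_judge")
    # Layer 4: quality gate on retrieval or generation confidence.
    if confidence < min_confidence:
        return _finish(False, REFUSAL, "quality_gate")
    return _finish(True, text)
```

Recording which layer blocked a response (`blocked_by`) is a design choice of this sketch; it makes each layer's contribution measurable in audit logs.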

R2: PII in query forwarding

Users may include personal information in queries (phone numbers, names). This data flows to the LLM provider (OpenAI) for response generation and embedding.

Mitigations:

  • PII detection layer flags and logs PII occurrences (regex patterns covering Belgian formats — see PII Protection)
  • PII-containing queries excluded from semantic cache (so the same PII string is not retained)
  • OpenAI's Data Processing Addendum covers GDPR Art. 28 (processor) obligations
  • No PII is stored in the knowledge graph or vector store
  • Voice channel: voice_pii_redaction.py strips PII patterns before structured-log emission, satisfying GDPR Art. 5(1)(c) data-minimisation by-design at the log layer
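
As an illustration of the first two mitigations, the sketch below flags PII in a query and refuses to cache a flagged query. The patterns, the `QueryCache` class, and its exact-match lookup are assumptions made for brevity; a real semantic cache would match on embeddings, and the production patterns are described in PII Protection.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns covering two common formats.
PII_PATTERNS = {
    "be_phone": re.compile(
        r"(?<!\d)(?:\+32|0032|0)[\s./-]?\d(?:[\s./-]?\d){7,8}(?!\d)"
    ),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def detect_pii(query: str) -> List[str]:
    """Return the names of the PII categories found in the query."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(query)]


class QueryCache:
    """Exact-match stand-in for a semantic cache that never stores PII queries."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.audit: List[List[str]] = []

    @staticmethod
    def _key(query: str) -> str:
        return query.strip().lower()

    def get(self, query: str) -> Optional[str]:
        return self._store.get(self._key(query))

    def put(self, query: str, response: str) -> bool:
        categories = detect_pii(query)
        if categories:
            # Log only the category names, never the raw value.
            self.audit.append(categories)
            return False
        self._store[self._key(query)] = response
        return True
```

Logging the category name rather than the matched string keeps the audit trail itself free of the PII it reports.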

R6: Language discrimination

The system targets a diverse-population catchment area; non-Dutch queries must achieve quality parity with Dutch queries to avoid indirect discrimination under GDPR recital 71 (automated decision-making fairness).

Mitigations:

  • Multilingual support with translated safety messages
  • Query rewriting normalises non-Dutch input
  • Ongoing evaluation tracks per-language quality metrics
  • Helpdesk fallback (089 32 50 50) remains available regardless of system performance
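
Per-language quality tracking can be made concrete with a small parity check. The sketch assumes evaluation results arrive as (language code, passed) pairs and that Dutch ("nl") is the reference; the 0.05 tolerance is an illustrative value, not a threshold taken from the evaluation methodology.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_language_accuracy(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the pass rate per language from (language, passed) pairs."""
    totals = defaultdict(lambda: [0, 0])  # language -> [passed, total]
    for language, passed in results:
        totals[language][0] += int(passed)
        totals[language][1] += 1
    return {lang: passed / total for lang, (passed, total) in totals.items()}


def parity_gaps(
    accuracy: Dict[str, float], reference: str = "nl", tolerance: float = 0.05
) -> Dict[str, float]:
    """Return languages whose accuracy trails the reference by more than tolerance."""
    ref = accuracy[reference]
    return {
        lang: round(ref - acc, 4)
        for lang, acc in accuracy.items()
        if lang != reference and ref - acc > tolerance
    }
```

A non-empty result from `parity_gaps` would be the trigger for investigating a language before the gap becomes a fairness issue.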

R7: Adversarial-input safety bypass

Documented threat model: Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al., 2023) demonstrated that GCG-style adversarial suffixes bypass standard LLM safety alignment. For a healthcare safety architecture, an attack that bypasses intent classification can trigger the medical-advice failure class.

Mitigations: See Adversarial Hardening. Perplexity-based anomaly detection runs in < 5 ms before any LLM call and blocks adversarial suffixes with high recall.

See Zou et al. 2023 GCG attacks.
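
The idea behind a perplexity-based detector can be sketched as follows. The sketch assumes per-token natural-log probabilities are already available from some language model; the sliding window, its size, and the threshold are illustrative assumptions and do not describe the detector specified in ADR-0036.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean log-probability per token."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def max_window_perplexity(token_logprobs: Sequence[float], window: int = 10) -> float:
    """Highest perplexity over any contiguous window of tokens."""
    n = len(token_logprobs)
    if n <= window:
        return perplexity(token_logprobs)
    return max(
        perplexity(token_logprobs[i : i + window]) for i in range(n - window + 1)
    )


def is_anomalous(
    token_logprobs: Sequence[float], threshold: float = 1000.0, window: int = 10
) -> bool:
    """Flag input whose most surprising window exceeds the threshold."""
    return max_window_perplexity(token_logprobs, window) > threshold
```

The window matters because an adversarial suffix is short relative to the query: averaged over the whole input its high perplexity can be diluted by the fluent prefix, while a window that covers only the suffix exposes it.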

4. Data-subject rights (GDPR Chapter III)

| Right | Article | Implementation |
|---|---|---|
| Information | Art. 13–14 | Privacy notice on hospital website; system identifies itself as AI-powered search per AI Act Art. 50 |
| Access | Art. 15 | Conversation history available via authenticated session; admin can export per-user data via the GDPR endpoint |
| Rectification | Art. 16 | Hospital content errors corrected at source; system re-ingests on next crawl |
| Erasure | Art. 17 | DELETE /api/v1/gdpr/users/{user_id}/data (admin-authenticated) cascades through app.conversations, app.conversation_messages, app.feedback, app.analytics_events, audit.logs, audit.data_access_logs. Implementation: backend/app/api/gdpr.py. |
| Restriction | Art. 18 | Processing can be paused per-user via rate-limit override or admin block |
| Data portability | Art. 20 | Conversation export in JSON via API |
| Objection | Art. 21 | Users opt out by not using the search function; traditional phone/email channels remain available |
| Automated decision-making | Art. 22 | Intent classification is automated but does not produce legal effects or similarly significantly affect the data subject (the test under Art. 22(1)). Blocked queries receive a helpful redirect, not a denial of service. |
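
The erasure cascade can be illustrated with a self-contained sketch that deletes from the six listed tables inside one transaction. It is not the code in backend/app/api/gdpr.py: the column names (`user_id`, `conversation_id`), the deletion order, and the use of SQLite with attached in-memory databases to stand in for the PostgreSQL `app` and `audit` schemas are all assumptions.

```python
import sqlite3
from typing import Dict

# Tables from the erasure row; messages go first because they are assumed
# to reference conversations rather than carry a user_id themselves.
USER_KEYED_TABLES = [
    "app.feedback",
    "app.analytics_events",
    "app.conversations",
    "audit.data_access_logs",
    "audit.logs",
]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for the app and audit schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS app")
    conn.execute("ATTACH DATABASE ':memory:' AS audit")
    conn.execute("CREATE TABLE app.conversations (id INTEGER PRIMARY KEY, user_id TEXT)")
    conn.execute(
        "CREATE TABLE app.conversation_messages "
        "(id INTEGER PRIMARY KEY, conversation_id INTEGER, body TEXT)"
    )
    for table in ("app.feedback", "app.analytics_events",
                  "audit.logs", "audit.data_access_logs"):
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, user_id TEXT, detail TEXT)"
        )
    return conn


def erase_user_data(conn: sqlite3.Connection, user_id: str) -> Dict[str, int]:
    """Delete all rows for one user; returns the number of rows removed per table."""
    deleted: Dict[str, int] = {}
    with conn:  # commits on success, rolls back if any statement fails
        cur = conn.execute(
            "DELETE FROM app.conversation_messages WHERE conversation_id IN "
            "(SELECT id FROM app.conversations WHERE user_id = ?)",
            (user_id,),
        )
        deleted["app.conversation_messages"] = cur.rowcount
        for table in USER_KEYED_TABLES:  # names come from the constant above
            cur = conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
            deleted[table] = cur.rowcount
    return deleted
```

Returning per-table counts gives the administrator evidence of what was erased, which can be recorded without retaining the erased content.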

5. Technical and organisational measures (GDPR Art. 32)

5.1 Technical measures

| Measure | Article anchor | Implementation |
|---|---|---|
| Encryption at rest | Art. 32(1)(a) | PostgreSQL volume encryption |
| Encryption in transit | Art. 32(1)(a) | TLS 1.2+ for all API and WebSocket connections; SIPS for telephony signalling per ADR-0050 |
| Pseudonymisation of audit data | Art. 4(5), Art. 32(1)(a) | Voice transcripts are PII-redacted at log boundary; SHA-256 hashing for audit-trail correlation |
| Access control | Art. 32(1)(b) | Keycloak OIDC authentication with JWT validation (an OWASP Top 10 for LLM Applications LLM06 mitigation: identity is delegated to an audited IdP); role-based authorisation (user / admin) |
| Network isolation | Art. 32(1)(b) | Docker bridge networks; no direct database exposure |
| Audit logging | Art. 32(1)(d) | Structured audit trail for all security-relevant events; see Data Retention Policy for retention |
| Monitoring | Art. 32(1)(d) | Prometheus metrics; health-check endpoints; per-turn voice telemetry |
| Backup | Art. 32(1)(c) | PostgreSQL daily backups with point-in-time recovery |
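
The "TLS 1.2+" requirement can be expressed as a minimum protocol version on the server's TLS context. The sketch below uses Python's standard `ssl` module and assumes TLS is terminated in the Python process; if a reverse proxy terminates TLS instead, the equivalent setting belongs in the proxy configuration. The function name and parameters are hypothetical.

```python
import ssl
from typing import Optional


def make_server_context(
    certfile: Optional[str] = None, keyfile: Optional[str] = None
) -> ssl.SSLContext:
    """Create a server-side TLS context that rejects anything below TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        # Certificate and key paths are deployment-specific.
        context.load_cert_chain(certfile, keyfile)
    return context
```

Setting a floor rather than pinning a single version lets clients negotiate TLS 1.3 where both sides support it.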

5.2 Organisational measures

| Measure | Status |
|---|---|
| Data Processing Agreement with OpenAI | In force (OpenAI DPA) |
| Security incident response procedure | Documented in deployment runbook |
| Regular access review | Admin accounts reviewed quarterly |
| Privacy training for hospital staff | Recommended for helpdesk team |
| ISO/IEC 27001 alignment (target, not certification) | Posture being aligned to ISO/IEC 27001:2022 controls; certification not currently held |

See ISO/IEC 27001:2022.

6. Consultation

| Stakeholder | Consultation |
|---|---|
| Hospital DPO | Review of this DPIA required before production deployment per Art. 35(2) |
| IT Security | Infrastructure review of production deployment |
| Communication team | Review of user-facing privacy notices |
| PXL University | Academic supervision of ethical considerations |
| Belgian Data Protection Authority (GBA / APD) | Prior consultation under Art. 36 not anticipated; system risk classification is LOW–MEDIUM after mitigation |

7. Conclusion

The ZOL Intelligent Search system processes minimal personal data (user queries and session metadata) to achieve a legitimate and proportionate goal (improving hospital information accessibility). The identified risks are effectively mitigated through:

  • A five-layer safety architecture with zero medical-advice incidents to date
  • PII detection, audit logging, and voice-side redaction
  • Source-grounded responses with mandatory citations
  • Per-data-class retention limits documented in the Data Retention Policy
  • Role-based access control (Keycloak OIDC) and encryption in transit and at rest
  • Adversarial-input hardening per ADR-0036

DPIA outcome: The residual risks are acceptable given the implemented mitigations. The system may proceed to pilot deployment subject to:

  1. Review and approval by the hospital's Data Protection Officer per Art. 35(2);
  2. Continued operation of the OpenAI Data Processing Agreement under Art. 28;
  3. Publication of a user-facing privacy notice referencing this DPIA's outcome.

8. Review schedule

This DPIA shall be reviewed:

  • Before any significant change to data-processing activities (per Art. 35(11));
  • When new data categories are introduced (e.g., patient portal integration);
  • At minimum annually from the date of production deployment;
  • Following any data-protection incident, regardless of notification threshold under Art. 33–34.

Document version: 2.0 — Wave 2.D academic-rewrite revision | Date: 2026-05-10 | Author: SOFT4U BV

References