Data Protection Impact Assessment (DPIA)

Regulatory basis

This DPIA is conducted under GDPR Art. 35 (Regulation (EU) 2016/679), which requires an assessment where processing is "likely to result in a high risk to the rights and freedoms of natural persons." Although the ZOL Intelligent Search system processes publicly available hospital information rather than patient records, two factors motivate a proactive DPIA under the WP29 Guidelines on DPIA (WP 248 rev.01): (a) the healthcare context, in which user queries may contain data concerning health, a special category under GDPR Art. 9, even though the indexed corpus describes the hospital rather than patients; and (b) the system's use of automated decision-making (intent classification) that influences how a user's query is handled. A proactive DPIA is the prudent posture and is encouraged by the sectoral guidance of the Belgian Data Protection Authority (GBA / APD).

1. System description

1.1 Purpose

The ZOL Intelligent Search system replaces keyword-based search on the hospital website with a natural-language interface. Users ask questions in Dutch (or other supported languages — see §1.5) and receive grounded, cited responses about hospital departments, doctors, conditions, treatments, and practical information.

1.2 Data controller

Ziekenhuis Oost-Limburg (ZOL), Schiepse Bos 6, 3600 Genk, Belgium. Data Protection Officer details are published on the hospital's website.

1.3 Processing scope

Data category	Source	Contains personal data?	Storage
User queries	Website visitors / phone callers	Possible (users may include names, phone numbers, symptom descriptions)	PostgreSQL (`app.conversations`, `app.conversation_messages`); voice-side redaction strips inline PII before structured-log emission
Hospital content	ZOL public website + published brochures	Yes (doctor names, department phones, emails — all public-domain)	pgvector (`app.document_chunks`), entity taxonomy (`app.taxonomy_entities`, `app.taxonomy_relationships`), object storage (MinIO)
Response data	System-generated	No patient PII	Conversation history
Session metadata	Browser session / SIP call	Yes (IP address, timestamps, language code, caller ID on the voice channel)	PostgreSQL, Redis (ephemeral cache)
Analytics	Aggregated query patterns	No (aggregated, anonymised)	PostgreSQL
Audit logs	Security-relevant events (auth, safety blocks, PII detections, GDPR deletions)	Yes (user_id, IP)	PostgreSQL (`audit.logs`, `audit.data_access_logs`)

1.4 Data flow

1.5 Languages and demographics

The system supports Dutch (primary), English, French, German, and additional patient-language fallbacks. ZOL serves a diverse population in Belgian Limburg, including significant Turkish, Romanian, Italian, Greek, and Polish communities; multilingual capability is itself a fairness-driven design decision (see the multilingual coverage section of Adversarial Hardening for the language list).

2. Necessity and proportionality (GDPR Art. 5)

2.1 Lawful basis

Basis	Article	Justification
Public interest	Art. 6(1)(e)	Healthcare institutions have a public-service duty to make their services accessible. The system is a wayfinding tool that supports that duty.
Legitimate interest	Art. 6(1)(f)	Improving hospital information accessibility for patients and visitors; balanced against the negligible processing of personal data, the public-domain nature of the content, and voluntary user interaction. The interest passes the three-part test (purpose, necessity, balancing) per WP29 Opinion 06/2014.

The system does not rely on consent (Art. 6(1)(a)) because the processing is incidental to the user's voluntary act of using the search function; explicit consent would add friction without protection benefit and would not be the most appropriate basis under EDPB Guidelines 05/2020 on consent.

The system does not deliberately process special-category data (Art. 9): the corpus contains general-public hospital information, not patient health records. Special-category content that users volunteer in queries (for example, a caller mentioning their symptoms) is processed under Art. 9(2)(h) (provision of healthcare) and Art. 9(2)(i) (public health) jointly, because the user is interacting with a hospital channel for a healthcare-adjacent purpose.

2.2 Necessity

The processing is necessary because:

~25 000 monthly search queries demonstrate sustained user demand for better information access;
Keyword search demonstrably fails on natural-language questions (such as "Ik heb pijn in mijn borst, naar welke afdeling moet ik?");
The alternative (no AI assistance, keyword-only search) results in increased helpdesk load and frustrated patients — the operational rationale for the project;
No less intrusive means achieves the same accessibility improvement.

2.3 Proportionality

Principle	Article	Implementation
Data minimisation	Art. 5(1)(c)	Only publicly available hospital content is indexed. No patient records, medical histories, or insurance data enters the system. Voice transcripts are PII-redacted before reaching structured logs (`backend/app/services/voice/voice_pii_redaction.py`).
Purpose limitation	Art. 5(1)(b)	Queries are used exclusively for response generation. No secondary use for marketing, profiling, or research.
Storage limitation	Art. 5(1)(e)	Per-data-class retention documented in the Data Retention Policy. Audit logs auto-expire; Redis cache is ephemeral.
Accuracy	Art. 5(1)(d)	Source citations enable users to verify every claim against the original hospital content. RAG grounding constrains generation to retrieved evidence.

3. Risk assessment

3.1 Identified risks

Risk	Likelihood	Severity	Inherent risk	Mitigation	Residual risk
R1: System provides medical advice	Low	Critical	HIGH	Five-layer safety architecture (see Safety Architecture); zero-incident KPI; automated monitoring	LOW
R2: User PII in queries forwarded to LLM provider	Medium	Medium	MEDIUM	PII detection + audit logging; PII-containing queries excluded from semantic cache; OpenAI DPA in force	LOW
R3: LLM hallucination produces incorrect hospital info	Medium	Medium	MEDIUM	Source grounding (RAG); citation verification; quality gate at calibrated similarity threshold (per ADR-0048)	LOW
R4: Query-data breach (conversation history)	Low	Medium	LOW	TLS 1.2+ in transit (RFC 5246 / RFC 8446); PostgreSQL volume encryption at rest; Keycloak-mediated access control	LOW
R5: Re-identification from aggregated analytics	Very low	Low	LOW	Analytics pre-aggregated; no individual query tracking	VERY LOW
R6: Discrimination via language-based quality gaps	Low	Medium	MEDIUM	Multilingual support with translated safety messages; ongoing per-language evaluation	LOW
R7: Adversarial-input safety bypass (GCG suffix attacks)	Low	Critical	MEDIUM	Perplexity-based anomaly detector (ADR-0036); LLM-as-judge defence-in-depth; see Adversarial Hardening	LOW
R8: Voice-side PII surfaces in logs	Medium	Medium	MEDIUM	`voice_pii_redaction.py` strips Belgian phone numbers, names, and DOB patterns before log emission; SHA-256 hashing for audit-trail correlation without plaintext retention	LOW

3.2 Risk details

R1: Medical advice (critical)

The most significant risk: a response that could be interpreted as medical advice could cause patient harm. The system implements five independent safety layers (see Safety Architecture):

Intent classification blocks medical-advice queries before retrieval
Post-generation regex scans for Dutch medical-advice patterns
LLM-as-judge validates response safety
Quality gate blocks low-confidence responses
Mandatory disclaimer appended to every response

Evaluation evidence: 100 % safety-refusal accuracy across the golden evaluation question set (see Quality Evaluation for the current methodology and results), including dedicated safety-refusal tests and adversarial prompt-injection attempts (see Adversarial Hardening).
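The ordering of the five layers can be sketched as a short-circuiting pipeline. This is a minimal illustration, not the project's implementation: the function names, the regex patterns, the redirect and disclaimer wording, and the `min_confidence` value are all hypothetical, and the classifier, generator, and judge are passed in as plain callables so the sketch stays self-contained.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical wording; the real disclaimer and redirect texts are not given here.
DISCLAIMER = "Dit is geen medisch advies."
REDIRECT = "Voor medische vragen kunt u de helpdesk bellen: 089 32 50 50."

# Hypothetical Dutch medical-advice patterns for the post-generation scan.
ADVICE_PATTERNS = [
    re.compile(r"\bu moet\b.*\b(innemen|slikken|stoppen met)\b", re.IGNORECASE),
    re.compile(r"\bneem\b.*\bmg\b", re.IGNORECASE),
]


@dataclass
class Outcome:
    blocked_by: Optional[str]  # name of the layer that blocked, or None
    text: str


def _finalise(blocked_by: Optional[str], text: str) -> Outcome:
    # Layer 5: the disclaimer is appended to every response, blocked or not.
    return Outcome(blocked_by, f"{text}\n\n{DISCLAIMER}")


def answer(
    query: str,
    classify_intent: Callable[[str], str],
    generate: Callable[[str], Tuple[str, float]],
    judge_is_safe: Callable[[str], bool],
    min_confidence: float = 0.5,  # assumed value, not the calibrated threshold
) -> Outcome:
    # Layer 1: block medical-advice intents before any retrieval or generation.
    if classify_intent(query) == "medical_advice":
        return _finalise("intent", REDIRECT)
    draft, confidence = generate(query)
    # Layer 2: regex scan of the generated draft.
    if any(p.search(draft) for p in ADVICE_PATTERNS):
        return _finalise("regex", REDIRECT)
    # Layer 3: LLM-as-judge verdict on the draft.
    if not judge_is_safe(draft):
        return _finalise("judge", REDIRECT)
    # Layer 4: quality gate on retrieval/generation confidence.
    if confidence < min_confidence:
        return _finalise("quality_gate", REDIRECT)
    return _finalise(None, draft)
```

Because each layer returns as soon as it blocks, a failure in one layer (for example, a misclassified intent) still leaves the later layers in the path of the response.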

R2: PII in query forwarding

Users may include personal information in queries (phone numbers, names). This data flows to the LLM provider (OpenAI) for response generation and embedding.

Mitigations:

PII detection layer flags and logs PII occurrences (regex patterns covering Belgian formats — see PII Protection)
PII-containing queries excluded from semantic cache (so the same PII string is not retained)
OpenAI's Data Processing Addendum covers GDPR Art. 28 (processor) obligations
No PII is stored in the knowledge graph or vector store
Voice channel: `voice_pii_redaction.py` strips PII patterns before structured-log emission, applying the data-minimisation principle of GDPR Art. 5(1)(c) by design at the log layer
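The detection, redaction, and cache-exclusion steps listed above might look roughly like the following. This is a sketch under stated assumptions: the single phone-number pattern, the `[PHONE]` placeholder, and the function names are illustrative and are not taken from `voice_pii_redaction.py`; a real pattern set would also cover names and dates of birth.

```python
import hashlib
import re

# Illustrative pattern for Belgian phone numbers such as "089 32 50 50",
# "+32 89 32 50 50" or "0470 12 34 56". Not the project's actual pattern set.
BE_PHONE = re.compile(r"(?<!\d)(?:\+32|0032|0)[ ./-]?\d(?:[ ./-]?\d){7,8}(?!\d)")


def fingerprint(value: str) -> str:
    """SHA-256 hex digest of the digits only, so that spacing and
    punctuation variants of the same number map to the same hash."""
    digits = re.sub(r"\D", "", value)
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each phone number with a placeholder and return the
    redacted text plus one hash per match for audit-trail correlation."""
    hashes: list[str] = []

    def _substitute(match: re.Match) -> str:
        hashes.append(fingerprint(match.group(0)))
        return "[PHONE]"

    return BE_PHONE.sub(_substitute, text), hashes


def is_cacheable(query: str) -> bool:
    """Queries containing detected PII are kept out of the semantic cache."""
    return BE_PHONE.search(query) is None
```

For example, `redact("Bel mij op 0470 12 34 56 aub")` yields the text `"Bel mij op [PHONE] aub"` together with a single 64-character hex digest, so the log line carries no plaintext number while repeated occurrences can still be correlated.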

R6: Language discrimination

The system serves a linguistically diverse catchment area; non-Dutch queries must achieve quality parity with Dutch queries to avoid indirect discrimination (cf. GDPR recital 71 on fairness in automated decision-making).

Mitigations:

Multilingual support with translated safety messages
Query rewriting normalises non-Dutch input
Ongoing evaluation tracks per-language quality metrics
Helpdesk fallback (089 32 50 50) remains available regardless of system performance

R7: Adversarial-input safety bypass

Documented threat model: Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al., 2023) demonstrated that GCG-style adversarial suffixes bypass standard LLM safety alignment. For a healthcare safety architecture, an attack that bypasses intent classification can trigger the medical-advice failure class.

Mitigations: See Adversarial Hardening. Perplexity-based anomaly detection runs in < 5 ms before any LLM call and blocks adversarial suffixes with high recall.

See Zou et al. 2023 GCG attacks.
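To make the perplexity check concrete, the sketch below computes perplexity from per-token log-probabilities and flags an input when any sliding window exceeds a threshold. It assumes the log-probabilities (natural log) come from some small reference language model, which is not shown; the `threshold` and `window` values are placeholders, not the calibrated settings from ADR-0036.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_anomalous(
    token_logprobs: Sequence[float],
    threshold: float = 1000.0,  # placeholder value
    window: int = 10,           # placeholder value
) -> bool:
    """Flag the input if any window of consecutive tokens has perplexity
    above the threshold. Short inputs are scored as a whole."""
    n = len(token_logprobs)
    if n <= window:
        return perplexity(token_logprobs) > threshold
    return any(
        perplexity(token_logprobs[i:i + window]) > threshold
        for i in range(n - window + 1)
    )
```

The windowing is the design point worth noting: a fluent question followed by a short high-perplexity suffix can have an unremarkable average over the whole input, whereas the window covering the suffix stands out clearly.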

4. Data-subject rights (GDPR Chapter III)

Right	Article	Implementation
Information	Art. 13–14	Privacy notice on hospital website; system identifies itself as AI-powered search per AI Act Art. 50
Access	Art. 15	Conversation history available via authenticated session; admin can export per-user data via the GDPR endpoint
Rectification	Art. 16	Hospital content errors corrected at source; system re-ingests on next crawl
Erasure	Art. 17	`DELETE /api/v1/gdpr/users/{user_id}/data` (admin-authenticated) cascades through `app.conversations`, `app.conversation_messages`, `app.feedback`, `app.analytics_events`, `audit.logs`, `audit.data_access_logs`. Implementation: `backend/app/api/gdpr.py`.
Restriction	Art. 18	Processing can be paused per-user via rate-limit override or admin block
Data portability	Art. 20	Conversation export in JSON via API
Objection	Art. 21	Users opt out by not using the search function; traditional phone/email channels remain available
Automated decision-making	Art. 22	Intent classification is automated but does not produce legal effects or similarly significantly affect the data subject (the test under Art. 22(1)). Blocked queries receive a helpful redirect, not a denial of service.
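The erasure cascade in the Art. 17 row can be illustrated as a single transaction over the listed tables. This is a hedged sketch rather than the code in `backend/app/api/gdpr.py`: it uses an in-memory SQLite database with attached `app` and `audit` schemas as a stand-in for PostgreSQL, and it assumes every table carries a `user_id` column, which the source does not confirm.

```python
import sqlite3

# Tables named in the erasure row, child tables first. The `user_id` column
# on every table is an assumption made for this sketch.
ERASURE_TABLES = (
    "app.conversation_messages",
    "app.conversations",
    "app.feedback",
    "app.analytics_events",
    "audit.data_access_logs",
    "audit.logs",
)


def erase_user(conn: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Delete all rows for `user_id` in one transaction and return the
    number of rows removed per table."""
    deleted: dict[str, int] = {}
    with conn:  # commits on success, rolls back if any DELETE raises
        for table in ERASURE_TABLES:
            # Table names come from the fixed tuple above, never from input.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE user_id = ?", (user_id,)
            )
            deleted[table] = cur.rowcount
    return deleted


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in with `app` and `audit` schemas for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS app")
    conn.execute("ATTACH DATABASE ':memory:' AS audit")
    for table in ERASURE_TABLES:
        conn.execute(f"CREATE TABLE {table} (user_id TEXT, payload TEXT)")
    return conn
```

Running all deletes inside one transaction means a failure part-way through leaves no partially erased user, and the per-table counts give the admin endpoint something verifiable to return or record.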

5. Technical and organisational measures (GDPR Art. 32)

5.1 Technical measures

Measure	Article anchor	Implementation
Encryption at rest	Art. 32(1)(a)	PostgreSQL volume encryption
Encryption in transit	Art. 32(1)(a)	TLS 1.2+ for all API and WebSocket connections; SIPS for telephony signalling per ADR-0050
Pseudonymisation of audit data	Art. 4(5), Art. 32(1)(a)	Voice transcripts are PII-redacted at log boundary; SHA-256 hashing for audit-trail correlation
Access control	Art. 32(1)(b)	Keycloak OIDC authentication with JWT validation, delegating identity to an audited IdP (a mitigation for LLM06 in the OWASP Top 10 for LLM Applications); role-based authorisation (user / admin)
Network isolation	Art. 32(1)(b)	Docker bridge networks; no direct database exposure
Audit logging	Art. 32(1)(d)	Structured audit trail for all security-relevant events; see Data Retention Policy for retention
Monitoring	Art. 32(1)(d)	Prometheus metrics; health-check endpoints; per-turn voice telemetry
Backup	Art. 32(1)(c)	PostgreSQL daily backups with point-in-time recovery

5.2 Organisational measures

Measure	Status
Data Processing Agreement with OpenAI	In force (OpenAI DPA)
Security incident response procedure	Documented in deployment runbook
Regular access review	Admin accounts reviewed quarterly
Privacy training for hospital staff	Recommended for helpdesk team
ISO/IEC 27001 alignment (target, not certification)	Posture being aligned to ISO/IEC 27001:2022 controls; certification not currently held

See ISO/IEC 27001:2022.

6. Consultation

Stakeholder	Consultation
Hospital DPO	Review of this DPIA required before production deployment per Art. 35(2)
IT Security	Infrastructure review of production deployment
Communication team	Review of user-facing privacy notices
PXL University	Academic supervision of ethical considerations
Belgian Data Protection Authority (GBA / APD)	Prior consultation under Art. 36 not anticipated, since no residual risk remains high after mitigation (all residual risks in §3.1 are LOW or VERY LOW)

7. Conclusion

The ZOL Intelligent Search system processes minimal personal data (user queries and session metadata) to achieve a legitimate and proportionate goal (improving hospital information accessibility). The identified risks are effectively mitigated through:

A five-layer safety architecture with zero medical-advice incidents to date
PII detection, audit logging, and voice-side redaction
Source-grounded responses with mandatory citations
Per-data-class retention limits documented in the Data Retention Policy
Role-based access control (Keycloak OIDC) and encryption in transit and at rest
Adversarial-input hardening per ADR-0036

DPIA outcome: The residual risks are acceptable given the implemented mitigations. The system may proceed to pilot deployment subject to:

Review and approval by the hospital's Data Protection Officer per Art. 35(2);
Continued operation of the OpenAI Data Processing Agreement under Art. 28;
Publication of a user-facing privacy notice referencing this DPIA's outcome.

8. Review schedule

This DPIA shall be reviewed:

Before any significant change to data-processing activities (per Art. 35(11));
When new data categories are introduced (e.g., patient portal integration);
At minimum annually from the date of production deployment;
Following any data-protection incident, regardless of notification threshold under Art. 33–34.

Document version: 2.0 — Wave 2.D academic-rewrite revision | Date: 2026-05-10 | Author: SOFT4U BV

References

Regulation (EU) 2016/679 — General Data Protection Regulation (GDPR), Articles 5, 6, 9, 17, 22, 25, 28, 32, 35, 36.
Article 29 Working Party (now EDPB) WP 248 rev.01 — Guidelines on Data Protection Impact Assessment.
WP29 Opinion 06/2014 — On the notion of legitimate interests of the data controller.
EDPB Guidelines 05/2020 on consent.
Belgian Data Protection Authority (GBA / APD) — sectoral DPIA recommendations.
HIPAA Safe Harbor de-identification method (45 CFR § 164.514(b)(2)): U.S. analogue for de-identification standards; informs PII pattern coverage.
