Data Protection Impact Assessment (DPIA)
This DPIA is conducted under Art. 35 of the GDPR (Regulation (EU) 2016/679), which requires an assessment where processing is "likely to result in a high risk to the rights and freedoms of natural persons." The ZOL Intelligent Search system processes publicly available hospital information rather than patient records, but two factors justify a proactive DPIA under the WP29 Guidelines on DPIA (WP 248 rev.01): (a) the healthcare context, in which user queries may reveal data concerning health (special-category data under GDPR Art. 9) even though the indexed corpus describes the hospital rather than patients, and (b) the system's use of automated decision-making (intent classification) that influences how a user's query is handled. A proactive DPIA is the prudent posture and is encouraged by the sectoral guidance of the Belgian Data Protection Authority (GBA / APD).
1. System description
1.1 Purpose
The ZOL Intelligent Search system replaces keyword-based search on the hospital website with a natural-language interface. Users ask questions in Dutch (or other supported languages — see §1.5) and receive grounded, cited responses about hospital departments, doctors, conditions, treatments, and practical information.
1.2 Data controller
Ziekenhuis Oost-Limburg (ZOL), Schiepse Bos 6, 3600 Genk, Belgium. Data Protection Officer details are published on the hospital's website.
1.3 Processing scope
| Data category | Source | Contains personal data? | Storage |
|---|---|---|---|
| User queries | Website visitors / phone callers | Possible (users may include names, phone numbers, symptom descriptions) | PostgreSQL (app.conversations, app.conversation_messages); voice-side redaction strips inline PII before structured-log emission |
| Hospital content | ZOL public website + published brochures | Yes (doctor names, department phone numbers, email addresses; all publicly available) | pgvector (app.document_chunks), entity taxonomy (app.taxonomy_entities, app.taxonomy_relationships), object storage (MinIO) |
| Response data | System-generated | No patient PII | Conversation history |
| Session metadata | Browser session / SIP call | Yes (IP address, timestamps, language code, caller ID on the voice channel) | PostgreSQL, Redis (ephemeral cache) |
| Analytics | Aggregated query patterns | No (aggregated, anonymised) | PostgreSQL |
| Audit logs | Security-relevant events (auth, safety blocks, PII detections, GDPR deletions) | Yes (user_id, IP) | PostgreSQL (audit.logs, audit.data_access_logs) |
1.4 Data flow
1.5 Languages and demographics
The system supports Dutch (primary), English, French, German, and additional patient-language fallbacks. ZOL serves a diverse population in Belgian Limburg, including significant Turkish, Romanian, Italian, Greek, and Polish communities; multilingual capability is itself a fairness-driven design decision (see the multilingual coverage section of Adversarial Hardening for the language list).
2. Necessity and proportionality (GDPR Art. 5)
2.1 Lawful basis
| Basis | Article | Justification |
|---|---|---|
| Public interest | Art. 6(1)(e) | Healthcare institutions have a public-service duty to make their services accessible. The system is a wayfinding tool that supports that duty. |
| Legitimate interest | Art. 6(1)(f) | Improving hospital information accessibility for patients and visitors; balanced against the negligible processing of personal data, the publicly available nature of the content, and voluntary user interaction. The interest passes the three-part test (purpose, necessity, balancing) per WP29 Opinion 06/2014. |
The system does not rely on consent (Art. 6(1)(a)) because the processing is incidental to the user's voluntary act of using the search function; explicit consent would add friction without protection benefit and would not be the most appropriate basis under EDPB Guidelines 05/2020 on consent.
The system does not deliberately process special-category data (Art. 9): the corpus contains general-public hospital information, not patient health records. Special-category content that users volunteer in queries (for example, a caller mentioning their symptoms) is processed under Art. 9(2)(h) (provision of healthcare) and Art. 9(2)(i) (public health) jointly, because the user is interacting with a hospital channel for a healthcare-adjacent purpose.
2.2 Necessity
The processing is necessary because:
- ~25 000 monthly search queries demonstrate sustained user demand for better information access;
- Keyword search demonstrably fails on natural-language questions (such as "Ik heb pijn in mijn borst, naar welke afdeling moet ik?");
- The alternative (no AI assistance, keyword-only search) results in increased helpdesk load and frustrated patients — the operational rationale for the project;
- No less intrusive means achieves the same accessibility improvement.
2.3 Proportionality
| Principle | Article | Implementation |
|---|---|---|
| Data minimisation | Art. 5(1)(c) | Only publicly available hospital content is indexed. No patient records, medical histories, or insurance data enters the system. Voice transcripts are PII-redacted before reaching structured logs (backend/app/services/voice/voice_pii_redaction.py). |
| Purpose limitation | Art. 5(1)(b) | Queries are used exclusively for response generation. No secondary use for marketing, profiling, or research. |
| Storage limitation | Art. 5(1)(e) | Per-data-class retention documented in the Data Retention Policy. Audit logs auto-expire; Redis cache is ephemeral. |
| Accuracy | Art. 5(1)(d) | Source citations enable users to verify every claim against the original hospital content. RAG grounding constrains generation to retrieved evidence. |
3. Risk assessment
3.1 Identified risks
| Risk | Likelihood | Severity | Inherent risk | Mitigation | Residual risk |
|---|---|---|---|---|---|
| R1: System provides medical advice | Low | Critical | HIGH | Five-layer safety architecture (see Safety Architecture); zero-incident KPI; automated monitoring | LOW |
| R2: User PII in queries forwarded to LLM provider | Medium | Medium | MEDIUM | PII detection + audit logging; PII-containing queries excluded from semantic cache; OpenAI DPA in force | LOW |
| R3: LLM hallucination produces incorrect hospital info | Medium | Medium | MEDIUM | Source grounding (RAG); citation verification; quality gate at calibrated similarity threshold (per ADR-0048) | LOW |
| R4: Query-data breach (conversation history) | Low | Medium | LOW | TLS 1.2+ in transit (RFC 5246, RFC 8446); PostgreSQL volume encryption at rest; Keycloak-mediated access control | LOW |
| R5: Re-identification from aggregated analytics | Very low | Low | LOW | Analytics pre-aggregated; no individual query tracking | VERY LOW |
| R6: Discrimination via language-based quality gaps | Low | Medium | MEDIUM | Multilingual support with translated safety messages; ongoing per-language evaluation | LOW |
| R7: Adversarial-input safety bypass (GCG suffix attacks) | Low | Critical | MEDIUM | Perplexity-based anomaly detector (ADR-0036); LLM-as-judge defence-in-depth; see Adversarial Hardening | LOW |
| R8: Voice-side PII surfaces in logs | Medium | Medium | MEDIUM | voice_pii_redaction.py strips Belgian phone numbers, names, and DOB patterns before log emission; SHA-256 hashing for audit-trail correlation without plaintext retention | LOW |
3.2 Risk details
R1: Medical advice (critical)
The most significant risk: a response that could be interpreted as medical advice could cause patient harm. The system implements five independent safety layers (see Safety Architecture):
- Intent classification blocks medical-advice queries before retrieval
- Post-generation regex scans for Dutch medical-advice patterns
- LLM-as-judge validates response safety
- Quality gate blocks low-confidence responses
- Mandatory disclaimer appended to every response
Evaluation evidence: 100 % safety-refusal accuracy across the golden evaluation question set (see Quality Evaluation for the current methodology and results), including dedicated safety-refusal tests and adversarial prompt-injection attempts (see Adversarial Hardening).
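The layering described above can be illustrated with a minimal sketch. It is not the production implementation: the `Layers` container, the message wording, the two regex patterns, the `min_confidence` value, and the ordering of the checks are all assumptions made for illustration, and the classifier, generator, and judge are passed in as plain callables.

```python
import re
from dataclasses import dataclass
from typing import Callable

# All wording, patterns, and thresholds below are hypothetical placeholders.
REDIRECT = "Voor medisch advies neemt u best contact op met uw arts."
FALLBACK = "Bel de helpdesk op 089 32 50 50."
DISCLAIMER = "Deze informatie vervangt geen medisch advies."
ADVICE_PATTERNS = [
    re.compile(r"\bneem\b.*\bmg\b", re.IGNORECASE),
    re.compile(r"\bu moet\b.*\b(innemen|stoppen)\b", re.IGNORECASE),
]


@dataclass
class Layers:
    classify_intent: Callable[[str], str]          # e.g. "medical_advice" or "wayfinding"
    generate: Callable[[str], tuple[str, float]]   # returns (draft answer, confidence score)
    judge_is_safe: Callable[[str], bool]           # LLM-as-judge verdict on the draft
    min_confidence: float = 0.5                    # hypothetical quality-gate threshold


def with_disclaimer(text: str) -> str:
    # Layer 5: the disclaimer is appended to every response, including redirects.
    return f"{text}\n\n{DISCLAIMER}"


def answer(query: str, layers: Layers) -> str:
    # Layer 1: block medical-advice intents before any retrieval or generation.
    if layers.classify_intent(query) == "medical_advice":
        return with_disclaimer(REDIRECT)
    draft, confidence = layers.generate(query)
    # Layer 2: post-generation regex scan of the draft.
    if any(pattern.search(draft) for pattern in ADVICE_PATTERNS):
        return with_disclaimer(REDIRECT)
    # Layer 3: independent judge must accept the draft.
    if not layers.judge_is_safe(draft):
        return with_disclaimer(REDIRECT)
    # Layer 4: quality gate blocks low-confidence drafts.
    if confidence < layers.min_confidence:
        return with_disclaimer(FALLBACK)
    return with_disclaimer(draft)
```

Because each check returns independently, a failure in one layer (for example, a misclassified intent) still leaves the later layers in place, which is the property the five-layer design relies on.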
R2: PII in query forwarding
Users may include personal information in queries (phone numbers, names). This data flows to the LLM provider (OpenAI) for response generation and embedding.
Mitigations:
- PII detection layer flags and logs PII occurrences (regex patterns covering Belgian formats — see PII Protection)
- PII-containing queries excluded from semantic cache (so the same PII string is not retained)
- OpenAI's Data Processing Addendum covers GDPR Art. 28 (processor) obligations
- No PII is stored in the knowledge graph or vector store
- Voice channel: voice_pii_redaction.py strips PII patterns before structured-log emission, applying GDPR Art. 5(1)(c) data minimisation by design at the log layer
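The first two mitigations (detection, then exclusion from the semantic cache) can be sketched as follows. The pattern set shown here is illustrative only and is not taken from the production PII layer; the names `PII_PATTERNS`, `detect_pii`, and `is_cacheable` are hypothetical.

```python
import re

# Illustrative patterns (assumptions): Belgian phone numbers written with a
# leading 0, +32 or 0032 and optional separators, plus e-mail addresses.
PII_PATTERNS = {
    "be_phone": re.compile(
        r"(?<!\d)(?:\+32|0032|0)\s?[1-9](?:[\s./-]?\d){7,8}(?!\d)"
    ),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def detect_pii(query: str) -> list[str]:
    """Return the names of all PII patterns that occur in the query."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(query)]


def is_cacheable(query: str) -> bool:
    """A query may enter the semantic cache only if no PII pattern matched."""
    return not detect_pii(query)
```

In this sketch a query such as "Bel mij op 0470 12 34 56" is flagged and bypasses the cache, while a plain wayfinding question remains cacheable. Regex detection of this kind does not catch free-text names or symptom descriptions, which is why the DPA and audit logging remain part of the mitigation set.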
R6: Language discrimination
The system targets a diverse-population catchment area; non-Dutch queries must achieve quality parity with Dutch queries to avoid indirect discrimination under GDPR recital 71 (automated decision-making fairness).
Mitigations:
- Multilingual support with translated safety messages
- Query rewriting normalises non-Dutch input
- Ongoing evaluation tracks per-language quality metrics
- Helpdesk fallback (089 32 50 50) remains available regardless of system performance
R7: Adversarial-input safety bypass
Documented threat model: Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al., 2023) demonstrated that GCG-style adversarial suffixes bypass standard LLM safety alignment. For a healthcare safety architecture, an attack that bypasses intent classification can trigger the medical-advice failure class.
Mitigations: See Adversarial Hardening. Perplexity-based anomaly detection runs in < 5 ms before any LLM call and blocks adversarial suffixes with high recall.
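The idea behind a perplexity check can be sketched in a few lines. This is a generic illustration under stated assumptions, not the detector specified in ADR-0036: it assumes per-token log-probabilities (natural log) are already available from some language model, and the `threshold` value is a hypothetical placeholder that would need calibration on real traffic.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_anomalous(token_logprobs: list[float], threshold: float = 1000.0) -> bool:
    """Flag inputs whose perplexity exceeds the (hypothetical) threshold."""
    return perplexity(token_logprobs) > threshold
```

Natural-language queries tend to receive moderate per-token probabilities, whereas optimised adversarial suffixes are typically strings of highly improbable tokens, so their perplexity is far higher. The check is cheap because it needs no generation step, only scoring of the input.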
4. Data-subject rights (GDPR Chapter III)
| Right | Article | Implementation |
|---|---|---|
| Information | Art. 13–14 | Privacy notice on hospital website; system identifies itself as AI-powered search per AI Act Art. 50 |
| Access | Art. 15 | Conversation history available via authenticated session; admin can export per-user data via the GDPR endpoint |
| Rectification | Art. 16 | Hospital content errors corrected at source; system re-ingests on next crawl |
| Erasure | Art. 17 | DELETE /api/v1/gdpr/users/{user_id}/data (admin-authenticated) cascades through app.conversations, app.conversation_messages, app.feedback, app.analytics_events, audit.logs, audit.data_access_logs. Implementation: backend/app/api/gdpr.py. |
| Restriction | Art. 18 | Processing can be paused per-user via rate-limit override or admin block |
| Data portability | Art. 20 | Conversation export in JSON via API |
| Objection | Art. 21 | Users opt out by not using the search function; traditional phone/email channels remain available |
| Automated decision-making | Art. 22 | Intent classification is automated but does not produce legal effects or similarly significantly affect the data subject (the test under Art. 22(1)). Blocked queries receive a helpful redirect, not a denial of service. |
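The erasure cascade in the table above can be illustrated with a transactional delete across the listed tables. This is a sketch, not the code in backend/app/api/gdpr.py: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and it assumes every table carries a `user_id` column, which the document does not state.

```python
import sqlite3

# Table names are taken from the Erasure row. Child rows are deleted before
# their parent table; the user_id column on each table is an assumption.
ERASURE_TABLES = (
    "app.conversation_messages",
    "app.conversations",
    "app.feedback",
    "app.analytics_events",
    "audit.logs",
    "audit.data_access_logs",
)


def erase_user_data(conn: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Delete all rows for user_id in one transaction; return per-table counts."""
    deleted: dict[str, int] = {}
    with conn:  # commits on success, rolls back if any statement raises
        for table in ERASURE_TABLES:
            # Table names come from the fixed tuple above, never from user input;
            # the user id itself is always passed as a bound parameter.
            cursor = conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
            deleted[table] = cursor.rowcount
    return deleted
```

Running the deletes in a single transaction means a failure part-way through leaves no partially erased user, and the returned counts can be recorded as evidence that the request was fulfilled.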
5. Technical and organisational measures (GDPR Art. 32)
5.1 Technical measures
| Measure | Article anchor | Implementation |
|---|---|---|
| Encryption at rest | Art. 32(1)(a) | PostgreSQL volume encryption |
| Encryption in transit | Art. 32(1)(a) | TLS 1.2+ for all API and WebSocket connections; SIPS for telephony signalling per ADR-0050 |
| Pseudonymisation of audit data | Art. 4(5), Art. 32(1)(a) | Voice transcripts are PII-redacted at log boundary; SHA-256 hashing for audit-trail correlation |
| Access control | Art. 32(1)(b) | Keycloak OIDC authentication with JWT validation (identity is delegated to an audited IdP, a mitigation for LLM06 in the OWASP Top 10 for LLM Applications); role-based authorisation (user / admin) |
| Network isolation | Art. 32(1)(b) | Docker bridge networks; no direct database exposure |
| Audit logging | Art. 32(1)(d) | Structured audit trail for all security-relevant events; see Data Retention Policy for retention |
| Monitoring | Art. 32(1)(d) | Prometheus metrics; health-check endpoints; per-turn voice telemetry |
| Backup | Art. 32(1)(c) | PostgreSQL daily backups with point-in-time recovery |
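The pseudonymisation measure in the table above (redaction at the log boundary with SHA-256 hashing for correlation) can be sketched as follows. This is not the content of voice_pii_redaction.py: the phone pattern, the `[PHONE:…]` placeholder format, the 12-character truncation, and the use of a secret salt are assumptions made for illustration.

```python
import hashlib
import re

# Illustrative Belgian phone pattern (assumption), as in 0470 12 34 56 or +32 470 12 34 56.
PHONE_RE = re.compile(r"(?<!\d)(?:\+32|0032|0)\s?[1-9](?:[\s./-]?\d){7,8}(?!\d)")


def correlation_token(value: str, salt: bytes) -> str:
    """Salted SHA-256 over the digits only, truncated for log readability."""
    digits = re.sub(r"\D", "", value)
    return hashlib.sha256(salt + digits.encode("utf-8")).hexdigest()[:12]


def redact_for_log(transcript: str, salt: bytes) -> str:
    """Replace each phone number with a placeholder carrying its token."""
    return PHONE_RE.sub(
        lambda match: f"[PHONE:{correlation_token(match.group(0), salt)}]",
        transcript,
    )
```

The same number written with different separators maps to the same token, so audit events can be correlated without retaining the plaintext. The salt is included because an unsalted hash of a short numeric string can be reversed by enumeration; whether the production code uses one is not stated in this document.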
5.2 Organisational measures
| Measure | Status |
|---|---|
| Data Processing Agreement with OpenAI | In force (OpenAI DPA) |
| Security incident response procedure | Documented in deployment runbook |
| Regular access review | Admin accounts reviewed quarterly |
| Privacy training for hospital staff | Recommended for helpdesk team |
| ISO/IEC 27001 alignment (target, not certification) | Posture being aligned to ISO/IEC 27001:2022 controls; certification not currently held |
6. Consultation
| Stakeholder | Consultation |
|---|---|
| Hospital DPO | Review of this DPIA required before production deployment per Art. 35(2) |
| IT Security | Infrastructure review of production deployment |
| Communication team | Review of user-facing privacy notices |
| PXL University | Academic supervision of ethical considerations |
| Belgian Data Protection Authority (GBA / APD) | Prior consultation under Art. 36 not anticipated; system risk classification is LOW–MEDIUM after mitigation |
7. Conclusion
The ZOL Intelligent Search system processes minimal personal data (user queries and session metadata) to achieve a legitimate and proportionate goal (improving hospital information accessibility). The identified risks are effectively mitigated through:
- A five-layer safety architecture with zero medical-advice incidents to date
- PII detection, audit logging, and voice-side redaction
- Source-grounded responses with mandatory citations
- Per-data-class retention limits documented in the Data Retention Policy
- Role-based access control (Keycloak OIDC) and encryption in transit and at rest
- Adversarial-input hardening per ADR-0036
DPIA outcome: The residual risks are acceptable given the implemented mitigations. The system may proceed to pilot deployment subject to:
- Review and approval by the hospital's Data Protection Officer per Art. 35(2);
- Continued operation of the OpenAI Data Processing Agreement under Art. 28;
- Publication of a user-facing privacy notice referencing this DPIA's outcome.
8. Review schedule
This DPIA shall be reviewed:
- Before any significant change to data-processing activities (per Art. 35(11));
- When new data categories are introduced (e.g., patient portal integration);
- At minimum annually from the date of production deployment;
- Following any data-protection incident, regardless of notification threshold under Art. 33–34.
Document version: 2.0 — Wave 2.D academic-rewrite revision | Date: 2026-05-10 | Author: SOFT4U BV
References
- Regulation (EU) 2016/679, General Data Protection Regulation (GDPR): Articles 4, 5, 6, 9, 13–18, 20–22, 25, 28, 32–36; Recital 71.
- Article 29 Working Party (now EDPB), WP 248 rev.01: Guidelines on Data Protection Impact Assessment.
- Article 29 Working Party, Opinion 06/2014 on the notion of legitimate interests of the data controller.
- EDPB Guidelines 05/2020 on consent.
- Belgian Data Protection Authority (GBA / APD): sectoral DPIA recommendations.
- HIPAA Privacy Rule, Safe Harbor de-identification method (45 CFR § 164.514(b)(2)): U.S. analogue for de-identification standards; informs PII pattern coverage.