Data Retention Policy
This document formalises the data-retention periods and lifecycle management for all data processed by the ZOL Intelligent Search system. Retention is the operational expression of GDPR Art. 5(1)(e) storage limitation: personal data shall be kept "in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed".
The policy is calibrated by data class: no single retention period fits all data. Audit logs need a different retention period than ephemeral session caches; analytics need a different one than feedback events that touch personal data. Each row in the schedule below cites the GDPR provisions that supply its lawful basis or otherwise govern its handling.
Retention schedule
| Data category | Storage | Retention period | Deletion method | Lawful basis (GDPR) |
|---|---|---|---|---|
| User conversations | PostgreSQL (app.conversations, app.conversation_messages) | Session-based; available while user is authenticated | Soft delete via API; hard delete on user-account removal via DELETE /api/v1/gdpr/users/{user_id}/data | Art. 6(1)(f) legitimate interest; Art. 17 right to erasure on request |
| Audit logs | PostgreSQL (audit.logs, audit.data_access_logs) | 90 days | Automated expiry (scheduled task) | Art. 6(1)(c) legal obligation (security monitoring); Art. 32 |
| PII detection events | PostgreSQL (within audit logs) | 90 days | Expires with parent audit log | Art. 6(1)(f); Art. 32 |
| Semantic cache | PostgreSQL (app.semantic_cache) | Indefinite (performance optimisation; no PII per design) | Manual purge via admin API; auto-flush after embedding-model migration | Art. 6(1)(f); Art. 5(1)(c) (PII excluded) |
| Rate-limiting data | Redis | Ephemeral (1-minute to 24-hour TTL) | Automatic Redis key expiry | Art. 6(1)(f); Art. 5(1)(e) |
| Session tokens | Keycloak (server-side sessions) | Managed by Keycloak session policy (configurable idle and absolute timeouts) | Keycloak session invalidation on logout; revocation endpoint | Art. 6(1)(f); Art. 32 |
| Analytics events | PostgreSQL (app.analytics_events) | 1 year (aggregated, no individual tracking) | Automated expiry | Art. 6(1)(f) |
| Hospital content | pgvector (app.document_chunks), taxonomy tables, MinIO | Indefinite (refreshed on re-crawl) | Replaced on content update | Art. 6(1)(e) public interest (public-domain content) |
| Voice transcripts (redacted) | PostgreSQL via structured logs | Per audit-log retention (90 days); redacted via voice_pii_redaction before write | Expires with audit log | Art. 5(1)(c); Art. 32(1)(a) pseudonymisation |
| Voice call audio | NOT STORED | Not retained | n/a | Art. 5(1)(c) data minimisation — only transcripts are retained, audio is discarded post-STT |
| Evaluation results | File system (JSON) | Indefinite (development artifact, no PII) | Manual deletion | Not personal data |
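As an illustration only, the time-bound rows of the schedule could be encoded as data so that expiry checks read from a single source. This is a minimal sketch, not the system's actual implementation: `RetentionRule`, `RETENTION_RULES`, and `is_expired` are hypothetical names, and "1 year" is assumed to mean 365 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """One row of the retention schedule (hypothetical representation)."""
    category: str
    max_age: Optional[timedelta]  # None = no fixed period (e.g. indefinite)
    deletion_method: str


# Only a subset of the schedule is shown; keys are illustrative identifiers.
RETENTION_RULES = {
    "audit_logs": RetentionRule("Audit logs", timedelta(days=90), "automated expiry"),
    "pii_detection_events": RetentionRule(
        "PII detection events", timedelta(days=90), "expires with parent audit log"
    ),
    # Assumption: "1 year" is treated as 365 days.
    "analytics_events": RetentionRule("Analytics events", timedelta(days=365), "automated expiry"),
    "semantic_cache": RetentionRule("Semantic cache", None, "manual purge via admin API"),
}


def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """Return True when a record has outlived its retention period."""
    rule = RETENTION_RULES[category]
    if rule.max_age is None:
        return False  # no time-based expiry for this category
    return now - created_at > rule.max_age
```

Keeping the periods in one mapping means a change to the schedule touches one place, which also makes the change easy to review.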
Key principles
Data minimisation (Art. 5(1)(c))
- No patient medical records are processed or stored
- No health-insurance data enters the system
- Semantic cache excludes PII: queries flagged by the PII detector are never cached
- Analytics are pre-aggregated: individual query text is not stored in analytics events
- Voice audio is not retained: only the (redacted) transcript reaches structured logs
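The rule that flagged queries are never cached can be sketched as a guard on the cache write path. The class below is an in-memory stand-in, not the real `app.semantic_cache` code; `SemanticCache`, `contains_pii`, and `toy_detector` are hypothetical names, and the toy detector is a placeholder for the actual PII detector.

```python
from typing import Callable, Dict, Optional


class SemanticCache:
    """In-memory stand-in for app.semantic_cache (illustration only)."""

    def __init__(self, contains_pii: Callable[[str], bool]) -> None:
        self._contains_pii = contains_pii  # hypothetical hook into the PII detector
        self._entries: Dict[str, str] = {}

    def put(self, query: str, response: str) -> bool:
        """Cache the response unless the query was flagged; return True if stored."""
        if self._contains_pii(query):
            return False  # flagged queries are never cached
        self._entries[query] = response
        return True

    def get(self, query: str) -> Optional[str]:
        """Return the cached response, or None on a miss."""
        return self._entries.get(query)


def toy_detector(text: str) -> bool:
    """Placeholder: flags any text containing '@'. Not the real detector."""
    return "@" in text
```

Placing the check inside `put` rather than at each call site means a caller cannot bypass it by accident.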
Purpose limitation (Art. 5(1)(b))
| Data | Permitted use | Prohibited use |
|---|---|---|
| Conversations | Generating search responses; follow-up context within a session | Marketing, profiling, research without consent (Art. 6(1)(a)) |
| Audit logs | Security monitoring; compliance reporting (Art. 30); incident investigation | Performance reviews; user-behaviour analysis |
| Analytics | System improvement; content-gap identification; aggregate reporting | Individual user tracking |
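One way to make the table above enforceable is a deny-by-default purpose check that callers consult before reading a data class. This is a sketch under assumptions: the purpose identifiers and the `is_permitted` helper are hypothetical and do not come from the codebase.

```python
# Permitted purposes per data class, transcribed from the table above.
# The identifier spellings are illustrative.
PERMITTED_USES: dict[str, frozenset[str]] = {
    "conversations": frozenset({"search_response", "session_follow_up"}),
    "audit_logs": frozenset(
        {"security_monitoring", "compliance_reporting", "incident_investigation"}
    ),
    "analytics": frozenset(
        {"system_improvement", "content_gap_identification", "aggregate_reporting"}
    ),
}


def is_permitted(data_class: str, purpose: str) -> bool:
    """Deny by default: unknown data classes or purposes are rejected."""
    return purpose in PERMITTED_USES.get(data_class, frozenset())
```

With an allow-list, a prohibited use such as marketing is rejected without having to be enumerated.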
Storage limitation (Art. 5(1)(e))
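All data with a defined retention period is expired automatically: audit logs (including the PII detection events and redacted voice transcripts stored with them) and analytics events by scheduled task, and rate-limiting data by Redis key expiry.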
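A scheduled expiry task for the time-bound tables might look like the sketch below. It is an assumption-laden illustration, not the deployed job: it uses `sqlite3` so that it runs without a server (production storage is PostgreSQL), the `created_at` column is hypothetical, timestamps are assumed to be ISO-8601 strings in UTC, and "1 year" is assumed to mean 365 days.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Table -> retention period, taken from the retention schedule.
EXPIRY_TARGETS = {
    "audit.logs": timedelta(days=90),
    "audit.data_access_logs": timedelta(days=90),
    "app.analytics_events": timedelta(days=365),  # assumption: 1 year = 365 days
}


def purge_expired(conn: sqlite3.Connection, now: datetime) -> dict[str, int]:
    """Delete rows older than each table's retention period.

    Returns the number of rows deleted per table.
    """
    deleted: dict[str, int] = {}
    for table, max_age in EXPIRY_TARGETS.items():
        cutoff = (now - max_age).isoformat()
        # Table names come from the fixed mapping above, never from user input;
        # the cutoff value is passed as a bound parameter.
        cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted
```

Returning per-table counts gives the scheduler something to log, so each run leaves evidence that expiry actually happened.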
Retention-period rationale (per category)
| Category | Why the chosen period | Why not longer | Why not shorter |
|---|---|---|---|
| Audit logs (90 days) | Standard incident-response window; covers seasonal-pattern analysis | Retaining longer increases breach-impact surface (audit logs themselves contain user_id and IP) | Shorter would lose the compliance-investigation window after a delayed report |
| Analytics events (1 year) | Year-over-year trend analysis (seasonal demand, language drift) | No personal data after pre-aggregation, but retention longer than purpose requires would violate Art. 5(1)(e) | Less than a year would cut off seasonal comparison, the primary analytics use case |
| Conversations (session-based) | Conversation context across a single session is the primary use; persistence beyond session is opt-in via account | Cross-session retention without explicit basis would exceed legitimate-interest balancing | Session shorter than the user's task would force users to repeat queries — UX failure with no privacy benefit |
| Voice audio (not retained) | The transcript is sufficient for product purposes; audio adds biometric-data risk under Art. 9 | Even short retention of audio creates an Art. 9 special-category-data surface | n/a |
Data-subject requests (GDPR Chapter III)
When a data subject exercises their rights:
| Request type | Article | Process | Timeline |
|---|---|---|---|
| Access | Art. 15 | Export conversation history via authenticated session or admin API | Within 30 days (Art. 12(3): one month from receipt) |
| Erasure | Art. 17 | DELETE /api/v1/gdpr/users/{user_id}/data — admin-authenticated; cascades through app.conversations, app.conversation_messages, app.feedback, app.analytics_events, audit.logs, audit.data_access_logs | Within 30 days |
| Restriction | Art. 18 | Disable user account; retain data in restricted state per Art. 18(2) | Within 72 hours of request |
| Rectification | Art. 16 | Hospital content is corrected at source; the system re-ingests it on the next crawl | Within 30 days of source-data update |
| Portability | Art. 20 | Conversation export in JSON format via API | Within 30 days |
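The timelines in the table can be tracked with a small deadline helper. This is a minimal sketch: `RESPONSE_WINDOWS`, `response_deadline`, and `is_overdue` are hypothetical names, and rectification is left out because its window runs from the source-data update rather than from receipt of the request.

```python
from datetime import datetime, timedelta, timezone

# Response windows from the table above, counted from receipt of the request.
RESPONSE_WINDOWS = {
    "access": timedelta(days=30),
    "erasure": timedelta(days=30),
    "restriction": timedelta(hours=72),
    "portability": timedelta(days=30),
}


def response_deadline(request_type: str, received_at: datetime) -> datetime:
    """Return the latest moment by which the request must be handled."""
    return received_at + RESPONSE_WINDOWS[request_type]


def is_overdue(request_type: str, received_at: datetime, now: datetime) -> bool:
    """Return True once the response window has elapsed."""
    return now > response_deadline(request_type, received_at)
```

For example, a restriction request received on 2026-05-10 at 09:00 UTC would fall due on 2026-05-13 at 09:00 UTC.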
The GDPR deletion endpoint (DELETE /api/v1/gdpr/users/{user_id}/data) requires admin authentication and returns a structured summary of deleted records across all data categories, providing an audit trail for compliance documentation:
```python
# Cascaded deletion (backend/app/api/gdpr.py)
counts: dict[str, int] = {
    "conversation_messages": ...,
    "conversations": ...,
    "feedback": ...,
    "analytics_events": ...,
    "logs": ...,  # audit.logs
    "data_access_logs": ...,  # audit.data_access_logs
}
```
Documents uploaded by the user are NOT deleted, because they belong to the tenant rather than to the individual user. The policy treats this as the correct behaviour under GDPR: the document data is processed under the tenant's lawful basis, not the user's, so erasure of an individual user's data does not extend to data the tenant retains under a separate basis.
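The cascade behind the deletion endpoint and its per-table summary could be sketched as follows. This is not the contents of `backend/app/api/gdpr.py`: it uses `sqlite3` so that it runs standalone, the `user_id` and `conversation_id` columns are assumptions, and `DELETION_ORDER` and `delete_user_data` are hypothetical names.

```python
import sqlite3

# Child tables first, so no row is left pointing at a deleted parent.
# Each entry: (table, SQL predicate selecting the user's rows).
DELETION_ORDER = [
    ("app.conversation_messages",
     "conversation_id IN (SELECT id FROM app.conversations WHERE user_id = ?)"),
    ("app.conversations", "user_id = ?"),
    ("app.feedback", "user_id = ?"),
    ("app.analytics_events", "user_id = ?"),
    ("audit.logs", "user_id = ?"),
    ("audit.data_access_logs", "user_id = ?"),
]


def delete_user_data(conn: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Delete all rows for one user in one transaction; return per-table counts."""
    counts: dict[str, int] = {}
    try:
        for table, predicate in DELETION_ORDER:
            cur = conn.execute(f"DELETE FROM {table} WHERE {predicate}", (user_id,))
            # Key by bare table name, matching the summary structure above.
            counts[table.split(".", 1)[1]] = cur.rowcount
        conn.commit()
    except Exception:
        conn.rollback()  # all-or-nothing: a partial erasure is never committed
        raise
    return counts
```

Running the whole cascade in one transaction means the returned summary describes either a complete erasure or none at all, which keeps the audit trail unambiguous.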
Third-party data sharing (GDPR Art. 28 processor relationships)
| Recipient | Data shared | Purpose | Lawful basis | Safeguard |
|---|---|---|---|---|
| OpenAI (LLM provider) | Query text, retrieved-context chunks, embedding inputs | Response generation; embeddings | Art. 6(1)(f); Art. 28 (processor) | OpenAI DPA in force; data not retained beyond API request lifecycle per their data-processing terms |
| Twilio (PSTN provider, voice channel) | Caller phone number; SIP signalling metadata | Voice channel termination per ADR-0050 (master) | Art. 6(1)(f); Art. 28 | Twilio DPA + Standard Contractual Clauses for non-EEA data flows |
| No other third parties | -- | -- | -- | -- |
Query text sent to OpenAI for LLM processing is not stored by the provider beyond the API request lifecycle, as specified in their data-processing terms. This commitment is established at API contract level rather than self-reported, and it is the basis for the residual-risk classification of R2 in the DPIA.
ISO/IEC 27001 alignment (target, not certification)
The retention policy is structured to be auditable against the relevant ISO/IEC 27001:2022 controls — A.5.34 (privacy and protection of PII), A.8.10 (information deletion), A.8.11 (data masking). The hospital does not currently hold ISO/IEC 27001 certification; the policy is a target alignment, not a certification claim.
See ISO/IEC 27001:2022 and ISO/IEC 27018:2019.
Review
This policy is reviewed:
- Annually from the date of production deployment;
- When new data categories are introduced (the most recent change was the addition of voice-transcript redaction in 2026-05);
- When retention periods are modified (any reduction is auto-approved; any extension requires DPO sign-off);
- When regulatory requirements change (GDPR amendments, AI Act enforcement actions, sectoral guidance updates);
- Following any data-protection incident, regardless of the Art. 33–34 notification threshold.
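The approval rule for retention-period changes (reductions are auto-approved, extensions need DPO sign-off) is simple enough to encode directly. A minimal sketch, with `Approval` and `approval_for_change` as hypothetical names:

```python
from datetime import timedelta
from enum import Enum


class Approval(Enum):
    AUTO_APPROVED = "auto-approved"
    DPO_SIGN_OFF = "requires DPO sign-off"
    NO_CHANGE = "no change"


def approval_for_change(current: timedelta, proposed: timedelta) -> Approval:
    """Classify a proposed retention-period change per the review rule."""
    if proposed < current:
        return Approval.AUTO_APPROVED  # shorter retention reduces exposure
    if proposed > current:
        return Approval.DPO_SIGN_OFF  # longer retention needs DPO review
    return Approval.NO_CHANGE
```

A check like this could run in a configuration review step so that an extension cannot be merged without the sign-off being recorded.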
Document version: 2.0 — Wave 2.D academic-rewrite revision | Date: 2026-05-10 | Author: SOFT4U BV
References
- Regulation (EU) 2016/679 — General Data Protection Regulation, Articles 5, 6, 9, 12, 15–21, 28, 30, 32–34.
- Data Protection Impact Assessment — risk assessment that this policy implements.
- PII Protection — detection and redaction strategy.
- OpenAI Data Processing Addendum — https://openai.com/policies/data-processing-addendum/.