Data Retention Policy

This document formalises the data-retention periods and lifecycle management for all data processed by the ZOL Intelligent Search system. Retention is the operational expression of GDPR Art. 5(1)(e) storage limitation: personal data shall be kept "in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed".

The policy is calibrated by data class — there is no single "retention period" that fits all data. Audit logs need a different retention than ephemeral session caches; analytics need a different retention than personal-data-touching feedback events. Each row in the schedule below is anchored to the specific GDPR article that supplies its lawful basis.

Retention schedule

Data category | Storage | Retention period | Deletion method | Lawful basis (GDPR)
User conversations | PostgreSQL (app.conversations, app.conversation_messages) | Session-based; available while user is authenticated | Soft delete via API; hard delete on user-account removal via DELETE /api/v1/gdpr/users/{user_id}/data | Art. 6(1)(f) legitimate interest; Art. 17 right to erasure on request
Audit logs | PostgreSQL (audit.logs, audit.data_access_logs) | 90 days | Automated expiry (scheduled task) | Art. 6(1)(c) legal obligation (security monitoring); Art. 32
PII detection events | PostgreSQL (within audit logs) | 90 days | Expires with parent audit log | Art. 6(1)(f); Art. 32
Semantic cache | PostgreSQL (app.semantic_cache) | Indefinite (performance optimisation; no PII per design) | Manual purge via admin API; auto-flush after embedding-model migration | Art. 6(1)(f); Art. 5(1)(c) (PII excluded)
Rate-limiting data | Redis | Ephemeral (1-minute to 24-hour TTL) | Automatic Redis key expiry | Art. 6(1)(f); Art. 5(1)(e)
Session tokens | Keycloak (server-side sessions) | Managed by Keycloak session policy (configurable idle and absolute timeouts) | Keycloak session invalidation on logout; revocation endpoint | Art. 6(1)(f); Art. 32
Analytics events | PostgreSQL (app.analytics_events) | 1 year (aggregated, no individual tracking) | Automated expiry | Art. 6(1)(f)
Hospital content | pgvector (app.document_chunks), taxonomy tables, MinIO | Indefinite (refreshed on re-crawl) | Replaced on content update | Art. 6(1)(e) public interest (public-domain content)
Voice transcripts (redacted) | PostgreSQL via structured logs | Per audit-log retention (90 days); redacted via voice_pii_redaction before write | Expires with audit log | Art. 5(1)(c); Art. 32(1)(a) pseudonymisation
Voice call audio | NOT STORED | Not retained | n/a | Art. 5(1)(c) data minimisation: only transcripts are retained; audio is discarded post-STT
Evaluation results | File system (JSON) | Indefinite (development artifact, no PII) | Manual deletion | Not personal data
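
The schedule above can also be held as data, so that a scheduled task has a single place to look up each period. The following is a minimal sketch under stated assumptions, not the system's actual configuration: `RetentionRule`, `SCHEDULE`, and `is_expired` are hypothetical names, only a few rows are shown, "1 year" is approximated as 365 days, and categories without a fixed period are modelled as `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """One row of the retention schedule (hypothetical representation)."""
    category: str
    period: Optional[timedelta]  # None = no fixed period (indefinite or managed elsewhere)
    lawful_basis: str


# Periods and articles copied from the schedule; keys and structure are assumptions.
SCHEDULE = {
    "audit_logs": RetentionRule("Audit logs", timedelta(days=90), "Art. 6(1)(c); Art. 32"),
    "pii_detection_events": RetentionRule("PII detection events", timedelta(days=90), "Art. 6(1)(f); Art. 32"),
    "analytics_events": RetentionRule("Analytics events", timedelta(days=365), "Art. 6(1)(f)"),
    "semantic_cache": RetentionRule("Semantic cache", None, "Art. 6(1)(f); Art. 5(1)(c)"),
}


def is_expired(key: str, created_at: datetime, now: datetime) -> bool:
    """Return True when a record has outlived its category's retention period."""
    rule = SCHEDULE[key]
    if rule.period is None:
        return False  # no fixed period: never expired by age alone
    return now - created_at > rule.period


now = datetime(2026, 5, 10, tzinfo=timezone.utc)
print(is_expired("audit_logs", now - timedelta(days=91), now))       # True
print(is_expired("audit_logs", now - timedelta(days=30), now))       # False
print(is_expired("semantic_cache", now - timedelta(days=900), now))  # False
```

Keeping the lawful basis next to the period means a change to either is visible in the same review.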

Key principles

Data minimisation (Art. 5(1)(c))

  • No patient medical records are processed or stored
  • No health-insurance data enters the system
  • Semantic cache excludes PII: queries flagged by the PII detector are never cached
  • Analytics are pre-aggregated: individual query text is not stored in analytics events
  • Voice audio is not retained: only the (redacted) transcript reaches structured logs
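
The cache rule in the list above (queries flagged by the PII detector are never cached) can be sketched as a guard on the write path. This is an illustration only: the policy does not show the real detector, so `contains_pii` is a hypothetical stand-in that flags e-mail addresses and long digit runs, and `SemanticCache` is an in-memory substitute for `app.semantic_cache`. The query and answer strings are made-up sample data.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the PII detector (the real one is not shown in the policy).
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_DIGITS = re.compile(r"\d{9,}")


def contains_pii(query: str) -> bool:
    """Flag e-mail addresses and digit runs of nine or more (e.g. phone numbers)."""
    compact = re.sub(r"[\s./-]", "", query)  # join digit groups such as "0470 12 34 56"
    return bool(_EMAIL.search(query) or _DIGITS.search(compact))


class SemanticCache:
    """In-memory stand-in for app.semantic_cache (illustrative only)."""

    def __init__(self, detector: Callable[[str], bool] = contains_pii) -> None:
        self._store: Dict[str, str] = {}
        self._detector = detector

    def put(self, query: str, answer: str) -> bool:
        """Store the answer unless the query is flagged; return whether it was stored."""
        if self._detector(query):
            return False  # flagged queries never reach the cache
        self._store[query] = answer
        return True

    def get(self, query: str) -> Optional[str]:
        return self._store.get(query)


cache = SemanticCache()
print(cache.put("Where is the radiology department?", "sample answer"))  # True
print(cache.put("Call me back on 0470 12 34 56", "sample answer"))       # False
print(cache.get("Call me back on 0470 12 34 56"))                        # None
```

Placing the check inside `put` rather than at each call site means no caller can bypass it by accident.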

Purpose limitation (Art. 5(1)(b))

Data | Permitted use | Prohibited use
Conversations | Generating search responses; follow-up context within a session | Marketing, profiling, research without consent (Art. 6(1)(a))
Audit logs | Security monitoring; compliance reporting (Art. 30); incident investigation | Performance reviews; user-behaviour analysis
Analytics | System improvement; content-gap identification; aggregate reporting | Individual user tracking

Storage limitation (Art. 5(1)(e))

All data with a defined retention period is managed automatically.
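
A minimal sketch of what automated expiry for the 90-day audit-log period could look like. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. The table name `audit_logs`, the `created_at` column, and the function `purge_expired` are assumptions for illustration, not the system's actual scheduled task.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # audit-log period from the retention schedule


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete audit rows older than the retention period; return rows removed."""
    # ISO-8601 strings with the same UTC offset sort chronologically,
    # so a plain string comparison is sufficient in this sketch.
    cutoff = (now - RETENTION).isoformat()
    cur = conn.execute("DELETE FROM audit_logs WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo with four rows aged 1, 89, 91 and 200 days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_logs (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
now = datetime(2026, 5, 10, tzinfo=timezone.utc)
for age_days in (1, 89, 91, 200):
    conn.execute(
        "INSERT INTO audit_logs (created_at) VALUES (?)",
        ((now - timedelta(days=age_days)).isoformat(),),
    )

removed = purge_expired(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM audit_logs").fetchone()[0]
print(removed, remaining)  # 2 2
```

Returning the number of deleted rows gives the scheduler something to log, which keeps the expiry itself auditable.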

Retention-period rationale (per category)

Category | Why the chosen period | Why not longer | Why not shorter
Audit logs (90 days) | Standard incident-response window; covers seasonal-pattern analysis | Retaining longer increases breach-impact surface (audit logs themselves contain user_id and IP) | Shorter would lose the compliance-investigation window after a delayed report
Analytics events (1 year) | Year-over-year trend analysis (seasonal demand, language drift) | No personal data after pre-aggregation, but retention longer than purpose requires would violate Art. 5(1)(e) | Less than a year would cut off seasonal comparison, the primary analytics use case
Conversations (session-based) | Conversation context across a single session is the primary use; persistence beyond session is opt-in via account | Cross-session retention without explicit basis would exceed legitimate-interest balancing | Session shorter than the user's task would force users to repeat queries: a UX failure with no privacy benefit
Voice audio (not retained) | The transcript is sufficient for product purposes; audio adds biometric-data risk under Art. 9 | Even short retention of audio creates an Art. 9 special-category-data surface | n/a

Data-subject requests (GDPR Chapter III)

When a data subject exercises their rights:

Request type | Article | Process | Timeline
Access | Art. 15 | Export conversation history via authenticated session or admin API | Within 30 days (Art. 12(3))
Erasure | Art. 17 | DELETE /api/v1/gdpr/users/{user_id}/data (admin-authenticated); cascades through app.conversations, app.conversation_messages, app.feedback, app.analytics_events, audit.logs, audit.data_access_logs | Within 30 days
Restriction | Art. 18 | Disable user account; retain data in restricted state per Art. 18(2) | Within 72 hours of request
Rectification | Art. 16 | Hospital content is corrected at source; the system re-ingests it on the next crawl | Within 30 days of source-data update
Portability | Art. 20 | Conversation export in JSON format via API | Within 30 days
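
The timelines in the table translate directly into due dates that a request tracker could compute. A small sketch, assuming hypothetical names (`RESPONSE_WINDOWS`, `due_by`); rectification is left out because its 30-day window runs from the source-data update rather than from receipt of the request.

```python
from datetime import datetime, timedelta, timezone

# Windows copied from the table above; the keys are hypothetical labels.
RESPONSE_WINDOWS = {
    "access": timedelta(days=30),
    "erasure": timedelta(days=30),
    "restriction": timedelta(hours=72),
    "portability": timedelta(days=30),
}


def due_by(request_type: str, received_at: datetime) -> datetime:
    """Return the latest completion time for a data-subject request."""
    return received_at + RESPONSE_WINDOWS[request_type]


received = datetime(2026, 5, 10, 9, 0, tzinfo=timezone.utc)
print(due_by("erasure", received).isoformat())      # 2026-06-09T09:00:00+00:00
print(due_by("restriction", received).isoformat())  # 2026-05-13T09:00:00+00:00
```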

The GDPR deletion endpoint (DELETE /api/v1/gdpr/users/{user_id}/data) requires admin authentication and returns a structured summary of deleted records across all data categories, providing an audit trail for compliance documentation:

# Cascaded deletion (backend/app/api/gdpr.py)
counts: dict[str, int] = {
    "conversation_messages": ...,
    "conversations": ...,
    "feedback": ...,
    "analytics_events": ...,
    "logs": ...,  # audit.logs
    "data_access_logs": ...,  # audit.data_access_logs
}

Documents uploaded by the user are NOT deleted, because they belong to the tenant, not the individual user. This is the correct behaviour under GDPR: the document data is processed under the tenant's lawful basis, not the user's, and erasure of an individual user does not extend to data the tenant retains under a separate basis.
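
To make the cascade and its per-table summary concrete, here is a hedged sketch using `sqlite3` as a stand-in for PostgreSQL. It is not the code in `backend/app/api/gdpr.py`: the function `erase_user`, the assumption that every table carries a `user_id` column, and the `documents` table representing tenant-owned uploads are all hypothetical. Schema prefixes (`app.`, `audit.`) are dropped because SQLite has no schemas.

```python
import sqlite3

# Child rows first, then parents. Names mirror the policy's list without schema prefixes.
USER_TABLES = [
    "conversation_messages",
    "conversations",
    "feedback",
    "analytics_events",
    "logs",              # audit.logs
    "data_access_logs",  # audit.data_access_logs
]


def erase_user(conn: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Delete one user's rows from every listed table; return per-table counts."""
    counts: dict[str, int] = {}
    with conn:  # single transaction: commits on success, rolls back on error
        for table in USER_TABLES:
            cur = conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
            counts[table] = cur.rowcount
    return counts


# Demo: two users plus one tenant-owned document that must survive erasure.
conn = sqlite3.connect(":memory:")
for table in USER_TABLES + ["documents"]:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, user_id TEXT)")
conn.executemany("INSERT INTO conversations (user_id) VALUES (?)", [("u1",), ("u2",)])
conn.executemany(
    "INSERT INTO conversation_messages (user_id) VALUES (?)",
    [("u1",), ("u1",), ("u2",)],
)
conn.execute("INSERT INTO documents (user_id) VALUES ('u1')")
conn.commit()

counts = erase_user(conn, "u1")
docs_left = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(counts["conversation_messages"], counts["conversations"])  # 2 1
print(docs_left)                                                 # 1
```

Running all deletes in one transaction avoids a half-erased user if any statement fails, and the returned dictionary has the same shape as the summary described above.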

Third-party data sharing (GDPR Art. 28 processor relationships)

Recipient | Data shared | Purpose | Lawful basis | Safeguard
OpenAI (LLM provider) | Query text, retrieved-context chunks, embedding inputs | Response generation; embeddings | Art. 6(1)(f); Art. 28 (processor) | OpenAI DPA in force; data not retained beyond API request lifecycle per their data-processing terms
Twilio (PSTN provider, voice channel) | Caller phone number; SIP signalling metadata | Voice channel termination per ADR-0050 (master) | Art. 6(1)(f); Art. 28 | Twilio DPA + Standard Contractual Clauses for non-EEA data flows
No other third parties | -- | -- | -- | --

Query text sent to OpenAI for LLM processing is not stored by the provider beyond the API request lifecycle, as specified in their data-processing terms. This is verified at API contract level rather than self-reported, and is the load-bearing basis for the residual-risk classification of R2 in the DPIA.

ISO/IEC 27001 alignment (target, not certification)

The retention policy is structured to be auditable against the relevant ISO/IEC 27001:2022 controls — A.5.34 (privacy and protection of PII), A.8.10 (information deletion), A.8.11 (data masking). The hospital does not currently hold ISO/IEC 27001 certification; the policy is a target alignment, not a certification claim.

See ISO/IEC 27001:2022. See ISO/IEC 27018:2019.

Review

This policy is reviewed:

  • Annually from the date of production deployment;
  • When new data categories are introduced (the most recent change was the addition of voice-transcript redaction in 2026-05);
  • When retention periods are modified (any reduction is auto-approved; any extension requires DPO sign-off);
  • When regulatory requirements change (GDPR amendments, AI Act enforcement actions, sectoral guidance updates);
  • Following any data-protection incident, regardless of the Art. 33–34 notification threshold.

Document version: 2.0 — Wave 2.D academic-rewrite revision | Date: 2026-05-10 | Author: SOFT4U BV

References