Data retention and deletion

Data is a liability as much as an asset. Do not keep it forever: define how long each category lives, then automate its expiry or anonymization and propagate every deletion to all derived copies. This is engineering guidance, not legal advice — confirm concrete retention periods with counsel.

Retention schedule

Storage limitation (GDPR Art. 5(1)(e), as of the 2016 regulation) requires that data be kept no longer than necessary for its purpose.

  • Maintain a retention schedule as code or config that maps each data category (user profile, auth tokens, audit logs, analytics events, PII, derived ML features) to a maximum retention period and a disposition (delete vs. anonymize).
  • Every category MUST have a defined retention period; absence of a period is itself a decision and MUST be justified (e.g., legal-hold or financial records with a statutory minimum).
  • Each category SHOULD have automated expiry — a scheduled job, TTL index, or partition-drop — rather than manual cleanup.
  • Record a created_at (and where relevant expires_at) timestamp on every retained record so expiry is computable and auditable.

Cascading deletion

A deletion that misses a copy is not a deletion.

  • Deletions SHOULD cascade from the source of truth to every derived store: denormalized tables, read replicas, caches (Redis/CDN), search indexes (Elasticsearch/OpenSearch), data-warehouse/analytics copies, message-queue payloads, and object storage (S3/blob).
  • Maintain an explicit inventory of where each category is copied; treat the inventory as the cascade checklist and keep it in version control.
  • Prefer event-driven cascade (emit a deletion-requested event; each store subscribes) over a monolithic delete that must know every downstream — this keeps stores decoupled and the design open to new sinks (optimize for change).
  • Make cascade steps idempotent and retryable; a partially failed cascade MUST be detectable and resumable, not silently abandoned.

Soft vs. hard delete

Choice When to use Caution
Soft delete (tombstone flag) Undo windows, referential integrity, short-lived audit needs Data still present — MUST NOT count as erasure for a privacy request
Hard delete (row removed) Erasure requests, PII past retention Irreversible; verify cascade first
Anonymize / pseudonymize Keep aggregates/analytics without identifying a person MUST be irreversible (no re-identification key retained)
  • Decide soft vs. hard per category, not globally. Erasure obligations MUST resolve to hard delete or true anonymization within the source and all derived stores.

Backups and erasure requests

  • Backups and immutable logs SHOULD be excluded from immediate cascade; instead document the lag — deleted data persists until the backup rotates out of its retention window.
  • Define and publish a backup retention/purge policy so the maximum lag between an erasure request and full physical removal is bounded and known.
  • For an erasure request, suppress the data from active systems immediately and rely on backup rotation for residual copies; MUST NOT restore deleted records from an old backup without re-applying pending deletions.

Auditing

  • Log every deletion (who/what/when, category, request reference) to a tamper-evident, separately retained audit trail — the log of a deletion is not the deleted data.
  • Reconcile periodically: scan for records past their expires_at that were not purged, and alert. Treat a reconciliation miss as a defect (fail fast).
version
1.0.0
tags
data, retention, privacy
author
Mike Fullerton
modified
2026-06-09

Change History

Version Date Author Summary
1.0.0 2026-06-09 Mike Fullerton Initial creation