The Ghost in the Snapshot

P1 · medium · 10 min · Incident Response

An external auditor re-ran a right-to-erasure verification for a user you deleted 40 days ago. Their email and home address came back. The live table is clean, the job logged SUCCESS, and the same pipeline has 'erased' 412 people in the last quarter.

Exposed PII rows: 148,200 → 0 rows recoverable
PII files in storage: 1,840 → 0 files

The incident

It's Friday morning and the DPO has just pinged you in a panic. Forty days ago you ran a GDPR right-to-erasure request for user_id 88213, sent the confirmation email, and closed the ticket. This morning the regulator's external auditor re-ran their own verification and recovered 88213's full email and home address in two different ways: a time-travel query against the customers table, and a partner's marketing extract.

The strange part is that everything still looks done. A SELECT on the live customers table returns zero rows for 88213, the erasure job logged SUCCESS, and the post-delete row count matched exactly what you expected. The deletion clearly happened somewhere - the row really is gone from the current table - but the person's data plainly is not. A quick look at the last quarter shows 88213 isn't special: 412 erasure requests went through the exact same pipeline. Legal has to decide whether to self-report a failed erasure, and the audit re-runs Monday at 09:00.

Symptoms on the table

  • user_id 88213 was confirmed erased 40 days ago; the auditor just recovered their email and home address
  • the PII came back two ways: a time-travel query, and a partner marketing extract
  • the live customers table is clean - SELECT WHERE user_id=88213 returns zero rows
  • the erasure job logged status=SUCCESS and the post-delete row count matched expectations
  • no alert ever fired - the erasure pipeline has been green the whole time
  • a spot-check of the last 90 days shows 412 subjects processed by the same job, all marked SUCCESS
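
The pairing of a clean live table with a successful time-travel read can be reproduced in a toy model. The sketch below is plain Python, not the Iceberg or Spark API: `ToyTable`, `delete_where`, and `read` are hypothetical names, and the email addresses are placeholders. It assumes only that a snapshot-based table records a delete as a new snapshot and keeps earlier snapshots readable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy snapshot-based table: every write appends a new immutable snapshot."""
    snapshots: list = field(default_factory=list)  # each snapshot is a tuple of rows

    def append(self, rows):
        current = self.snapshots[-1] if self.snapshots else ()
        self.snapshots.append(current + tuple(rows))

    def delete_where(self, predicate):
        # A delete does not rewrite history; it adds a snapshot without the rows.
        current = self.snapshots[-1]
        kept = tuple(r for r in current if not predicate(r))
        self.snapshots.append(kept)
        return len(current) - len(kept)  # rows removed from the current view

    def read(self, snapshot_index=-1):
        # Default is the current snapshot; an older index is a time-travel read.
        return list(self.snapshots[snapshot_index])


customers = ToyTable()
customers.append([
    {"user_id": 88213, "email": "subject@example.com"},  # placeholder PII
    {"user_id": 10001, "email": "other@example.com"},
])

# The "erasure job": delete the subject and count what was removed.
deleted = customers.delete_where(lambda r: r["user_id"] == 88213)

# Check 1: the live table, which is all the job in this sketch looks at.
live_hits = [r for r in customers.read() if r["user_id"] == 88213]

# Check 2: the pre-delete snapshot, which is what a time-travel query reads.
historical_hits = [r for r in customers.read(snapshot_index=0)
                   if r["user_id"] == 88213]

print(deleted, len(live_hits), len(historical_hits))
```

In this model the delete count and the live check both look correct, while the earlier snapshot still returns the subject's row. Whether that is what happened in the incident is for the investigation to confirm.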

Systems on the board

The real components in play for this incident — the surface you investigate when the clock starts.

  • DSAR Intake: erasure request service
  • Erasure Job: Spark delete on customers
  • Iceberg `customers`: current snapshot
  • Snapshot History: time-travel / retained snapshots
  • Catalog / Lineage: OpenMetadata
  • `marketing_silver`: CTAS copy + partner export
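
One way to read this board is as a checklist of places where a subject's data can live. The following is a minimal sketch, assuming each surface can be reduced to a function that returns the rows matching a user_id. `verify_erasure` and `matcher` are hypothetical helpers, and the in-memory lists stand in for real queries against the table, its retained snapshots, and the downstream copy.

```python
from typing import Callable, Dict, List

Lookup = Callable[[int], List[dict]]


def verify_erasure(user_id: int, surfaces: Dict[str, Lookup]) -> Dict[str, int]:
    """Return how many rows are still found for user_id on each surface."""
    return {name: len(lookup(user_id)) for name, lookup in surfaces.items()}


def matcher(rows: List[dict]) -> Lookup:
    """Build a lookup over an in-memory list (stand-in for a real query)."""
    return lambda uid: [r for r in rows if r["user_id"] == uid]


# Illustrative data only: the live table is clean, two other surfaces are not.
surfaces = {
    "customers (current snapshot)": matcher([{"user_id": 10001}]),
    "customers (retained snapshots)": matcher([{"user_id": 88213}, {"user_id": 10001}]),
    "marketing_silver": matcher([{"user_id": 88213}]),
}

report = verify_erasure(88213, surfaces)
leaks = [name for name, count in report.items() if count > 0]
print(leaks)
```

A check limited to the first surface would report success for this data, which matches the green pipeline described in the symptoms. A check that walks every surface reachable from the catalog would not.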

What you'll practice

This is a timed, hands-on Incident Response challenge. You diagnose the symptom, trace it to a root cause across real components, and ship a fix before the clock runs out: the same loop you run on call, without the production blast radius.

