The Silver Shortfall

P2 · hard · 10 min · Incident Response

Thursday, 09:48. A finance analyst reconciling the daily GMV report against the payment processor finds Gold sitting $1.29M below settled revenue. The weekly exec revenue review starts at 10:00.

Missing GMV: $1,290,000 → $0
Silver completeness: 94.2% → 100%

The incident

It's Thursday, 09:48, and the weekly exec revenue review starts at 10:00. A finance analyst doing the morning reconciliation just flagged it: the Gold `gmv_daily` model reports $20.9M quarter-to-date, but the payment processor's settled total is $22.2M — Gold is $1.29M light, and a row-count check shows about 5.8% of orders (≈23,900) are simply absent from the Silver table they should have landed in. The shortfall isn't spread evenly; it's lopsided, concentrated almost entirely on SKUs from last week's product launch. Nothing looks broken: Bronze matches the processor to the cent, every Spark job is green, the orchestrator shows successes, and the Silver freshness check has been passing every single run. The rows exist in Bronze and never make it to Gold — and the exec review is in 12 minutes.
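The figures in the paragraph above can be cross-checked with a few lines of arithmetic. This is only a sketch using the rounded numbers quoted in the text; the variable names are invented for illustration, and the implied total order count and average order value are inferences from those rounded figures, not numbers the incident states.

```python
# Figures quoted in the incident text, rounded as the text rounds them.
gold_qtd = 20.9e6         # Gold gmv_daily, quarter-to-date (USD)
settled_qtd = 22.2e6      # payment processor settled total (USD)
stated_gap = 1_290_000    # shortfall as reported by finance (USD)
missing_orders = 23_900   # approximate orders absent from Silver
missing_share = 0.058     # approximate share of orders absent

# Gap recomputed from the rounded totals: about $1.3M, consistent with $1.29M.
gap = settled_qtd - gold_qtd

# Share of Bronze orders that did reach Silver.
silver_completeness = 1 - missing_share

# Inferences (not stated in the text): total orders in the window,
# and the average value of a missing order.
implied_bronze_orders = missing_orders / missing_share
avg_missing_order_value = stated_gap / missing_orders

print(f"gap from rounded totals: ${gap / 1e6:.1f}M")
print(f"Silver completeness: {silver_completeness:.1%}")
print(f"implied orders in window: {implied_bronze_orders:,.0f}")
print(f"average missing order value: ${avg_missing_order_value:,.2f}")
```

The point of the check is that the dollar gap and the row-count gap tell the same story: the money is missing because whole orders are missing, not because amounts were altered.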

Symptoms on the table

  • Gold `gmv_daily` QTD: $20.9M · payment processor settled: $22.2M (Δ −$1.29M understated)
  • Silver row count is ~23,900 (5.8%) below Bronze for the same window — Bronze matches the processor exactly
  • The missing rows are almost entirely last week's launch SKUs; older SKUs reconcile cleanly
  • Silver freshness check: PASS on every run · table last updated 14 min ago · all green
  • Airflow: `dim_refresh` and `silver_build` both report `success`, no failures, no retries
  • Zero alerts and zero pages in the days the gap has existed
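The row-count and SKU symptoms above suggest one natural first query: find the orders that are in Bronze but not in Silver, then group them by SKU to see where the loss concentrates. The following is a minimal plain-Python sketch of that anti-join, not the challenge's actual tooling. The column names (`order_id`, `sku`, `amount`), the SKU values, and the helper `missing_by_sku` are all hypothetical.

```python
from collections import Counter

def missing_by_sku(bronze_rows, silver_rows):
    """Return (completeness, Counter of missing-row counts per SKU).

    A row counts as missing when its order_id appears in Bronze
    but not in Silver (an anti-join on order_id).
    """
    silver_ids = {row["order_id"] for row in silver_rows}
    missing = [row for row in bronze_rows if row["order_id"] not in silver_ids]
    completeness = 1 - len(missing) / len(bronze_rows) if bronze_rows else 1.0
    return completeness, Counter(row["sku"] for row in missing)

# Toy data: invented for illustration, not taken from the incident.
bronze = [
    {"order_id": 1, "sku": "OLD-1", "amount": 40.0},
    {"order_id": 2, "sku": "OLD-2", "amount": 25.0},
    {"order_id": 3, "sku": "NEW-1", "amount": 60.0},
    {"order_id": 4, "sku": "NEW-1", "amount": 55.0},
    {"order_id": 5, "sku": "NEW-2", "amount": 70.0},
]
# Simulate a Silver table that only received the older SKUs.
silver = [row for row in bronze if row["sku"].startswith("OLD")]

completeness, by_sku = missing_by_sku(bronze, silver)
print(f"completeness: {completeness:.1%}")
for sku, count in by_sku.most_common():
    print(sku, count)
```

A per-SKU breakdown like this is what turns "5.8% of rows are missing" into "the loss is concentrated on a specific group of SKUs", which narrows the search considerably.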

Systems on the board

The real components in play for this incident — the surface you investigate when the clock starts.

  • Bronze `orders`: raw landing · complete
  • Product CDC: feeds `dim_product`
  • `dim_product`: Iceberg · nightly overwrite
  • Silver build: Spark join · Bronze × dim
  • Airflow: orchestrator
  • Silver DQ check: freshness monitor
  • Gold `gmv_daily`: aggregate · exec report
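The board lists the Silver DQ check as a freshness monitor. A freshness check looks only at when a table was last written, so it can keep passing while rows are missing. The sketch below contrasts it with a simple completeness check, under stated assumptions: the function names, the one-hour freshness window, the 99.9% completeness threshold, and the Bronze row count are all hypothetical, chosen only to illustrate the difference.

```python
from datetime import datetime, timedelta

def freshness_ok(last_updated, now, max_age=timedelta(hours=1)):
    """Pass if the table was written recently; says nothing about row counts."""
    return now - last_updated <= max_age

def completeness_ok(source_rows, target_rows, min_ratio=0.999):
    """Pass only if the target holds at least min_ratio of the source rows."""
    if source_rows == 0:
        return True  # nothing to reconcile
    return target_rows / source_rows >= min_ratio

# Arbitrary date; only the 14-minute age from the symptoms list matters.
now = datetime(2024, 1, 4, 9, 48)
last_updated = now - timedelta(minutes=14)

bronze_rows = 412_000               # hypothetical total; only the ratio matters
silver_rows = bronze_rows - 23_900  # the row gap reported in the symptoms

print("freshness check passes:", freshness_ok(last_updated, now))
print("completeness check passes:", completeness_ok(bronze_rows, silver_rows))
```

With these inputs the freshness check passes and the completeness check fails, which matches the pattern in the symptoms: every monitor is green while Silver sits well below Bronze.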

What you'll practice

This is a timed, hands-on Incident Response challenge. You diagnose the symptom, trace it to a root cause across real components, and ship a fix before the clock runs out. It is the same loop you run on call, without the production blast radius.

