Module: Pipeline Quality vs Data Quality | Duration: ~13 min | Lesson: 1 of 6
Priya's manager messages on Slack: "the pipeline health dashboard is all green. why is finance saying yesterday's revenue is wrong?"
Priya opens the pipeline-health dashboard. All 47 DAGs are green. Last run times are within SLA. Task durations are normal. Cluster utilization is healthy. There is no signal anywhere on this dashboard that suggests anything is wrong.
She opens the warehouse and runs SELECT SUM(amount) FROM daily_revenue WHERE revenue_date = yesterday. The number is $312k. Finance's source-of-truth payment processor says it should be $1.2M. That is a discrepancy of roughly $900k in a table where every DAG that touches it is green.
The pipeline is healthy. The data is wrong. Both statements are true at the same time, and a dashboard that shows only the first one is half a dashboard.
2. Concept Explanation
Two independent axes
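Pipeline quality and data quality are orthogonal. Picture a 2x2:
| | Data wrong | Data right |
|---|---|---|
| Pipeline green | Silent failure: no alarm fires | Healthy |
| Pipeline red | Loud failure: on-call sees it | Pipeline failure, data still right |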
Most teams build monitoring for the bottom-left quadrant (pipeline red, data wrong). That's the easy case. Something failed loudly; on-call sees the red square; the data is fixed when the rerun succeeds.
The dangerous quadrant is the top-left. Pipeline green, data wrong. No alarm fires from the pipeline-health side because nothing about the pipeline is anomalous. The bug is in the data, and the only thing that catches it is a check against the data itself.
Pipeline-health signals
Pipeline-health signals come from the orchestrator's metadata:
- task status (succeeded/failed)
- task duration
- DAG run latency
- worker queue depth
- scheduler heartbeat
These tell you "is the work getting done on time?" They tell you nothing about whether the work produced the right output.
Data-quality signals
Data-quality signals come from the data itself:
- row counts vs expected baseline
- SUM(amount) reconciliation against an external source of truth
- null rates per column
- referential integrity (orphaned foreign keys)
- distributional checks (mean/median/p99 within a band)
- column-value range checks (no negative ages, no future birthdates)
These tell you "is the output of the work right?" They tell you nothing about how it got there.
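As a rough illustration of the list above, the sketch below runs four of these checks against an in-memory SQLite database. The `orders` and `customers` tables, the baseline of 4 rows, and the 1% null-rate threshold are all hypothetical values chosen for the example, not part of any real schema.

```python
import sqlite3


def run_checks(conn):
    """Return {check_name: passed} for a few data-quality checks."""
    cur = conn.cursor()
    results = {}

    # Row count vs. expected baseline (baseline and tolerance are assumed values).
    (rows,) = cur.execute("SELECT COUNT(*) FROM orders").fetchone()
    baseline, tolerance = 4, 0.5
    results["row_count"] = abs(rows - baseline) <= baseline * tolerance

    # Null rate on a critical column (1% threshold is an assumption).
    (nulls,) = cur.execute(
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
    ).fetchone()
    results["null_rate"] = (nulls / rows if rows else 1.0) <= 0.01

    # Referential integrity: orders pointing at a customer that does not exist.
    (orphans,) = cur.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON c.id = o.customer_id "
        "WHERE c.id IS NULL"
    ).fetchone()
    results["no_orphans"] = orphans == 0

    # Column-value range check: no negative amounts.
    (negatives,) = cur.execute(
        "SELECT COUNT(*) FROM orders WHERE amount < 0"
    ).fetchone()
    results["amount_non_negative"] = negatives == 0
    return results


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1), (2);
INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, NULL), (3, 9, 5.0), (4, 1, -3.0);
""")
results = run_checks(conn)
print(results)
```

Note that none of these queries touches orchestrator metadata. The row count looks fine here while three other checks fail, which is why a single check is rarely enough.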
Why teams conflate the two
The orchestrator owns both running the work and recording whether work ran. The metadata DB is the obvious place to put a dashboard. Data-quality checks live somewhere else (Soda, Great Expectations, dbt tests, ad-hoc SQL). Building one dashboard from the orchestrator's metadata is one config file. Building a second dashboard from data-quality results is integration work.
Most teams do the easy part and don't do the hard part. They get away with it for a while. Then someone notices a $900k discrepancy.
The "two dashboards, one wall" rule
A working data org has two dashboards (or two halves of one dashboard):
| Pipeline health | Data health |
|---|---|
| Tasks green/red | Tables fresh/stale |
| DAG durations | Row counts vs baseline |
| Worker queue depth | Critical metric reconciliations |
| Last successful run | Null/distinct/range checks per critical column |
On-call is paged when either side goes red. The two halves are kept separate so people can see what kind of problem is happening at a glance.
The mistake is showing only the left column. The next mistake is hiding the right column behind a "data quality" link nobody clicks.
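One way to picture the "page on either side, but keep the axes visible" rule is the small sketch below. The `Signal` class, the signal names, and the alert format are hypothetical; a real setup would feed these from the orchestrator and from whatever tool runs the data checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    axis: str      # "pipeline" or "data"
    name: str
    healthy: bool


def alerts_to_page(signals):
    """Return one alert string per unhealthy signal, prefixed with its axis."""
    return [f"[{s.axis}] {s.name}" for s in signals if not s.healthy]


signals = [
    Signal("pipeline", "daily_revenue DAG status", True),
    Signal("pipeline", "worker queue depth", True),
    Signal("data", "daily_revenue freshness", True),
    Signal("data", "daily_revenue reconciliation vs processor", False),
]

pages = alerts_to_page(signals)
pipeline_green = all(s.healthy for s in signals if s.axis == "pipeline")
print(pipeline_green, pages)
```

In this example the pipeline half is fully green and on-call is still paged, with the alert itself saying that the data axis failed.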
3. Worked Example
Priya's $900k discrepancy traced back to a SQL bug in daily_revenue:
The >= '{{ ds }}' filter is open-ended; the date(created_at) = '{{ ds }}' filter narrows back to the day. The bug is that date(created_at) = '{{ ds }}' is computed in UTC, while created_at >= '{{ ds }}' interprets {{ ds }} as midnight UTC. Timezone slippage drops 8 hours of payments, so only 16 of the day's 24 hours reach the table.
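The mechanism can be reproduced in miniature. The sketch below is not the original query: it assumes a hypothetical `payments` table with one $100 payment per hour, and it models the slippage as the lower-bound filter starting at midnight in a zone 8 hours behind UTC (the UTC-8 offset and the date are assumptions made for the example).

```python
import sqlite3
from datetime import datetime, timedelta

DS = "2024-03-05"            # stand-in for the templated {{ ds }} value
LOCAL_OFFSET_HOURS = -8      # assumed: the lower-bound filter's zone is UTC-8

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (created_at TEXT, amount REAL)")

# One $100 payment per hour of the UTC day, stored as UTC text timestamps.
start = datetime(2024, 3, 5)
fmt = "%Y-%m-%d %H:%M:%S"
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [((start + timedelta(hours=h)).strftime(fmt), 100.0) for h in range(24)],
)

# Midnight of DS in the assumed local zone, expressed in UTC (08:00 UTC).
local_midnight_utc = (start - timedelta(hours=LOCAL_OFFSET_HOURS)).strftime(fmt)

# What the day should sum to: every payment whose UTC date is DS.
(expected,) = conn.execute(
    "SELECT SUM(amount) FROM payments WHERE date(created_at) = ?", (DS,)
).fetchone()

# The two filters combined: only the overlap of the two day windows survives.
(actual,) = conn.execute(
    "SELECT SUM(amount) FROM payments "
    "WHERE created_at >= ? AND date(created_at) = ?",
    (local_midnight_utc, DS),
).fetchone()

print(expected, actual)
```

The query runs without error and returns a plausible number, which is exactly why the task stays green.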
Every task in the DAG was green. Nothing about the pipeline metadata hinted at the bug. The only signal that could have caught this was a data-quality check.
Here are three data-quality checks that would have caught it:
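A minimal sketch of three such checks follows, using SQLite so it runs anywhere. It assumes `daily_revenue` keeps one row per payment with `revenue_date`, `created_at`, and `amount` columns; that layout, the tolerances, the baseline row count, and the processor total are all assumptions made for illustration.

```python
import sqlite3


def check_reconciliation(conn, ds, processor_total, tolerance=0.01):
    """Check 1: warehouse SUM(amount) within tolerance of the processor total."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM daily_revenue WHERE revenue_date = ?",
        (ds,),
    ).fetchone()
    return abs(total - processor_total) <= tolerance * processor_total


def check_row_count(conn, ds, baseline_rows, tolerance=0.2):
    """Check 2: row count for the day within tolerance of a baseline."""
    (rows,) = conn.execute(
        "SELECT COUNT(*) FROM daily_revenue WHERE revenue_date = ?", (ds,)
    ).fetchone()
    return abs(rows - baseline_rows) <= tolerance * baseline_rows


def hours_with_activity(conn, ds):
    """Number of distinct clock hours that have at least one row."""
    (hours,) = conn.execute(
        "SELECT COUNT(DISTINCT strftime('%H', created_at)) "
        "FROM daily_revenue WHERE revenue_date = ?",
        (ds,),
    ).fetchone()
    return hours


def check_hourly_coverage(conn, ds, expected_hours=24):
    """Check 3: every hour of the day shows some activity."""
    return hours_with_activity(conn, ds) == expected_hours


# Simulate the bug: only hours 08..23 made it into the table.
ds = "2024-03-05"
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_revenue (revenue_date TEXT, created_at TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?, ?)",
    [(ds, f"{ds} {h:02d}:15:00", 100.0) for h in range(8, 24)],
)

results = {
    "reconciliation": check_reconciliation(conn, ds, processor_total=2400.0),
    "row_count": check_row_count(conn, ds, baseline_rows=24),
    "hourly_coverage": check_hourly_coverage(conn, ds),
}
hours_seen = hours_with_activity(conn, ds)
print(results, hours_seen)
```

Each function returns True when the check passes, so a False from any of them is the signal to page.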
Any one of these, run after the main pipeline, would have paged Priya before the dashboard lied to finance. The first two are slow (full-day scans). The third is cheap and would have caught this exact bug (only 16 hours showed activity, not 24).
Aha: The orchestrator's job is to get the work done. It is not to judge whether the work was right. That judgment requires running a check against the data itself, after the work. A green pipeline is necessary for the data to be right. It is not sufficient.
4. Real-World Application
Every mature data team eventually ships a "Data Reliability" function (the name varies: Data Observability, Data Trust, Data SRE). Its purpose is exactly the right-hand column above: continuous checks against the data, not against the metadata.
Tooling has converged: Monte Carlo, Bigeye, Soda, Acceldata, and dbt's data-tests block all produce data-quality signals separate from pipeline signals. The most useful ones land in the same paging system as pipeline alerts, so the on-call sees a unified queue but knows from the alert which axis failed.
The teams that skip this layer don't realize they're skipping it. They have a pipeline dashboard that's been green for months and an annual ritual called "the consultant audit found a 7% revenue undercount in 2024." The bug was visible from day one; nothing was looking.
5. Your Turn
Exercise: TheWorldShop just hired you to "improve their data quality." The current state: one pipeline dashboard (all green), no data-quality checks, daily complaints from finance and ops about numbers being "off."
- Sketch a minimum-viable Data Health dashboard that pairs with the existing Pipeline Health dashboard. Pick exactly 3 critical tables and propose 1 check per table.
- For each check, name (a) the SQL shape, (b) the alert threshold, (c) the paging severity (P0/P1/P2 from Lesson 7).
- The CTO asks "won't this just be more alerts?" Defend the trade in 2-3 sentences.
6. Recap + Bridge
Pipeline health and data health are orthogonal axes. Most teams ship the first dashboard and don't ship the second. The cure is a separate set of checks running against the data, with their own alerts and their own paging severity. Next lesson we look at one of the easiest data-quality checks to ship (row counts) and the false sense of security it gives you when it's the only one.