The Metric Drift Problem

Module: Why a Metrics Layer Exists | Duration: ~13 min | Lesson: 1 of 7


Priya is the analytics lead at TheWorldShop. On Monday the growth team's dashboard says revenue grew 12% last quarter. On Tuesday the finance deck says it grew 9%. On Wednesday a board member pulls up a notebook a data scientist shared and gets 10.5%.

Nobody made a mistake. Every number is "correct" by the query that produced it. The growth dashboard counts refunds as negative revenue in the month of the refund. Finance counts them in the original sale month. The notebook excludes test accounts the other two include.

Priya spends Thursday in a room with three teams reconciling three numbers that all describe the same thing. By Friday they agree. Next quarter, it happens again.

She isn't bad at her job. The numbers drifted because the definition of "revenue" lives in three places, and nothing keeps them in sync.


2. Concept Explanation

What metric drift actually is

Metric drift is when the same named business quantity ("revenue", "active user", "conversion rate") is computed differently in different places, so the answers disagree. The disagreement is usually small enough to be plausible and large enough to matter.

The key word is named. Drift isn't two people computing two genuinely different things. It's two people computing what they both call "revenue" and getting different answers, because the rules baked into each computation differ in ways nobody wrote down.

Why drift is inevitable without a shared definition

A metric is not just a number. It's a number plus a pile of decisions:

  • What rows count. Do test accounts count? Internal employees? Soft-deleted orders?
  • What window. Refunds in the sale month or the refund month? Subscriptions recognized at signup or amortized?
  • What grain. Per order or per order line? Per user or per session?
  • What filters. Active customers only? A specific region? Excluding fraud-flagged rows?

Every place a metric is computed re-encodes those decisions. A Looker dashboard encodes them in LookML. A dbt model encodes them in SQL. A notebook encodes them in pandas. A spreadsheet encodes them in a pivot table some analyst built in 2023 and left the company.

Four encodings of the same intent means four chances to diverge. And they will diverge, because each one is maintained by a different person who patches it when their stakeholder complains, not when the other three change.

Drift is an organizational failure, not a SQL bug

This is the part that surprises engineers. You cannot fix drift by writing more careful SQL. The SQL in each place is fine. The problem is that there are four places.

Drift is what happens when the definition of a metric is duplicated across the tools that consume it. Duplication is the disease. The numbers disagreeing is just the symptom you happen to notice.

Once you see drift as a duplication problem, the fix becomes obvious in shape if not in detail: define the metric once, in one place, and make every consumer read that one definition instead of re-implementing it. That single place is the metrics layer. The rest of this course is about what that place is, why it's hard to build, and why most teams reach for it only after they've been burned.

Why drift costs more than it looks like

The reconciliation meeting Priya sat through is the visible cost. The invisible cost is larger:

  • Lost trust. Once an executive catches two dashboards disagreeing, they stop trusting all dashboards. They start asking an analyst to "just pull the real number", which means every decision now waits on a human.
  • Decision latency. The board meeting stalls while three teams argue about whose 10% is right.
  • Silent wrong decisions. The worst case isn't the meeting where you catch the drift. It's the quarter where nobody catches it and a budget gets set on the wrong number.

Aha: Metric drift is not a tooling problem you can out-engineer with better queries. It's the predictable result of defining the same metric in more than one place. The metrics layer doesn't make your SQL smarter. It removes the duplication that made drift possible, so there's only one definition left to disagree with.


3. Worked Example

Let's make drift concrete. Here are three "revenue" queries that three teams at TheWorldShop honestly believe compute the same thing.

Growth team (in the BI tool):

SELECT date_trunc('month', event_date) AS month,
       SUM(amount) AS revenue
FROM orders
GROUP BY 1;

Finance team (in a dbt model):

SELECT date_trunc('month', order_date) AS month,
       SUM(amount) - COALESCE(SUM(refund_amount), 0) AS revenue
FROM orders
LEFT JOIN refunds USING (order_id)
GROUP BY 1;

Data science (in a notebook):

df = orders[orders["account_type"] != "test"]
revenue = df.groupby(df["order_date"].dt.to_period("M"))["amount"].sum()

Three differences hide in plain sight:

  1. Refunds. Growth ignores them. Finance subtracts them. Data science ignores them.
  2. Test accounts. Only data science excludes them.
  3. Date column. Growth uses event_date (when the order was recorded); the others use order_date (when it was placed). Near a month boundary these differ.

None of these queries is wrong. Each encodes a defensible choice. But "revenue for last quarter" now has three values, and which one is the value is a decision nobody actually made. It emerged from three people writing three queries on three Tuesdays.

The metrics-layer version collapses this to one definition that all three teams query:

metric:
  name: revenue
  description: "Net revenue: gross sales minus refunds, excluding test accounts."
  expr: amount - coalesce(refund_amount, 0)
  agg: sum
  filter: "account_type != 'test'"
  time_dimension: order_date

Now growth, finance, and data science all ask for revenue and get the same number, because there's only one place the rules live. If finance wants to change how refunds are handled, they change it here, and all three consumers move together. That's the whole game.


4. Your Turn

Exercise: You're handed two dashboards that both show "active users" for TheWorldShop and they disagree by about 7%.

  1. List three concrete definitional differences that could produce a 7% gap between two honest "active users" numbers.
  2. For each, say where that difference is most likely encoded (BI tool, dbt model, notebook, etc.).
  3. Explain in one sentence why "audit both queries and pick the right one" does not actually fix the problem long-term.

5. Real-World Application

Every company past a certain size has lived Priya's Thursday. The pattern is so common it has a name in the industry: the "single source of truth" problem, except the truth was never single. Surveys of data teams routinely find "we don't trust our metrics" and "different teams report different numbers" near the top of the pain list, ahead of more glamorous problems like real-time pipelines.

The reason drift is so durable is that it grows with success. A two-person startup has one analyst and one definition of revenue, in her head. The drift problem is created by scale: more teams, more tools, more dashboards, each a new place to re-encode the same intent. By the time a company has a growth team, a finance team, and a data science team, it has at least three definitions of every important metric, and usually doesn't know it until a board meeting forces a reconciliation.

This is why the metrics layer became a category around 2020-2022 rather than in the early data-warehouse era. The warehouse made data joinable and fast. BI tools made it visible. But none of them made a metric mean the same thing twice across tools, and as the modern data stack multiplied the number of consuming tools, the drift it caused got bad enough that a dedicated layer to fix it became worth its considerable cost. The next lesson tells the canonical version of this story: three teams, three definitions of "monthly active users", and what it cost.


6. Recap + Bridge

Metric drift is the same named quantity computed differently in different places, and it's inevitable whenever a metric's definition is duplicated across the tools that consume it. It's an organizational duplication problem, not a SQL problem, and its real cost is lost trust and slow or wrong decisions, not re-run queries. The fix in shape is simple: define each metric once and make every consumer read that one definition.

Next, we'll walk through the canonical drift war story in detail, the "three definitions of MAU", to see exactly how reasonable people produce three different numbers and what it cost the business.