← Back to all posts
#data-modeling#scd#dimensional-modeling#warehouse

Slowly Changing Dimensions, Actually Explained

By Petascale Labs ·

Slowly Changing Dimensions are one of those topics that everyone has "covered" and almost nobody has internalized. You can recite the list - Type 1 overwrites, Type 2 versions, Type 3 keeps a prior column, the way the Kimball Group's dimensional-modeling techniques lay them out - and still build a warehouse that confidently reports numbers that were never true.

The list isn't wrong. It's just answering the boring question. The interesting question - the one that decides whether your business intelligence is intelligence or fiction - is this:

When a customer moves from the West region to the East region, and you ask "what was revenue by region last quarter," which region do their old orders count toward?

Every SCD type is really an answer to that. So let's walk them as one evolving decision instead of a numbered list, and then point at a place you can watch the decision play out: the free SCD Playground, where you replay a change timeline and see exactly which version each fact lands on.

The setup: a dimension that won't sit still

Take a customer dimension. Today, customer C_1001 lives in the West region on the Pro plan. Three things are about to happen, in order:

  1. They upgrade Pro → Enterprise.
  2. They relocate West → East.
  3. Somewhere in there, an order is placed.

The whole of SCD is: what does the dimension remember about these changes, and what do the facts see when they join in? Hold that order of events in your head - it's the entire plot.

Type 0 and Type 1: forgetting, on purpose and by accident

Type 0 is "never changes." Some attributes genuinely shouldn't - original_signup_date, birth_country. Type 0 says: reject the update, this column is frozen. Useful, underused, and the right call more often than people think.

Type 1 is the default almost everyone reaches for: overwrite. The plan changes to Enterprise, you UPDATE the row, the old value is gone. The dimension always shows the current truth and nothing else.

And that's the trap. Type 1 isn't just simple - it's destructive, and the destruction is silent.

!Warning

The Type 1 silent lie. Your customer was on the Pro plan when they placed 50 orders last quarter. They've since upgraded to Enterprise, and Type 1 overwrote the plan. Now someone runs "revenue by plan, last quarter." The join finds C_1001 on Enterprise - its current value - and files all 50 historical orders under Enterprise. A plan they weren't on when those orders happened. The query is correct SQL. The number is fiction. And nothing errors, nothing warns; the dashboard just quietly reports a past that never existed.

This is the single most important thing to understand about SCD, and it's why the list-memorization approach fails people: Type 1 looks like the safe default and is actually the one most likely to corrupt your historical reporting.

Type 2: history as separate rows

Type 2 is the answer when history matters. Instead of overwriting, you close the old row and open a new one. The dimension stops being "one row per customer" and becomes "one row per version of a customer," each stamped with a validity window.

customer_idplanregionvalid_fromvalid_tois_current
C_1001ProWest2026-01-012026-04-30false
C_1001EnterpriseWest2026-05-012026-05-14false
C_1001EnterpriseEast2026-05-15-true

Now the join can be correct. A fact row carries the order's timestamp, and you join to the dimension version whose [valid_from, valid_to) window contains it. The 50 orders from last quarter find the Pro / West version, because that's who the customer was at the time. The history is preserved, and "revenue by plan last quarter" finally tells the truth.

This is the crux the playground is built around: a fact doesn't join to "the customer," it joins to "the customer as of the moment the fact occurred." Type 1 can't express that. Type 2 can.

Type 3, 4, and 6: the rest of the toolbox

Once you see Type 2 as "version the whole row," the others are variations on how much history and where you keep it:

  • Type 3 keeps a previous_region column alongside the current one. Exactly one step of history, in the same row. Cheap and occasionally perfect - "what region were they in before the latest move" - but it can't answer "three changes ago," and it doesn't help time-based joins.
  • Type 4 moves history out to a separate history table, keeping the main dimension lean and current while the full trail lives next door. Good when the current-row lookups are hot and history is queried rarely.
  • Type 6 is the combo (1 + 2 + 3 = 6, which is the actual mnemonic): Type 2 rows for full history, plus a current-value column on every version so you can ask both "what was true then" and "what's true now" without a second join.

None of these is "advanced." They're just different answers to how much forgetting is acceptable - the same question Type 0 and Type 1 answer at the extremes.

Where the senior-vs-junior line actually is

The list of types is junior knowledge. The senior skill is everything around the table:

  • As-of joins. Writing the temporal join correctly - half-open intervals so a single instant never matches two versions, handling the open-ended current row - is where real bugs live.
  • Anti-patterns. Type 2 on a high-churn attribute (a last_login_at that changes hourly) explodes row count and buys nothing. Type 1 on anything you'll ever report historically is the silent lie waiting to happen.
  • Row-growth projection. Type 2 trades storage for truth. Knowing how much before you ship it - versions per entity times entities - keeps a dimension from quietly becoming your largest table.
  • Bitemporal. The grown-up version, where you track both when something was true in the world and when you recorded it - so you can correct a mistake without losing the record that you once believed the mistake.
Tip

The fastest way to feel the difference between these is to stop reading and replay one. The SCD Playground lets you push a change timeline through Types 0–6 side by side, drop a fact in and watch which version it joins to under each type, time-travel the dimension as-of any date, and lint a config for the anti-patterns above. Each finding links to the lesson behind it. It runs entirely in your browser.

The one sentence to keep

If you remember nothing else: a fact joins to a dimension as it was at the moment the fact happened, not as it is now. Type 1 throws away the information you need to honor that, and does it silently. Every other type is a choice about how faithfully to preserve it.

Get that straight and the numbered list stops being trivia and becomes what it actually is - a menu of tradeoffs between storage, query simplicity, and historical truth. Pick deliberately. And if you want the full dimensional modeling treatment behind it, the playground's findings link straight into the curriculum; the fastest start, as always, is to point the SCD Playground at the change you're actually modeling and watch the facts land.

Found this useful? Give it a like.