Module: Conformed Dimensions & the Bus Matrix | Duration: ~12 min | Lesson: 1 of 6
Marketing builds dim_customer_marketing. Finance builds dim_customer_finance. Both teams swear theirs is the real customer dim.
Three months later: a cross-functional report asks "revenue per active marketing-eligible customer". The analyst joins finance's revenue facts to marketing's customer dim. The customer counts don't match the marketing dashboard. Email tickets fly. The two teams discover they each excluded different sub-populations (marketing excluded B2B; finance included refunded-only customers). Nobody knew because nobody wrote it down.
The bus matrix is the artifact that would have prevented this — before either team started building.
It is not a Jira board. It is not a Confluence page. It is a one-page spreadsheet that is the highest-leverage piece of documentation a data team produces.
2. Concept Explanation
What a bus matrix is
The bus matrix is a grid:
- Rows: business processes (and their fact tables) — "Order Placed", "Support Ticket Created", "Web Session", "Email Sent", "Payment Received".
- Columns: conformed dimensions (dim_customer, dim_date, dim_product, dim_region, dim_employee).
- Cells: ✓ if that dim applies to that process, blank otherwise.
| Process | dim_customer | dim_date | dim_product | dim_region | dim_employee |
|---|---|---|---|---|---|
| Order Placed | ✓ | ✓ | ✓ | ✓ | |
| Support Ticket | ✓ | ✓ | ✓ | ✓ | |
| Web Session | ✓ | ✓ | ✓ | | |
| Email Sent | ✓ | ✓ | | | |
| Payment | ✓ | ✓ | ✓ | | |
That's the whole thing. One page. The simplicity is the point.
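To make the grid concrete, here is a minimal Python sketch that stores a bus matrix as a mapping from process to the set of dims it uses and prints it as a pipe-delimited grid. The two rows and their cell assignments are illustrative assumptions, and render is a hypothetical helper, not part of any tool.

```python
# Illustrative data only: which dims each process uses is an assumption.
DIMS = ["dim_customer", "dim_date", "dim_product", "dim_region", "dim_employee"]

MATRIX = {
    "Order Placed": {"dim_customer", "dim_date", "dim_product", "dim_region"},
    "Email Sent": {"dim_customer", "dim_date"},
}


def render(matrix, dims):
    """Return the bus matrix as a pipe-delimited grid, one row per process."""
    lines = ["| Process | " + " | ".join(dims) + " |"]
    lines.append("|" + "---|" * (len(dims) + 1))
    for process, used in matrix.items():
        # A check mark where the dim applies to the process, blank otherwise.
        cells = ["✓" if dim in used else "" for dim in dims]
        lines.append("| " + process + " | " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render(MATRIX, DIMS))
```

Keeping the data as process-to-dims sets means the same structure can later feed checks and reports, while the spreadsheet remains the place where teams agree on definitions.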
Why this matters
Three non-obvious uses:
- It surfaces conformance opportunities. Every column with multiple ✓s is a dim that must be conformed: the support team and the orders team both join to dim_customer, so they need to agree on what "customer" means.
- It surfaces missing facts. Drawing the matrix reveals processes the warehouse doesn't yet track ("why is there no row for Subscription Renewal?"). Backlogs get cleaner because gaps become visible.
- It serves as a treaty between teams. Once teams agree the customer dim is shared, the matrix is the contract: finance can't unilaterally fork it, and marketing can't add a second one. New processes must declare their conformance commitments on the matrix.
Conformance is not a technical artifact — it's an organizational one
The critical insight: conformed dims fail for political reasons, not technical ones. Two teams writing their own customer dim isn't a tooling problem (any warehouse supports shared tables); it's a coordination problem (the teams don't trust each other's update cadence, or have different exclusion rules, or just didn't talk).
The bus matrix forces the conversation. Teams sitting down to fill it in have to agree on column definitions before they can fill in cells. That conversation is the conformance work.
The two ways a matrix lives
- As living documentation. A Google Sheet or wiki page, updated as new processes / dims are added. Reviewable, diffable, the canonical source of truth.
- As an exposure in code. dbt's exposures feature lets you declare downstream consumers of models, so the matrix can be partially reconstructed from exposures + sources + refs. Useful but not a substitute for the human conversation.
Most mature teams have both: the spreadsheet is the treaty; the code-level metadata is the verification.
The matrix as a sequencing tool
When designing a new warehouse from scratch, the matrix tells you the build order:
- List all the processes (rows).
- List all the dims (columns).
- Sort dims by number of ✓s — the most-used dim is the most important to get right first.
- Build that dim, get cross-team agreement, then build the fact tables for the processes that reference it.
- Repeat for the next dim.
This prevents the common failure: building five fact tables in parallel, each with its own customer dim, and trying to conform them after the fact (expensive, painful, often impossible without rewriting history).
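The sequencing steps above can be sketched in a few lines of Python. The matrix contents are hypothetical, and build_plan is an illustrative helper rather than an established API; it simply counts ✓s per dim and lists the processes that reference each dim.

```python
from collections import Counter

# Hypothetical matrix: process -> dims it references (illustrative data).
MATRIX = {
    "Order Placed": ["dim_customer", "dim_date", "dim_product"],
    "Support Ticket": ["dim_customer", "dim_date"],
    "Web Session": ["dim_customer"],
}


def build_plan(matrix):
    """Return (dim, processes) pairs, most-referenced dim first."""
    counts = Counter(dim for dims in matrix.values() for dim in dims)
    # Sort by descending count, then by name so ties are deterministic.
    order = sorted(counts, key=lambda dim: (-counts[dim], dim))
    return [
        (dim, [process for process, dims in matrix.items() if dim in dims])
        for dim in order
    ]


for dim, processes in build_plan(MATRIX):
    print(dim, "->", processes)
```

With this data the plan starts with dim_customer (three processes), then dim_date, then dim_product. The ordering is only a starting point; the cross-team agreement on each dim still has to happen before the facts are built.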
3. Worked Example
Draw the bus matrix for our lab's warehouse.
Processes in the lab:
- fact_sale: order placed.
- fact_support_ticket: support interaction.
- fact_web_session: web visit.
Dimensions:
- dim_customer
- dim_date
- dim_product
- dim_region (currently inlined in dim_customer; could be its own dim)
The matrix:
| Process | dim_customer | dim_date | dim_product | dim_region |
|---|---|---|---|---|
| Sale | ✓ | ✓ | ✓ | ✓* |
| Support Ticket | ✓ | ✓ | | ✓* |
| Web Session | ✓ | ✓ | | ✓* |
(*via dim_customer, since region is currently inlined.)
Four observations:
- dim_customer is used by all three processes, which makes it the highest-priority conformance target. If we get this dim right, every cross-process analysis becomes possible.
- dim_date is used by all three, so it is also high priority. Time-zone consistency, fiscal calendar consistency, grain consistency.
- dim_product is only used by sales, so it is local to that process; conformance isn't urgent.
- dim_region is conformed through dim_customer: by inlining region on the customer dim (Course 1.1 Lesson 6 decision), we get region conformance for free across all three fact tables.
The matrix also exposes a missing process: we have web sessions but no fact_email_sent or fact_marketing_campaign_received. If marketing wanted to analyze "web sessions following an email campaign", they couldn't — the data isn't in the warehouse. The matrix made the gap visible.
Now look at the raw data and find the conformance problem: the sources use three different join keys (integer ID, email, web user ID). To conform dim_customer so all three facts can join to it, we need a customer dim that carries all three identifiers, or a separate bridge_customer_identity table mapping them. We'll build this in Lesson 2.
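As a preview of the idea, here is a minimal Python sketch of an identity bridge that maps each kind of identifier to one shared customer key. Every column name, value, and the assignment of identifiers to facts (sales carry the integer ID, tickets the email, sessions the web user ID) is an assumption for illustration; the actual design comes in Lesson 2.

```python
# Hypothetical bridge rows: one surrogate customer_key per person, carrying
# every identifier the source systems use. All values are made up.
BRIDGE = [
    {"customer_key": 1, "customer_id": 1001, "email": "ana@example.com", "web_user_id": "w-77"},
    {"customer_key": 2, "customer_id": 1002, "email": "bo@example.com", "web_user_id": "w-12"},
]


def build_lookup(bridge, id_column):
    """Map one kind of source identifier to the shared customer_key."""
    return {row[id_column]: row["customer_key"] for row in bridge}


# Assumption: each fact source identifies the customer differently.
sale = {"customer_id": 1001, "amount": 40.0}
ticket = {"email": "ana@example.com", "subject": "refund"}
session = {"web_user_id": "w-77", "pages": 5}

by_id = build_lookup(BRIDGE, "customer_id")
by_email = build_lookup(BRIDGE, "email")
by_web = build_lookup(BRIDGE, "web_user_id")

# All three records resolve to the same customer_key.
keys = (
    by_id[sale["customer_id"]],
    by_email[ticket["email"]],
    by_web[session["web_user_id"]],
)
print(keys)  # (1, 1, 1)
```

Once every fact resolves to the same key, the three fact tables can join to a single dim_customer regardless of which identifier the source system recorded.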
Aha: The bus matrix isn't documentation about what is — it's a treaty about what will be. Two teams cannot conform their dims if they don't first agree on the column. The matrix is the meeting agenda for that agreement. Skipping it means relitigating customer-definition disagreements every time a cross-functional report is requested.
4. Your Turn
Exercise: Draw a bus matrix.
For a company you've worked at (or imagine one — a B2C SaaS product), draw a bus matrix with at least 5 processes and 5 dims.
- List the processes (rows) and the conformed dims (columns).
- Identify the dim with the most ✓s — what would prioritizing its conformance unlock?
- Find one cell where a missing ✓ surprises you (the dim should apply to this process but currently doesn't because the warehouse doesn't capture it).
5. Real-World Application
The Kimball Group's consulting practice essentially boils down to two artifacts: the bus matrix and the conformed-dim implementation that follows from it. Every successful warehouse engagement they ran in 30 years started with this spreadsheet.
The reason it works as a tool isn't methodological — it's psychological. Two teams arguing about whose dim_customer is canonical cannot have that argument productively in the abstract. But when they're staring at a row that says "Order Placed → dim_customer" with both their names attached, the argument is concrete and resolvable. Either you converge on one dim or you mark the row as having two dims (the matrix supports this — you can write both dim names in the cell and color it red, marking it as a conformance-debt item to resolve).
The modern equivalent in many companies: the data product canvas (a slightly more detailed framing that adds SLAs, contracts, owners). Same idea, more columns. The bus matrix is the minimal version; the data product canvas is the version after enterprise consulting.
For a small team, a Google Sheet with 10 rows × 6 columns will do more for warehouse coherence than any tool. The discipline is to keep it updated — when a new fact table is added, it must show up on the matrix before code review. When a new dim is proposed, the team must check the matrix to see if it duplicates an existing column.
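The two disciplines above could be backed by a small pre-review check. This is a hedged Python sketch: missing_from_matrix and overlapping_dims are hypothetical helpers, the table lists are made-up inputs, and the name-prefix comparison is a crude heuristic that only prompts a human to look.

```python
def missing_from_matrix(fact_tables, matrix_rows):
    """Fact tables present in the warehouse but absent from the bus matrix."""
    return sorted(set(fact_tables) - set(matrix_rows))


def overlapping_dims(proposed_dim, matrix_columns):
    """Existing columns whose name is a prefix of the proposed dim, or vice versa."""
    return [
        column
        for column in matrix_columns
        if proposed_dim.startswith(column) or column.startswith(proposed_dim)
    ]


# Hypothetical inputs: tables found in the warehouse vs. rows on the sheet.
warehouse_facts = ["fact_sale", "fact_support_ticket", "fact_web_session", "fact_email_sent"]
matrix_rows = ["fact_sale", "fact_support_ticket", "fact_web_session"]
matrix_columns = ["dim_customer", "dim_date", "dim_product"]

print(missing_from_matrix(warehouse_facts, matrix_rows))           # ['fact_email_sent']
print(overlapping_dims("dim_customer_marketing", matrix_columns))  # ['dim_customer']
```

A non-empty result from either function is a signal to update the matrix or to talk to the owner of the existing dim before the change is merged.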
The matrix as living documentation also serves as the answer to the question "what's in this warehouse, briefly?" — for new hires, for stakeholders, for the engineer doing data discovery for an AI agent. One page replaces tens of pages of wiki sprawl.