Module: Designing OLAP Systems | Duration: 12 min read | Lesson: 1 of 12
Priya gets handed an architecture review. The doc proposes ClickHouse, with pre-aggregated rollups, a 30-day hot tier, and a nightly batch from Postgres. Her job is to say whether it's right. She knows the engines cold now. She knows the ops. But "is this design right?" is a different question from "how does ClickHouse work?", and she's never been taught how to answer it.
Here's the thing: with modeling theory settled and the engines internalized, designing an OLAP system isn't about mechanics anymore. It's about four forces, freshness, cost, blast radius, and team, and the tradeoffs between them. This course is those tradeoffs. This lesson frames what's actually being decided.
2. Concept Explanation
You've spent six tracks learning how engines work. Designing a system is a different skill: choosing which mechanics to apply given constraints that have no perfect answer. By this point, two big things are already settled, and naming them sharpens what's left.
What's NOT in scope: modeling theory
Star vs snowflake, OBT, SCD types, grain, conformed dimensions, that's dimensional modeling, and it lives in S7 (Semantic & Metrics Layer). This course assumes that vocabulary. We're not deciding "fact and dimension tables"; we're deciding, given a model, how to physically realize it on a specific engine. Modeling theory is a prerequisite, not the subject.
What's NOT in scope: engine mechanics
How MergeTree works, how Trino plans, how segments hand off, you learned all that in 6.2-6.6. This course doesn't re-teach mechanics; it uses them to make decisions.
What IS the subject: four forces in tension
Once modeling and mechanics are settled, every OLAP design decision is a balance of four forces:
- Freshness. How old can the data be? Seconds (real-time), minutes, hours (batch)? Freshness fights cost and simplicity.
- Cost. Storage, compute, and the human cost of operating it. Lower cost usually means less pre-computation or less isolation.
- Blast radius. When something breaks or a number is wrong, how much has to be rebuilt or how many users are affected? Pre-aggregation shrinks query cost but enlarges restatement blast radius.
- Team. What can the people actually operate? The best design your team can't run is worse than the adequate one they can.
Every decision in this course, pre-agg vs raw, granularity, MV strategy, federate vs materialize, single vs multi-engine, is trading these four against each other. There's rarely a dominant answer; there's a defensible point given the constraints.
Why "three valid designs" is the mindset
The capstone asks you to produce three valid architectures for one workload, not one "correct" one. That's the whole philosophy: a senior designer doesn't find the right answer, they map the tradeoff space, place a few defensible points on it, and name what each gives up. "It depends" isn't a cop-out here; it's the literal skill, knowing what it depends on.
Aha: "Designing an OLAP system" sounds like it should have right answers, and it doesn't, it has defensible ones. Once the data model and the engine mechanics are settled, what's left is trading freshness, cost, blast radius, and team against each other, and those forces genuinely conflict. The senior move isn't picking the winner; it's articulating the tradeoff so clearly that everyone can see why you placed the design where you did, and what it costs.
3. Worked Example
Re-read Priya's proposal (ClickHouse, pre-aggregated rollups, 30-day hot tier, nightly batch) through the four forces, that's how you review a design.
- Freshness: nightly batch means data is up to ~24h old. Is that the requirement? If the dashboard is "yesterday's sales," fine. If it's "live orders," this design is wrong on force one alone.
- Cost: pre-aggregated rollups + 30-day hot tier is cost-efficient (small queries, bounded hot storage). Good on cost.
- Blast radius: pre-aggregation means a corrected historical number requires rebuilding the rollup, not a single update. If restatements are common (finance corrections), this design has a painful blast radius. If data is append-only and rarely corrected, fine.
- Team: ClickHouse is one binary, operable by a small team. Good on team fit.
The review writes itself: this is a strong design if freshness tolerance is ~daily and restatements are rare, and a poor one if the workload is live or frequently corrected. Notice you didn't ask "does ClickHouse work?", you asked "do its tradeoffs match this workload's forces?". That's the design skill. The same proposal is right for one workload and wrong for another, and the four forces tell you which.
4. Your Turn
Exercise: Review designs by the four forces.
- A proposal pre-aggregates everything to the hour to cut cost. The product needs minute-level drill-down. Which force is violated?
- A design federates live queries against the production orders database for a per-minute dashboard. Which forces are at risk?
- A team of two is handed a six-process Druid + Pinot + Trino multi-engine design. Which force most threatens success?
- Why does this course ask for three valid designs per workload instead of one best design?
5. Real-World Application
This framing is what distinguishes an architecture review that adds value from one that rubber-stamps. A reviewer who only knows mechanics says "yes, ClickHouse can do that." A designer who thinks in the four forces says "this is right if freshness tolerance is daily and restatements are rare, here's what breaks otherwise." The latter is the conversation that prevents the expensive mistake, choosing a design whose tradeoffs silently don't match the workload.
It also sets the boundary with the rest of the curriculum. Modeling theory (S7) decides the logical shape; this course decides the physical realization on real engines; ops (6.6) decides whether you can run it. Keeping those separate is itself a senior skill, design reviews go sideways when people argue modeling purity in an engine-mechanics decision, or vice versa.
6. Recap + Bridge
Designing an OLAP system, with modeling theory (S7) and engine mechanics (6.2-6.6) assumed, is balancing four forces in tension: freshness, cost, blast radius, and team. There are defensible designs, not single-correct ones, which is why the capstone asks for three. Reviewing a design means checking whether its tradeoffs match the workload's forces, not whether the engine "works."
The first and most fundamental of these tradeoffs underlies almost every other decision. Next, and your first hands-on: Pre-Aggregation vs Raw, The Core Tradeoff.