Module: Real-Time OLAP | Duration: 12 min read | Lesson: 1 of 15
TheWorldShop ships a new feature: every seller gets a live dashboard of their own sales, updating as orders happen. "Revenue in the last 5 minutes," "top products right now," refreshed every few seconds, for 40,000 sellers at once. The product manager assumes it's the same as the internal analytics dashboard the data team already runs on ClickHouse.
It isn't, and the on-call engineer finds out the hard way. The internal dashboard has ten analysts looking at hour-old data. This new one has tens of thousands of end users hammering it, expecting seconds-old data, with a p99 that has to stay under half a second or the page feels broken. Same SQL shape, completely different machine underneath. This lesson is about why that difference exists, and why it gets its own engines.
2. Concept Explanation
"Real-time OLAP" sounds like "OLAP but faster." It's actually a different workload along three axes, and conflating it with analyst dashboards breaks both.
1. User-facing, not analyst-facing
An internal dashboard serves a handful of analysts. A user-facing analytics product serves every customer: the seller dashboard, the ads-campaign console, the fraud console, the game-stats screen. Concurrency jumps from tens of queries to thousands per second. The engine has to schedule and isolate that load, not just run one big query fast.
2. Sub-second p99 on fresh data
Analysts tolerate a 5-second query. End users staring at a UI do not. The target is p99 under a few hundred milliseconds, and it's p99, not average, because the slowest 1% is what users complain about. "Fast on average" is not the spec; "rarely slow" is.
3. Fresh data: streaming ingest, not nightly batch
The data must be seconds old, not hours. That means streaming ingestion (from Kafka or similar) running at the same time as the queries, on the same cluster. The engine has to absorb a firehose and serve sub-second reads simultaneously, which is a genuinely hard systems problem.
Why this needs different engines
ClickHouse and Trino can do analytics beautifully, but real-time OLAP at user-facing concurrency pushed the design somewhere specific:
- Pre-aggregate aggressively at ingest so queries are lookups, not scans.
- Immutable time-partitioned segments so fresh and historical data live in the same query path without locking.
- Role-based clusters that scale ingest, query, and storage independently, because they fail and saturate independently.
That's the shape of Druid and Pinot. They look unusual next to the engines you've met because they're built for the corner where concurrency, latency, and freshness are all extreme at once.
Aha: "Real-time" in real-time OLAP isn't mainly about query speed, plenty of engines are fast. It's about serving fresh data to many users concurrently, all three at once. Drop any one and you don't need Druid or Pinot. A fast query on stale data is just OLAP; a fast query on fresh data for ten users is a materialized view; it's the tens-of-thousands-of-users-on-seconds-old-data combination that reshapes the whole architecture.
3. Worked Example
Two dashboards at TheWorldShop, same SQL, different worlds.
Internal analyst dashboard.
- Query:
SELECT product, sum(revenue) FROM orders WHERE day >= today()-30 GROUP BY product ORDER BY 2 DESC LIMIT 20 - Users: ~15 analysts.
- Concurrency: a few queries a minute.
- Freshness: hourly batch is fine.
- p99 target: a few seconds is acceptable.
- Verdict: ClickHouse, Trino, even a warehouse, all fine.
Seller-facing live dashboard.
- Query: nearly the same, scoped to one seller, last 5 minutes to last 30 days.
- Users: 40,000 sellers, many online at once.
- Concurrency: thousands of queries per second at peak.
- Freshness: an order placed 10 seconds ago must show.
- p99 target: under 300ms or the UI feels broken.
- Verdict: this is the Druid/Pinot corner. Streaming ingest + pre-aggregation + segment storage + role-based scaling.
The SQL barely changed. The non-functional requirements changed completely, and those requirements, not the query, decide the engine.
4. Your Turn
Exercise: For each, decide if it's the real-time OLAP corner (Druid/Pinot territory) or ordinary OLAP (ClickHouse/Trino/warehouse), and name the deciding axis.
- 12 internal analysts exploring last quarter's sales, ad hoc, freshness irrelevant.
- A public "trending now" widget on a news site, updating every few seconds, millions of readers.
- A fraud console where analysts need transactions from the last 10 seconds, p99 under 200ms, hundreds of concurrent investigators.
- A nightly executive report emailed at 6am.
- An ads dashboard for 200,000 advertisers, each viewing their own campaign metrics, data no more than 30 seconds old.
5. Real-World Application
The companies you'd recognize for real-time OLAP are the ones with user-facing analytics at scale. Druid powers Netflix's and Confluent's operational dashboards and many "metrics for our customers" products. Pinot was built at LinkedIn for "Who viewed your profile" and powers Uber Eats restaurant analytics and many in-app metrics surfaces. The common thread is always the same three axes: end users, sub-second, fresh.
The practical lesson for your own architecture decisions: don't reach for Druid or Pinot because they're fast. Reach for them when you genuinely have the three-axis combination. If you only have one or two, a simpler engine (ClickHouse, a materialized view, a cache) is less to operate. These clusters are powerful and they are not cheap to run, as the ops lessons will make clear.
6. Recap + Bridge
Real-time OLAP is defined by three axes at once: user-facing concurrency (thousands of QPS), sub-second p99, and streaming freshness (seconds-old data). That combination, not the query shape, is what justifies Druid and Pinot and reshapes the architecture toward pre-aggregation, immutable segments, and role-based clusters.
Next we open the first of those clusters: Druid Architecture, where brokers, historicals, middle managers, and coordinators each do one job, and you bring up the lab that runs both engines.