Back-of-the-Envelope Calculator

Pick the component you're sizing — Kafka, storage, an API, or Spark — and turn rough inputs into the numbers that drive the design: throughput, partitions, storage footprint, concurrency, read splits. Each comes with design-forcing findings linked to the lessons that explain them.

Stays on your device — 100% in-browser, nothing uploaded

Size a Kafka topic: throughput to partition count, consumer parallelism, and retention storage.

The math

Peak events/sec: 50.0K (5x avg)
Avg throughput: 9.8 MB/s (peak 49 MB/s)
Min partitions: 5 (to absorb peak)
Recommended partitions: 10 (2x peak, ≥ consumers)
Events / day: 864M (at avg rate)
Retention storage: 2.41 TB (24h x 3 replicas)
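
The figures above can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction, not the calculator's actual code. It assumes an average of 10,000 events/sec, 1,024-byte events, and binary units (1 MB = 1,048,576 bytes); these inputs are inferred from the displayed results rather than stated on the page. The function name `size_kafka_topic` and its parameters are hypothetical.

```python
BYTES_PER_MB = 1024 ** 2   # assumption: binary units, inferred from the 9.8 MB/s figure
BYTES_PER_TB = 1024 ** 4
SECONDS_PER_DAY = 86_400


def size_kafka_topic(avg_events_per_sec, event_bytes, peak_factor,
                     retention_hours, replicas):
    """Derive throughput, daily volume, and retention storage for one topic."""
    avg_mb_per_sec = avg_events_per_sec * event_bytes / BYTES_PER_MB
    # Retention holds everything written at the average rate, once per replica.
    retained_bytes = (avg_events_per_sec * event_bytes
                      * retention_hours * 3600 * replicas)
    return {
        "peak_events_per_sec": avg_events_per_sec * peak_factor,
        "avg_mb_per_sec": avg_mb_per_sec,
        "peak_mb_per_sec": avg_mb_per_sec * peak_factor,
        "events_per_day": avg_events_per_sec * SECONDS_PER_DAY,
        "retention_tb": retained_bytes / BYTES_PER_TB,
    }


# Inferred inputs: 10K events/sec average, 1 KiB events, 5x peak, 24h, 3 replicas.
result = size_kafka_topic(10_000, 1024, 5, 24, 3)
print(result)
```

With those inferred inputs the values round to 9.8 MB/s average, 49 MB/s peak, 864M events/day, and 2.41 TB retained, matching the tiles above.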

Design-forcing findings

🔴 1 · 🟡 1 · 🟢 1
Peak throughput needs a partitioned topic

Peak ingest is 49 MB/s, above the ~10 MB/s one partition handles.

Why: A single partition can't absorb a spike this large, so producers back up and consumers lag.

Do: Spread across at least 5 partitions; provision ~10 for headroom.

Learn the internals → Kafka fundamentals
Consumer parallelism fits

~10 partitions for 4 consumers.

Why: Each consumer gets at least one partition, so the group scales.

Do: Keep partitions ≥ consumers as you scale the group.

Learn the internals → Kafka operations
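
To see why partitions ≥ consumers matters, here is a simplified round-robin assignment. It is an illustration only and does not reproduce any of Kafka's real assignor implementations; the function name `assign_round_robin` is hypothetical.

```python
def assign_round_robin(num_partitions, num_consumers):
    """Deal partitions out to consumers one at a time, like cards."""
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


# 10 partitions over 4 consumers: every consumer has work.
balanced = assign_round_robin(10, 4)
print({c: len(p) for c, p in balanced.items()})

# 3 partitions over 4 consumers: one consumer is left with nothing to read.
starved = assign_round_robin(3, 4)
print([c for c, p in starved.items() if not p])
```

With 10 partitions and 4 consumers the split is 3, 3, 2, 2. Once consumers outnumber partitions, the extras sit idle, which is why the partition count acts as a ceiling on group parallelism.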
Watch for hot-key partition skew

Partition assignment is only as even as your partition key.

Why: One hot key (a whale seller, a noisy device) melts one partition while the rest idle.

Do: Partition on a high-cardinality, evenly distributed key; salt or split known hot keys.

Learn the internals → Kafka operations
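
Salting a known hot key can be sketched as follows. This is a minimal illustration under stated assumptions: `zlib.crc32` stands in for the hash (Kafka's default partitioner uses murmur2), and the helper names `partition_for` and `salted_key`, the `#` separator, and the bucket count are all hypothetical.

```python
import random
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition. crc32 is a stand-in hash, not Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key, hot_keys, salt_buckets, rng=random):
    """Append a random bucket suffix to known hot keys; leave other keys alone."""
    if key in hot_keys:
        return f"{key}#{rng.randrange(salt_buckets)}"
    return key


hot = {"whale-seller"}
# Unsalted, every event for the hot key hashes to the same partition.
print(partition_for("whale-seller", 10))
# Salted, the same key fans out over up to 8 distinct keys, and so over
# several partitions (hash collisions permitting).
spread = {partition_for(salted_key("whale-seller", hot, 8), 10) for _ in range(200)}
print(sorted(spread))
```

The trade-off is that events for a salted key no longer share one partition, so per-key ordering is lost and consumers have to strip the suffix and re-aggregate across buckets.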

Pipeline sizing — FAQ

What is a back-of-the-envelope estimate in data engineering?
It's the quick capacity math that turns a vague requirement into concrete numbers: from events per second to events per day, bytes per day (raw and compressed), monthly storage, peak throughput, and the partition count those numbers force. In a system-design interview it's the move that grounds every later design decision.
How do I convert events per second to GB per day?
Multiply events/sec by 86,400 (the number of seconds in a day) to get events/day, then multiply by the average event size to get raw bytes/day, and divide by 10^9 to express that in GB. Divide by your columnar compression ratio (roughly 3-10x) for the compressed size. This calculator does all of that live and shows each step.
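
As a rough sketch of that conversion: the function name `daily_volume_gb` is hypothetical, GB is taken as 10^9 bytes (swap the constant for `1024 ** 3` if you prefer binary units), and the 5x default compression ratio is only a placeholder.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes


def daily_volume_gb(events_per_sec, avg_event_bytes, compression_ratio=5.0):
    """Return (events/day, raw GB/day, compressed GB/day)."""
    events_per_day = events_per_sec * SECONDS_PER_DAY
    raw_gb = events_per_day * avg_event_bytes / BYTES_PER_GB
    compressed_gb = raw_gb / compression_ratio
    return events_per_day, raw_gb, compressed_gb


# Example: 10,000 events/sec at 1,000 bytes each, compressed 5x.
events, raw_gb, compressed_gb = daily_volume_gb(10_000, 1_000)
print(events, raw_gb, compressed_gb)
```

For this example the result is 864 million events/day, 864 GB/day raw, and about 172.8 GB/day after compression.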
How many Kafka partitions do I need?
Take your peak ingest throughput in MB/s and divide by what one partition handles comfortably (~10 MB/s) to get the minimum. Then provision roughly 2x for consumer parallelism and growth, because partitions cap parallelism and can't be reduced cleanly later. The tool computes both numbers from your inputs.
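
That rule of thumb fits in a few lines. The sketch below is an approximation, not the tool's implementation: it assumes the recommended count is the minimum doubled and then raised to at least the consumer count, and the name `partition_counts` and its parameters are hypothetical.

```python
import math


def partition_counts(peak_mb_per_sec, per_partition_mb_per_sec=10.0,
                     headroom=2.0, consumers=1):
    """Return (minimum, recommended) partition counts for a topic."""
    # Minimum: enough partitions that none exceeds its comfortable rate at peak.
    minimum = max(1, math.ceil(peak_mb_per_sec / per_partition_mb_per_sec))
    # Recommended: headroom for growth, and never fewer partitions than consumers.
    recommended = max(math.ceil(minimum * headroom), consumers)
    return minimum, recommended


# Example: 49 MB/s peak, ~10 MB/s per partition, 4 consumers.
print(partition_counts(49, consumers=4))
```

For a 49 MB/s peak this gives a minimum of 5 and a recommendation of 10. The ~10 MB/s per-partition figure is a planning number; measure your own brokers before relying on it.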
What compression ratio should I assume?
For typical JSON clickstream records written to a columnar format (Parquet with Snappy or Zstandard compression), ~5x is a reasonable default; ratios anywhere from 3x to 10x are common depending on data shape and cardinality. Set it to 1x only if the data is genuinely uncompressed.
Is this calculator free and private?
Yes — it's completely free with no sign-up, and it runs entirely as JavaScript in your browser. Nothing you type is uploaded; you can open DevTools → Network to confirm there are no server calls.
Can I use this to prep for a data engineering system design interview?
Yes. In a design round you size each component out loud — pick the Kafka, storage, API, or Spark tab, plug in your assumptions, and the derived numbers plus design-forcing findings are exactly the reasoning interviewers score.

Learn the framework behind the math

This calculator runs the 'Size it' step of the data engineering system design interview. The full framework — clarify, size, sketch, deep-dive, defend — is in the blog and the curriculum.