Question 1

What is a back-of-the-envelope estimate in data engineering?

Accepted Answer

It's the quick capacity math that turns a vague requirement into concrete numbers: from events per second to events per day, bytes per day (raw and compressed), monthly storage, peak throughput, and the partition count those numbers force. In a system-design interview it's the move that grounds every later design decision.

Question 2

How do I convert events per second to GB per day?

Accepted Answer

Multiply events/sec by 86,400 (the number of seconds in a day) to get events/day, then multiply by the average event size in bytes to get raw bytes/day. Divide by your columnar compression ratio (roughly 3-10x) for the compressed size, and divide by 10^9 to express either figure in GB. This calculator does all of that live and shows each step.
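The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's own code: the function name `daily_volume`, the decimal GB convention (10^9 bytes), and the sample inputs (10,000 events/sec, 1,000-byte events, 5x compression) are all assumptions chosen for the example.

```python
SECONDS_PER_DAY = 86_400  # 24 * 60 * 60
BYTES_PER_GB = 1e9        # assumption: decimal GB, not GiB (2**30)


def daily_volume(events_per_sec, avg_event_bytes, compression_ratio=5.0):
    """Convert an event rate into events/day and raw/compressed GB per day."""
    events_per_day = events_per_sec * SECONDS_PER_DAY
    raw_bytes = events_per_day * avg_event_bytes
    compressed_bytes = raw_bytes / compression_ratio
    return {
        "events_per_day": events_per_day,
        "raw_gb_per_day": raw_bytes / BYTES_PER_GB,
        "compressed_gb_per_day": compressed_bytes / BYTES_PER_GB,
    }


# Hypothetical inputs: 10,000 events/sec, 1,000-byte events, 5x compression.
result = daily_volume(10_000, 1_000, compression_ratio=5.0)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

With these sample inputs the chain is 864 million events/day, 864 GB/day raw, and about 172.8 GB/day compressed. Multiplying the compressed figure by 30 gives a rough monthly storage number.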

Question 3

How many Kafka partitions do I need?

Accepted Answer

Take your peak ingest throughput in MB/s and divide by what one partition handles comfortably (~10 MB/s is a common rule of thumb), rounding up, to get the minimum. Then provision roughly 2x for consumer parallelism and growth: the partition count caps how many consumers in a group can read in parallel, and Kafka cannot reduce a topic's partition count after the topic is created. The tool computes both numbers from your inputs.
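As a minimal sketch of that rule of thumb (again, not the tool's actual implementation), the helper below rounds the minimum up to a whole partition and then applies the headroom factor. The name `kafka_partitions`, the 10 MB/s per-partition default, and the 2x headroom default are assumptions taken from the figures in this answer; measure your own per-partition throughput before relying on them.

```python
import math


def kafka_partitions(peak_mb_per_sec, per_partition_mb_per_sec=10.0, headroom=2.0):
    """Return (minimum, provisioned) partition counts for a peak ingest rate.

    per_partition_mb_per_sec and headroom are rule-of-thumb assumptions.
    """
    # Round up: a fractional partition does not exist, and at least one is needed.
    minimum = max(1, math.ceil(peak_mb_per_sec / per_partition_mb_per_sec))
    # Headroom covers consumer parallelism and future growth.
    provisioned = math.ceil(minimum * headroom)
    return minimum, provisioned


# Hypothetical peak of 120 MB/s.
minimum, provisioned = kafka_partitions(120)
print(f"minimum={minimum}, provisioned={provisioned}")
```

For a 120 MB/s peak this gives a minimum of 12 partitions and 24 provisioned. A peak of 35 MB/s gives 4 and 8, because the 3.5 is rounded up before the headroom is applied.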

Question 4

What compression ratio should I assume?

Accepted Answer

For typical JSON clickstream records in a columnar format (Parquet with SNAPPY or ZSTD), ~5x is a reasonable default; wider ranges of 3-10x are common depending on data shape and cardinality. Set it to 1x only if the data is genuinely uncompressed.

Question 5

Is this calculator free and private?

Accepted Answer

Yes. It's completely free with no sign-up, and it runs entirely as JavaScript in your browser. Nothing you type is uploaded; you can open DevTools → Network to confirm there are no server calls.

Question 6

Can I use this to prep for a data engineering system design interview?

Accepted Answer

Yes. In a design round you size each component out loud: pick the Kafka, storage, API, or Spark tab, plug in your assumptions, and walk through the derived numbers and the design-forcing findings. That chain of reasoning is what interviewers score.

Back-of-the-Envelope Calculator
