Module: Framing | Duration: 20 min read | Lesson: 1 of 9

TheWorldShop's architecture review has stalled for three weeks on one question: "Kafka Streams, Flink, or Spark Structured Streaming?" Every engineer has a favorite, and the debate has become tribal: the Flink advocate cites Alibaba's scale, the Kafka Streams advocate cites operational simplicity, and the Spark advocate cites "we already run Spark." Nobody's wrong, and nobody's converging.

The debate won't resolve because the team is arguing about tools when the decision is about requirements. "Which engine is best?" has no answer. "Which engine best fits our latency target, state size, team, and operational budget?" has a clear one, once you write those down. The tribal debate is a symptom of skipping that step.

This course is a decision framework, not a tool tutorial (you already know the tools from the previous courses). It opens by reframing the question: you're not choosing a stream processor; you're choosing how to satisfy a specific set of constraints. This lesson shows how to surface those constraints before anyone names a tool.

2. Concept Explanation

Why "Which Is Best?" Is the Wrong Question

There is no universally best stream processor, only best fit for a workload and a team. The three engines made different tradeoffs, and a tradeoff that's perfect for one workload is wrong for another:

Kafka Streams optimized for operational simplicity (a library, no cluster) and Kafka-native processing.
Flink optimized for streaming power and correctness (true streaming, rich event time, huge state) at the cost of operating a cluster.
Spark Structured Streaming optimized for unifying batch and streaming on an engine teams already run, via micro-batch.

Asking "which is best" ignores that these engines aren't better or worse; they're different points on the same tradeoff surface. The decision is locating your workload on that surface.

The Dimensions That Actually Decide It

A real decision weighs these (each is a later lesson):

Latency (L2): do you need milliseconds, seconds, or are minutes fine? A firm millisecond-level target generally rules out micro-batch.
State (L3): kilobytes, gigabytes, or terabytes? Huge state favors Flink's RocksDB + incremental checkpoints.
Delivery guarantees (L4): at-least-once or exactly-once end-to-end? How critical is correctness?
Operational cost & team fit (L5): do you have a platform team? Are you on the JVM? Do you already run Spark/Kafka?
Connectors & ecosystem: Kafka-only, or many heterogeneous sources/sinks?
Batch+stream unification: one engine for both, or separate?

The winning move is to score your workload on these dimensions first, then see which engine's tradeoffs match, rather than starting from a tool you like.
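
One way to enforce "constraints first" is to make the worksheet a data structure that reports which dimensions are still blank. The sketch below is illustrative only: the class name, field names, and units are assumptions chosen to mirror the dimensions listed above, not part of the lab harness.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class WorkloadRequirements:
    """One field per decision dimension; None means 'not written down yet'."""
    latency_target_seconds: Optional[float] = None
    state_size_gb: Optional[float] = None
    delivery_guarantee: Optional[str] = None       # e.g. "at-least-once", "exactly-once"
    has_platform_team: Optional[bool] = None
    engines_already_run: Optional[list] = None     # e.g. ["kafka"]
    sources_and_sinks: Optional[list] = None       # e.g. ["kafka"]
    needs_batch_unification: Optional[bool] = None

    def missing(self) -> list:
        """Names of the dimensions nobody has filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def ready_to_compare_engines(self) -> bool:
        """Only compare engines once every dimension has a value."""
        return not self.missing()


# A draft where only two dimensions have been pinned down so far.
draft = WorkloadRequirements(latency_target_seconds=5.0,
                             delivery_guarantee="exactly-once")
print("still missing:", draft.missing())
print("ready to compare engines:", draft.ready_to_compare_engines())
```

The point of the design is the order of operations: the review cannot move on to engines while `missing()` still returns anything.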

Requirements Are Often Softer Than Claimed

A crucial discipline: interrogate the requirements before designing for them. Teams routinely over-state needs:

"We need sub-second latency." Do you? Or is 5 seconds genuinely fine and you said "real-time" reflexively? (Real sub-second needs are rarer than claimed, and they're expensive.)
"We need exactly-once." Everywhere, or only on the financial sink while the analytics path tolerates duplicates?
"We have huge state." Measured, or assumed?

Many stream-processor decisions get easier once a requirement turns out to be softer than stated. The senior move is to push back on "real-time" and "exactly-once" until they're quantified, because each one, if real, narrows the field and raises the cost.
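
"Quantified" can be made mechanical for latency and state claims: a claim passes only if it contains a number with a unit. The sketch below is a rough illustration under that assumption; the function names and the unit list are hypothetical, and scope questions such as "exactly-once on which sinks?" still need a per-sink answer that a pattern match cannot provide.

```python
import re

# A number followed by a time or size unit, e.g. "5s", "200 ms", "40 GB".
# The unit list is an illustrative assumption, not an exhaustive one.
QUANTITY = re.compile(
    r"\d+(?:\.\d+)?\s*(?:milliseconds|seconds|minutes|ms|sec|min|s|kb|mb|gb|tb)\b"
)


def is_quantified(claim: str) -> bool:
    """True if the claim states a number with a recognised unit."""
    return QUANTITY.search(claim.lower()) is not None


def needs_interrogation(claim: str) -> bool:
    """A claim without a number is an aspiration, not yet a requirement."""
    return not is_quantified(claim)


claims = [
    "we need real-time",
    "alert within 5s",
    "we have huge state",
    "state is about 40 GB, measured",
]
for claim in claims:
    verdict = "push back" if needs_interrogation(claim) else "accept"
    print(f"{verdict:9} <- {claim}")
```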

Team Fit Is a Real Engineering Constraint

Engineers under-weight this, but it's often decisive: a team that runs a JVM service and has no platform team will succeed with Kafka Streams and struggle to operate a Flink cluster, regardless of Flink's technical superiority. A team already running Spark for batch can add Structured Streaming with little new operational surface. The best engine you can't operate is worse than the adequate engine you can. Team capability and existing investment are constraints, not excuses.
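
Treating team fit as a constraint rather than a preference suggests applying it as a gate before any scoring, not as one more weighted column. The sketch below is a deliberately crude model of the paragraph above: the rule "a cluster-based engine is operable only with a platform team or if the team already runs it" is a simplifying assumption, and all type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineOps:
    name: str
    needs_dedicated_cluster: bool


@dataclass(frozen=True)
class Team:
    has_platform_team: bool
    clusters_already_run: frozenset


ENGINES = [
    EngineOps("Kafka Streams", needs_dedicated_cluster=False),  # a library, no cluster
    EngineOps("Flink", needs_dedicated_cluster=True),
    EngineOps("Spark Structured Streaming", needs_dedicated_cluster=True),
]


def operable(engine: EngineOps, team: Team) -> bool:
    """Assumed rule: libraries always pass; clusters need a platform team or prior investment."""
    if not engine.needs_dedicated_cluster:
        return True
    return team.has_platform_team or engine.name in team.clusters_already_run


def shortlist(team: Team) -> list:
    """Engines that survive the operability gate, in their original order."""
    return [e.name for e in ENGINES if operable(e, team)]


jvm_service_team = Team(has_platform_team=False, clusters_already_run=frozenset())
spark_batch_team = Team(has_platform_team=False,
                        clusters_already_run=frozenset({"Spark Structured Streaming"}))

print(shortlist(jvm_service_team))
print(shortlist(spark_batch_team))
```

Only the engines that pass the gate go on to be scored on latency, state, and guarantees.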

The Output of This Course: A Defensible Choice

By the end you'll produce, for a given workload, a scored decision: here are our latency/state/guarantee/ops/team constraints, here's how each engine fares against them, here's the choice and the tradeoff we're accepting. That artifact ends the tribal debate, because it's about requirements everyone can see, not preferences.

Aha: "Which stream processor is best?" is unanswerable, and that's why your architecture review is stuck: the team is debating tools when the decision lives in requirements they haven't written down. There's no best engine, only best fit, and fit is determined by latency, state, guarantees, ops budget, and team, all measured rather than asserted. Half the decision is just interrogating "we need real-time / exactly-once" until it's a number. Write the constraints first and the engine usually picks itself.

3. Worked Example

Turn TheWorldShop's stalled debate into a scored decision.

Bring up the lab. This course uses a comparison harness that runs the same workload on all three engines so you can measure (not assert) the tradeoffs. Clone the shared lab repo and start the lab:

git clone https://github.com/petascalelabs/petascalelabs-lab-setup.git
cd petascalelabs-lab-setup/ingestion-and-transport/stream-processing/choosing-a-stream-processor/
./scripts/setup.sh

Verify Kafka and all three engines (Kafka Streams app, Flink cluster, Spark) are reachable:

./scripts/verify.sh
# expected: "Kafka ready, Flink 1.18 on :8081, Spark 3.5 ready, Kafka Streams harness ready, theworldshop.orders seeded"

If you want help with setup, you can give an AI assistant a prompt like this one:

You are helping me run the lab for the "Stream Processing Architecture"
course, which compares Kafka Streams, Flink, and Spark Structured Streaming
on the same workload. The lab is in
petascalelabs-lab-setup/ingestion-and-transport/stream-processing/choosing-a-stream-processor/
and includes:
  - docker-compose.yml: Kafka, a Flink cluster, Spark, and a Kafka Streams harness
  - the same benchmark workload runnable on each engine
  - scripts/setup.sh, scripts/verify.sh, scripts/teardown.sh, scripts/bench.sh

My environment:
  OS: <fill in>
  RAM: <fill in GB>  (running three engines wants several GB)

Walk me through:
1. Confirming Docker has enough memory for all three engines.
2. Bringing everything up and confirming each engine is healthy.
3. Running the benchmark on each engine and reading the comparison output.
4. The teardown command.

Do not assume my OS; ask if unclear.

Step 1. Write down the constraints (before naming a tool):

./scripts/requirements-worksheet.sh

TheWorldShop fraud pipeline requirements:
  latency target:        alert within ~5s        (interrogated: NOT sub-second)
  state size:            ~tens of GB (millions of customers)
  delivery guarantee:    exactly-once on alerts  (financial action)
  team:                  JVM, small, NO platform team
  existing investment:   already run Kafka; do NOT run Flink or Spark today
  connectors:            Kafka in, Kafka out (+ a CDC'd flag table)

Step 2. See the requirement-softening in action:

./scripts/interrogate.sh latency
# "claimed: real-time. measured business need: a human/queue acts within seconds.
#  => 5s is fine; sub-second not required (would raise cost and narrow options)."

Step 3. Preview the scoring (filled in over the course):

./scripts/score-preview.sh
# latency 5s:        all three viable (no sub-second pressure)
# state tens of GB:  Flink strong; Kafka Streams ok; Spark ok
# exactly-once:      all three capable
# team/ops:          Kafka Streams strong (no cluster); Flink costly (no platform team)
# => early signal: Kafka Streams fits the TEAM constraint that the debate ignored

The debate ignored the decisive constraint (no platform team), which the scoring surfaces immediately.
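
To see how much the team/ops row matters, the preview can be turned into a small score matrix. This is a hedged sketch, not the harness's scoring: the point values, the equal weights, the mapping of "viable" and "capable" to "ok", and Spark's "costly" rating on team/ops (inferred from "do NOT run Flink or Spark today") are all assumptions made for illustration.

```python
# Assumed point scale for the qualitative ratings in the preview.
RATING_POINTS = {"strong": 2, "ok": 1, "costly": 0}

PREVIEW = {
    "latency 5s":       {"Kafka Streams": "ok", "Flink": "ok", "Spark": "ok"},
    "state tens of GB": {"Kafka Streams": "ok", "Flink": "strong", "Spark": "ok"},
    "exactly-once":     {"Kafka Streams": "ok", "Flink": "ok", "Spark": "ok"},
    # Spark's rating here is an assumption; the preview only rates the other two.
    "team/ops":         {"Kafka Streams": "strong", "Flink": "costly", "Spark": "costly"},
}


def totals(matrix: dict, weights: dict = None) -> dict:
    """Sum weighted rating points per engine; unlisted dimensions get weight 1."""
    weights = weights or {}
    result = {}
    for dimension, ratings in matrix.items():
        weight = weights.get(dimension, 1)
        for engine, rating in ratings.items():
            result[engine] = result.get(engine, 0) + weight * RATING_POINTS[rating]
    return result


def leader(scores: dict) -> str:
    """Engine with the highest total."""
    return max(scores, key=scores.get)


with_ops = totals(PREVIEW)
without_ops = totals({d: r for d, r in PREVIEW.items() if d != "team/ops"})

print("all dimensions:   ", with_ops, "->", leader(with_ops))
print("ignoring team/ops:", without_ops, "->", leader(without_ops))
```

Under these assumed numbers, dropping the team/ops row flips the leader from Kafka Streams to Flink, which is the pattern the stalled debate was stuck in. The real weights are filled in over the course.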

4. Your Turn

Exercise: TheWorldShop's team is stuck debating engines for a new pipeline. Before choosing, you must frame the decision properly.

List the five or six requirement dimensions you'd score before naming any engine.
The team says "we need real-time and exactly-once." Write the two interrogating questions you'd ask to test whether those are as hard as claimed.
Explain why "which engine is best?" is the wrong question and what the right one is.
The strongest engineer wants Flink for its technical superiority, but the team has no platform team and doesn't run Flink today. Argue why team fit can override technical superiority.
Describe the artifact this decision should produce, and why it ends a tribal debate.

5. Real-World Application

The stalled "which engine?" debate is extremely common, and it nearly always resolves the moment someone forces the requirements onto paper. Architecture reviews that start from tools loop; ones that start from quantified constraints converge.

Requirement-softening saves enormous cost. "We need sub-second / exactly-once everywhere" is frequently aspirational; discovering that 5 seconds and selective exactly-once suffice often opens up simpler, cheaper options (and de-escalates the debate). Senior architects interrogate these claims as a reflex.

Team fit is the most under-weighted real constraint. Many teams have adopted a technically superior engine they couldn't operate and paid for it in incidents. "Can we actually run this?" belongs in the decision alongside the benchmark numbers.

6. Recap + Bridge

You're not choosing a stream processor; you're choosing how to satisfy a set of constraints. There's no best engine, only best fit. Score the workload on latency, state, delivery guarantees, operational cost, team fit, and connectors before naming a tool; interrogate "real-time" and "exactly-once" until they're quantified; and weight team capability and existing investment as real constraints. The output is a defensible, requirement-grounded choice.

The first dimension is the one teams argue about most and measure least. Next up is latency: what each engine can and can't promise, and why micro-batch has a floor that true streaming doesn't.