Stratum 4

Compute Engines

One task in your Spark stage runs for 40 minutes while the other 199 finish in seconds. That is textbook skew, and adding executors does nothing but raise the bill.

This stratum builds Spark from RDDs and DataFrames through the Catalyst optimizer, the shuffle, partitioning and skew, broadcast joins, and structured streaming. You learn to read a physical plan and the Spark UI and know exactly why a job is slow before you touch a single config.

What you'll learn

  • Read a physical plan and the Spark UI to find the real bottleneck
  • Tune shuffle partitions and fix data skew instead of throwing hardware at it
  • Know when a broadcast join helps and when it OOMs the driver
  • Reason about structured streaming: micro-batches, state, and exactly-once sinks
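
To make the skew bullet concrete, here is a minimal sketch in plain Python rather than Spark. It simulates hash partitioning across 200 shuffle partitions and shows how salting a hot key spreads its rows out. The data, the key names, the `partition_of` and `salt` helpers, and the use of CRC32 as the hash are all assumptions made for illustration; Spark's own partitioner uses a different hash function.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # matches the 200-task stage in the opening example


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions: int = NUM_PARTITIONS) -> list:
    """Count how many rows land in each shuffle partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(key: str, row_index: int, num_salts: int) -> str:
    """Append a round-robin salt so one logical key becomes num_salts keys."""
    return f"{key}#{row_index % num_salts}"


# Hypothetical data: one hot key with 100,000 rows,
# plus 2,000 ordinary keys with 10 rows each.
keys = ["hot_customer"] * 100_000 + [
    f"customer_{i}" for i in range(2_000) for _ in range(10)
]

before = partition_sizes(keys)
after = partition_sizes([salt(k, i, num_salts=50) for i, k in enumerate(keys)])

print("largest partition before salting:", max(before))
print("largest partition after salting: ", max(after))
```

Before salting, every row of the hot key hashes to the same partition, so one task carries at least 100,000 rows no matter how many executors are available. After salting, those rows are spread over up to 50 distinct keys. The trade-off is extra work elsewhere: an aggregation needs a second pass to merge the salted partial results, and a join needs the other side replicated once per salt value so that matches still occur.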

Tracks & courses

Here's what each track gives you and the courses inside it.

Start Compute Engines free

The first chapters of every course are free to read — no account needed.