⚡Strata 4

Compute Engines

One task in your Spark stage runs for 40 minutes while the other 199 finish in seconds. That is textbook skew, and adding executors does nothing but raise the bill.

This stratum covers Spark from RDDs and DataFrames through the Catalyst optimizer, the shuffle, partitioning and skew, broadcast joins, and Structured Streaming. You learn to read a physical plan and the Spark UI so you know exactly why a job is slow before you touch a single config.

What you'll learn

Read a physical plan and the Spark UI to find the real bottleneck
Tune shuffle partitions and fix data skew instead of throwing hardware at it
Know when a broadcast join helps and when it OOMs the driver
Reason about structured streaming: micro-batches, state, and exactly-once sinks
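The skew fix mentioned above (salting a hot key) can be sketched without a cluster. This is a plain-Python simulation, not Spark code. It assumes a hash partitioner with 8 partitions, standing in for `spark.sql.shuffle.partitions`, and uses `zlib.crc32` in place of the Murmur3 hash that Spark SQL actually uses. The helpers `partition_for`, `partition_sizes`, and `salt` are hypothetical names for illustration. It shows one hot key piling every row onto a single partition, and a random salt suffix spreading those rows across partitions.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical stand-in for spark.sql.shuffle.partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash -> partition id, like a shuffle's hash partitioner.
    # crc32 is an assumption chosen for reproducibility; Spark SQL uses Murmur3.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions: int = NUM_PARTITIONS) -> list[int]:
    # Count how many rows land in each partition after "shuffling" the keys.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(key: str, buckets: int, rng: random.Random) -> str:
    # Append a random suffix so one hot key becomes `buckets` distinct keys.
    return f"{key}#{rng.randrange(buckets)}"


rng = random.Random(42)
# Skewed input: 10,000 rows share one hot key; 200 other keys have one row each.
keys = ["hot"] * 10_000 + [f"k{i}" for i in range(200)]

before = partition_sizes(keys)
# Salt only the hot key; the rare keys are left unchanged.
salted = [salt(k, NUM_PARTITIONS, rng) if k == "hot" else k for k in keys]
after = partition_sizes(salted)

print("rows per partition before salting:", before)
print("rows per partition after salting: ", after)
```

In a real salted join, the other side of the join has to be replicated once per salt value so that every salted key still finds its match. On Spark 3.x, Adaptive Query Execution's skew-join handling (`spark.sql.adaptive.skewJoin.enabled`) can also split oversized partitions without manual salting.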

Tracks & courses

Here's what each track gives you and the courses inside it.

Apache Spark

From distributed fundamentals through advanced internals and real-time streaming.

Apache Spark: Fundamentals

RDDs, DataFrames, Spark SQL, joins, window functions, and production batch pipelines.

20 chapters · 6h 40m · 1 free chapter

Apache Spark: Advanced Internals

DAG scheduler, shuffle mechanics, Tungsten, Catalyst, AQE, data skew, and Delta Lake.

20 chapters · 6h 50m · 1 free chapter

Apache Spark: Streaming

DStreams, Structured Streaming, event time, watermarks, and deep Kafka integration.

20 chapters · 6h 40m · 1 free chapter

Start Compute Engines free

The first chapters of every course are free to read — no account needed.
