Compute Engines
One task in your Spark stage runs for 40 minutes while the other 199 finished in seconds — textbook skew — and adding executors does nothing but raise the bill.
This stratum takes you through Spark from RDDs and DataFrames to the Catalyst optimizer, the shuffle, partitioning and skew, broadcast joins, and Structured Streaming. You learn to read a physical plan and the Spark UI so you can tell why a job is slow before you touch a single config.
What you'll learn
- Read a physical plan and the Spark UI to find the real bottleneck
- Tune shuffle partitions and fix data skew instead of throwing hardware at it
- Know when a broadcast join helps and when it OOMs the driver or the executors
- Reason about Structured Streaming: micro-batches, state, and exactly-once sinks
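To make the skew point concrete, here is a minimal pure-Python sketch of key salting, one common way to spread a hot key across partitions. It is an illustration under stated assumptions, not Spark code: the dataset, the `hot_customer` key, the `SALT_BUCKETS` value, and the CRC32-based `partition_of` function are all hypothetical stand-ins for a real hash partitioner.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # mirrors the 200-task stage in the scenario above


def partition_of(key: str) -> int:
    # Stable stand-in for a hash partitioner (assumption: not Spark's real hash).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    # Count how many rows land in each partition.
    return Counter(partition_of(k) for k in keys)


# Hypothetical skewed dataset: one hot key owns 90% of the rows.
rows = ["hot_customer"] * 90_000 + [f"customer_{i}" for i in range(10_000)]
before = partition_sizes(rows)

# Salting: append a rotating suffix so the hot key becomes SALT_BUCKETS sub-keys.
SALT_BUCKETS = 50  # hypothetical value, tuned per workload in practice
salted = [f"{key}#{i % SALT_BUCKETS}" for i, key in enumerate(rows)]
after = partition_sizes(salted)

largest_before = max(before.values())
largest_after = max(after.values())
print("largest partition before salting:", largest_before)
print("largest partition after salting: ", largest_after)
```

Before salting, every `hot_customer` row hashes to the same partition, so one task carries at least 90,000 rows no matter how many executors you add. After salting, that work is split across many partitions. In a real join the other side has to be expanded with the same salt values, which is the cost you trade for the balance.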
Tracks & courses
Here's what each track gives you and the courses inside it.
Apache Spark
From distributed fundamentals through advanced internals and real-time streaming.
Apache Spark: Fundamentals
RDDs, DataFrames, Spark SQL, joins, window functions, and production batch pipelines.
20 ch · 6h 40m
1 free
Apache Spark: Advanced Internals
DAG scheduler, shuffle mechanics, Tungsten, Catalyst, AQE, data skew, and Delta Lake.
20 ch · 6h 50m
1 free
Apache Spark: Streaming
DStreams, Structured Streaming, event time, watermarks, and deep Kafka integration.
20 ch · 6h 40m
1 free
Start Compute Engines free
The first chapters of every course are free to read — no account needed.