Apache Spark: Advanced Internals
DAG scheduler, shuffle mechanics, Tungsten, Catalyst, AQE, data skew, and Delta Lake.
Go deep on the DAG scheduler, shuffle mechanics, Tungsten, Catalyst optimizer, AQE, data skew, Kubernetes, GraphX, MLlib, and Delta Lake patterns.
Course content
- 01 The DAG Scheduler Internals (free)
- 02 Shuffle Deep Dive
- 03 Memory Management: The Unified Memory Model
- 04 Tungsten: Off-Heap Binary Format and Cache-Aware Computation
- 05 Whole-Stage Code Generation
- 06 Catalyst Optimizer Deep Dive
- 07 Adaptive Query Execution (AQE)
- 08 Join Strategies: Internals
- 09 Data Skew: Detection and Fixes
- 10 Partitioning and Bucketing
- 11 Kryo Serialization and GC Tuning
- 12 Data Locality and Speculative Execution
- 13 Advanced Configuration Mastery
- 14 Spark on YARN
- 15 Spark on Kubernetes
- 16 GraphX
- 17 MLlib Pipelines
- 18 MLlib Algorithms Deep Dive
- 19 Delta Lake and Lakehouse Patterns
- 20 Capstone: Production-Grade Pipeline