Petascale Labs

Apache Spark: Advanced Internals

Go deep on the DAG scheduler, shuffle mechanics, Tungsten, the Catalyst optimizer, Adaptive Query Execution (AQE), data skew, Spark on Kubernetes, GraphX, MLlib, and Delta Lake patterns.

Advanced · 20 chapters · 6h 50m · in Compute Engines

Course content

  1. The DAG Scheduler Internals (free)
  2. Shuffle Deep Dive
  3. Memory Management: The Unified Memory Model
  4. Tungsten: Off-Heap Binary Format and Cache-Aware Computation
  5. Whole-Stage Code Generation
  6. Catalyst Optimizer Deep Dive
  7. Adaptive Query Execution (AQE)
  8. Join Strategies: Internals
  9. Data Skew: Detection and Fixes
  10. Partitioning and Bucketing
  11. Kryo Serialization and GC Tuning
  12. Data Locality and Speculative Execution
  13. Advanced Configuration Mastery
  14. Spark on YARN
  15. Spark on Kubernetes
  16. GraphX
  17. MLlib Pipelines
  18. MLlib Algorithms Deep Dive
  19. Delta Lake and Lakehouse Patterns
  20. Capstone: Production-Grade Pipeline

Prerequisites

Apache Spark: Fundamentals

What to learn next

Apache Spark: Streaming (next)

Read the first chapter free

Start reading now — no account required for the free chapters.

© 2026 Petascale Labs, Inc. All rights reserved.

PrivacyTermsCookiesContact