Apache Spark: Fundamentals
RDDs, DataFrames, Spark SQL, joins, window functions, and production batch pipelines.
Learn distributed computing from scratch, working up from RDDs to deploying production batch pipelines with Apache Spark.
Course content
- 01 Why Distributed Computing? (Free)
- 02 Spark Architecture Deep Dive 🔒
- 03 Your First Spark App 🔒
- 04 RDDs: The Foundation 🔒
- 05 Transformations vs Actions 🔒
- 06 Key-Value RDDs: PairRDDs, Shuffles, and the groupByKey Trap 🔒
- 07 RDD Persistence & Caching 🔒
- 08 Broadcast Variables & Accumulators 🔒
- 09 Enter DataFrames 🔒
- 10 Spark SQL 🔒
- 11 Data Sources: Read & Write 🔒
- 12 DataFrame Transformations 🔒
- 13 Joins: The Hard Part 🔒
- 14 Window Functions 🔒
- 15 User-Defined Functions (UDFs) 🔒
- 16 Datasets: Type-Safe DataFrames 🔒
- 17 Partitioning Strategy 🔒
- 18 Deploying Spark Apps 🔒
- 19 Monitoring & Debugging 🔒
- 20 Capstone: ShopStream Batch Analytics Pipeline 🔒
Read the first chapter free
Start reading now. No account is required for the free chapters.