Petascale Labs

Apache Spark: Streaming

DStreams, Structured Streaming, event time, watermarks, Kafka deep integration.

Master both DStreams and Structured Streaming: event time, watermarks, stateful aggregations, Kafka deep integration, and real-time pipeline architecture.

Intermediate · 20 chapters · 6h 40m · in Compute Engines

Course content

  1. Why Stream Processing? (Free)
  2. DStream Basics
  3. DStream Transformations
  4. Window Operations on DStreams
  5. Stateful DStreams, UpdateStateByKey
  6. DStream Output & Sinks
  7. Fault Tolerance in DStreams
  8. Kafka + DStream Integration
  9. Structured Streaming, The New Model
  10. Sources & Sinks, Structured Streaming
  11. Output Modes
  12. Triggers
  13. Event Time & Watermarks
  14. Stateful Aggregations
  15. Stream-Stream Joins
  16. Stream-Static Joins
  17. Structured Streaming Internals
  18. Kafka Deep Integration
  19. Monitoring & Backpressure
  20. Capstone: Real-Time Pipeline

Prerequisites

Apache Spark: Fundamentals

Read the first chapter free

Start reading now; no account is required for the free chapters.

© 2026 Petascale Labs, Inc. All rights reserved.
