Data engineering, from byte-level storage to the semantic layer.
Most courses teach tools. This one teaches the physics of data: the layers of the stack, from storage up to the semantic layer, and the failure modes that show up in real systems. Built for senior and aspiring data engineers who want to reason from first principles, not memorize commands.
7 strata · 1 specialization · 33 courses. Each stratum holds tracks; each track, a sequence of courses; each course, a set of chapters, the first of which you can read free.
Try the free in-browser tools
No account, nothing uploaded — they run entirely in your browser.
Themes
Storage & File Formats
Model how Parquet column statistics let engines skip row groups and how Zstd compression ratios shrink files, to cut GCS scan costs by 60–80%.
Ingestion & Transport
Understand Kafka's ISR (in-sync replica) replication, exactly-once semantics, and CDC log tailing, so you can trace data quality failures back to their source.
Open Table Formats
Learn how Iceberg snapshot isolation prevents silent data loss, and the exact conditions under which it doesn't.
Compute Engines
Trace Flink backpressure down to the network buffer at its root, and tune Spark shuffle partitions with precision.
Orchestration & Pipelines
Name the failure modes before you learn the tools, debug Airflow scheduler internals, set freshness SLAs that page for the right reasons, and design pipelines that survive partial failures without reprocessing everything.
Query Engines & OLAP
Reason about ClickHouse MergeTree merges, Trino's cost-based optimizer, and Druid segment distribution — so you can own query latency end-to-end.
Semantic & Metrics Layer
Build dbt metrics that survive schema migrations without breaking downstream dashboards, and enforce data contracts that stop bad data before it reaches production.
PII & Data Governance
Mask PII at ingestion, enforce access control at the table-format layer, and design right-to-erasure into the storage layer.
Start with a free chapter
Every course opens with chapters you can read without an account. Go as deep as you like before you decide.