Ingestion & Transport
Consumer lag spikes, a rebalance storm stalls the whole group, and you are staring at duplicate records downstream wondering whether “exactly-once” meant what you thought it meant.
This stratum teaches Kafka from the fundamentals down to the wire: partitions and consumer groups, the in-sync-replica (ISR) replication protocol, leader election, log compaction, and the real semantics behind at-least-once versus exactly-once delivery, so you can trace a data-quality failure back to the broker instead of guessing.
What you'll learn
- Explain how producers, partitions, and consumer groups distribute and order data
- Reason about the ISR protocol, acks, and what exactly-once guarantees (and doesn't)
- Diagnose consumer rebalancing storms and lag from first principles
- Read Kafka's on-the-wire protocol and log-compaction behavior
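To make the first and third outcomes concrete, here is a minimal sketch of how a record key maps to a partition and how consumer lag is computed. It is an illustration under stated assumptions, not Kafka client code: `pick_partition` and `consumer_lag` are hypothetical helper names, and CRC32 stands in for the murmur2 hash that the Java client's default partitioner applies to keyed records.

```python
import zlib


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Assumption: CRC32 is a stand-in hash chosen for illustration only.
    The property that matters is that the same key always lands on the
    same partition, which is what gives per-key ordering.
    """
    return zlib.crc32(key) % num_partitions


def consumer_lag(log_end_offsets: dict[int, int],
                 committed_offsets: dict[int, int]) -> dict[int, int]:
    """Per-partition lag = log end offset minus the group's committed offset.

    Simplification: a partition with no committed offset is treated as
    committed at 0; a real consumer's starting point depends on its
    offset-reset policy.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


if __name__ == "__main__":
    # The same key is always routed to the same partition.
    print(pick_partition(b"order-42", 6) == pick_partition(b"order-42", 6))
    # Partition 0 is 10 records behind; partition 1 is caught up.
    print(consumer_lag({0: 100, 1: 250}, {0: 90, 1: 250}))
```

Note that changing the partition count changes the result of the modulo, so existing keys can move to different partitions. That is one reason repartitioning a keyed topic can break ordering assumptions downstream.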
Tracks & courses
Here's what each track gives you and the courses inside it.
Track A — Kafka: From Fundamentals to Internals
Master Kafka from streaming basics through broker internals, replication, and the wire protocol.
Apache Kafka: Fundamentals
Topics, partitions, producers, consumers, delivery semantics, schemas, and Kafka Connect.
14 ch · 4h 40m · 1 free
Apache Kafka: Internals & Protocol Deep Dive
Broker internals, replication, ISR, controller, log layout, and the Kafka wire protocol.
14 ch · 4h 40m · 1 free
Apache Kafka: Operations, Performance & Reliability
Capacity planning, tuning, observability, security, multi-region, DR, and cost for production Kafka.
13 ch · 4h 30m · 1 free
Track B — Change Data Capture (CDC)
Capture every insert, update, and delete from operational databases and stream it reliably — from Debezium fundamentals to outbox patterns, exactly-once sinks, and GDPR erasure at scale.
CDC Fundamentals with Debezium
Log-based CDC with Debezium: WAL/binlog, snapshot + streaming phases, connectors, and Kafka Connect.
11 ch · 3h 50m · 1 free
CDC at Scale: Patterns & Pitfalls
Outbox pattern, ordering, schema evolution, DLQ/replay, exactly-once sinks, and GDPR erasure for CDC.
11 ch · 3h 50m · 1 free
Track C — Stream Processing on Top of Kafka
Process streams in flight — Kafka Streams as a library, Flink as a true streaming engine, and a decision framework for choosing between Kafka Streams, Flink, and Spark Structured Streaming.
Kafka Streams in Production
KStream/KTable, state stores, windowing, joins, topology, rebalancing, and interactive queries.
10 ch · 3h 30m · 1 free
Flink for Stream Processing
Event time, watermarks, state backends, checkpoints, savepoints, exactly-once sinks, Flink SQL and CDC.
12 ch · 4h 10m · 1 free
Stream Processing Architecture: Kafka Streams vs Flink vs Spark
A decision framework for choosing a stream processor on latency, state, guarantees, ops cost, and team fit.
9 ch · 3h 10m · 1 free
Track D — Batch Ingestion & Backfills
The often-skipped half of ingestion — connector-based batch extraction with incremental cursors and idempotent loads, plus backfill strategies that load history into a lakehouse without breaking live streams.
Batch Ingestion Patterns
Connector-based batch ingestion: full vs incremental, cursors, idempotency, the Singer spec, and quality gates.
10 ch · 3h 30m · 1 free
Backfill Strategies for Lakehouses
Partition-aware, idempotent backfills into Iceberg/Delta alongside live streams, with reconciliation.
10 ch · 3h 30m · 1 free
Track E — Architecting the Ingestion Layer
The architect capstone — design a production ingestion platform end to end: event-driven vs batch decisioning, schema registry and data contracts, topic taxonomy, multi-tenant ingest, DLQ/replay/retention policy, cost modeling, and cross-region resilience.
Start Ingestion & Transport free
The first chapters of every course are free to read — no account needed.