Apache Kafka: Operations, Performance & Reliability
Capacity planning, tuning, observability, security, multi-region, DR, and cost for production Kafka.
Learn to run Kafka like an SRE so that a cluster survives contact with production. The course covers capacity planning, producer/broker/consumer tuning, observability and lag SLOs, Schema Registry operations, security, multi-region deployment and disaster recovery, ZooKeeper-to-KRaft migration, and cost optimization.
Course content
- 01 Capacity Planning: Sizing Brokers, Disks & Network (free)
- 02 Producer Tuning for Throughput vs Latency
- 03 Consumer Tuning & Lag Management
- 04 Broker Tuning: JVM, Page Cache, Disk Layout
- 05 Observability: JMX, Burrow, Cruise Control, Lag SLOs
- 06 Schema Registry Operations & Compatibility Strategy
- 07 Security: TLS, mTLS, SASL, OAuth, ACLs
- 08 Multi-Region: MirrorMaker 2 vs Cluster Linking vs Stretch Clusters
- 09 Disaster Recovery & RPO/RTO Tradeoffs
- 10 Upgrades, Rolling Restarts & ZooKeeper→KRaft Migration
- 11 Cost Optimization: Storage, Network, Tiered Storage Economics
- 12 Self-Service Platform Patterns (Topic Governance, Quotas)
- 13 Capstone: Design a Multi-Region Kafka Platform for TheWorldShop