Apache Kafka: Operations, Performance & Reliability

Capacity planning, tuning, observability, security, multi-region, DR, and cost for production Kafka.

Run Kafka like an SRE — capacity planning, producer/broker/consumer tuning, observability and lag SLOs, schema registry ops, security, multi-region and disaster recovery, ZooKeeper-to-KRaft migration, and cost optimization — so a cluster survives contact with production.

Advanced · 13 chapters · 4h 30m · in Ingestion & Transport

Course content

  1. Capacity Planning: Sizing Brokers, Disks & Network (Free)
  2. Producer Tuning for Throughput vs Latency
  3. Consumer Tuning & Lag Management
  4. Broker Tuning: JVM, Page Cache, Disk Layout
  5. Observability: JMX, Burrow, Cruise Control, Lag SLOs
  6. Schema Registry Operations & Compatibility Strategy
  7. Security: TLS, mTLS, SASL, OAuth, ACLs
  8. Multi-Region: MirrorMaker 2 vs Cluster Linking vs Stretch Clusters
  9. Disaster Recovery & RPO/RTO Tradeoffs
  10. Upgrades, Rolling Restarts & ZooKeeper→KRaft Migration
  11. Cost Optimization: Storage, Network, Tiered Storage Economics
  12. Self-Service Platform Patterns (Topic Governance, Quotas)
  13. Capstone: Design a Multi-Region Kafka Platform for TheWorldShop
