Stratum 1

Storage & File Formats

A query that used to scan 200 GB now scans 2 TB, and the warehouse bill follows. Nothing changed in the SQL — but the table is now millions of tiny files with column statistics too coarse to skip anything, so every query reads everything.

This stratum takes Parquet apart byte by byte: the file layout and footer, the encodings (dictionary, run-length, bit-packing) and compression trade-offs, row groups and page indexes, how predicate pushdown actually fires, and Parquet modular encryption. You come out able to reason about exactly which bytes a query touches, and why.
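To give a taste of "byte by byte": a Parquet file with a plaintext footer starts and ends with the 4-byte magic PAR1, and the 4 bytes just before the trailing magic hold the footer length as a little-endian integer. The sketch below locates the footer from those trailing bytes alone. The function name footer_span and the synthetic file are illustrative assumptions, and the footer here is filler bytes rather than real Thrift-encoded metadata.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"  # magic bytes at both ends of a plaintext-footer Parquet file


def footer_span(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the footer metadata inside `data`.

    Hypothetical helper for illustration: it only locates the footer,
    it does not decode the Thrift-encoded metadata inside it.
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic are the footer length (little-endian).
    (length,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - length
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, length


# Synthetic stand-in for a file: magic, data bytes, footer, footer length, magic.
fake_footer = b"\x00" * 10
fake_file = (
    MAGIC
    + b"column chunk bytes"
    + fake_footer
    + struct.pack("<I", len(fake_footer))
    + MAGIC
)

start, length = footer_span(fake_file)
print(start, length)
```

A reader works the same way in practice: it fetches the tail of the file first, learns where the footer is, and only then decides which column chunks are worth reading.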

What you'll learn

  • Read a Parquet footer, row groups, and column chunks — and predict which pages a query skips
  • Choose encodings and compression (Snappy vs Zstd) from the data instead of guessing
  • Use column statistics and page indexes so predicate pushdown actually fires
  • Reason about Parquet modular encryption and which engines support which features
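As a rough preview of the pushdown idea, here is a minimal sketch of min/max pruning. It assumes each row group carries a minimum and maximum for the filtered column, and that a reader skips any row group whose range cannot overlap the predicate. The RowGroupStats class and groups_to_read function are hypothetical stand-ins, not any engine's API, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for per-row-group column statistics."""
    index: int
    min_value: int
    max_value: int


def groups_to_read(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Indexes of row groups whose [min, max] overlaps the predicate lo <= x <= hi."""
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


# Data sorted on the filter column: each row group covers a narrow range.
sorted_layout = [
    RowGroupStats(0, 0, 99),
    RowGroupStats(1, 100, 199),
    RowGroupStats(2, 200, 299),
]

# Same values scattered across row groups: every range spans almost everything.
unsorted_layout = [
    RowGroupStats(0, 0, 299),
    RowGroupStats(1, 3, 297),
    RowGroupStats(2, 1, 298),
]

print(groups_to_read(sorted_layout, 120, 130))    # [1]
print(groups_to_read(unsorted_layout, 120, 130))  # [0, 1, 2]
```

The second call is the "statistics too coarse to skip anything" case: the predicate is identical, but every row group's range overlaps it, so nothing can be skipped.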

Start Storage & File Formats free

The first chapters of every course are free to read — no account needed.