Storage & File Formats
A query that used to scan 200 GB now scans 2 TB, and the warehouse bill follows. Nothing changed in the SQL — but the table is now millions of tiny files with column statistics too coarse to skip anything, so every query reads everything.
This stratum takes Parquet apart byte by byte: the file layout and footer, the encodings (dictionary, run-length, bit-packing) and compression trade-offs, row groups and page indexes, how predicate pushdown actually fires, and modular column encryption. You come out able to reason about exactly which bytes a query touches — and why.
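To make the three encodings named above concrete, here is a toy sketch in plain Python. It is an illustration under simplifying assumptions, not Parquet's actual on-disk format: real files use a hybrid RLE/bit-packed byte layout, and the helper names (`dictionary_encode`, `run_length_encode`, `bit_width`) are hypothetical.

```python
from itertools import groupby

def dictionary_encode(values):
    """Map each distinct value to a small integer index (first-seen order)."""
    dictionary = {}
    indices = []
    for v in values:
        # setdefault assigns the next free index only if v is new
        indices.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), indices

def run_length_encode(indices):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(indices)]

def bit_width(distinct_count):
    """Bits per index when bit-packing indices in [0, distinct_count)."""
    return (distinct_count - 1).bit_length()

# A low-cardinality column with long runs: the friendly case for all three.
column = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]
dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)

print(dictionary)                  # ['DE', 'FR', 'US']
print(indices)                     # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(runs)                        # [(0, 3), (1, 2), (2, 4)]
print(bit_width(len(dictionary)))  # 2
```

Nine strings shrink to a three-entry dictionary plus three runs, and each index needs only 2 bits. The same column shuffled would keep the dictionary but lose the long runs, which is why data order affects encoded size.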
What you'll learn
- Read a Parquet footer, row groups, and column chunks — and predict which pages a query skips
- Choose encodings and compression (Snappy vs Zstd) from the data instead of guessing
- Use column statistics and page indexes so predicate pushdown actually fires
- Reason about Parquet modular encryption and which engines support which features
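The idea behind statistics-based skipping can be sketched without any Parquet library. The snippet below is a simplified model, not an engine's real implementation: each "row group" stores only a min and max for one column, and a range predicate keeps a group only if its range overlaps. `build_row_groups` and `groups_to_read` are hypothetical names chosen for this illustration.

```python
import random

def build_row_groups(values, rows_per_group):
    """Split values into fixed-size groups and record min/max per group."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups

def groups_to_read(groups, lo, hi):
    """Keep groups whose [min, max] overlaps the predicate lo <= x <= hi."""
    return [g for g in groups if g["max"] >= lo and g["min"] <= hi]

values = list(range(1000))

# Sorted data: each group covers a narrow, non-overlapping range.
sorted_groups = build_row_groups(values, 100)

# Shuffled data: each group's min/max tends to span most of the domain.
shuffled = values[:]
random.Random(0).shuffle(shuffled)
shuffled_groups = build_row_groups(shuffled, 100)

sorted_hits = groups_to_read(sorted_groups, 250, 260)
shuffled_hits = groups_to_read(shuffled_groups, 250, 260)

print(len(sorted_hits), "of", len(sorted_groups), "groups read (sorted)")
print(len(shuffled_hits), "of", len(shuffled_groups), "groups read (shuffled)")
```

On the sorted layout the predicate touches a single group. On the shuffled layout the statistics are usually too wide to rule much out, so the same query reads far more data even though the rows and the predicate are identical.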
Tracks & courses
Here's what each track gives you and the courses inside it.
Columnar Storage
File formats, encodings, and the statistics that make queries fast.
Parquet — Part 1: Layout, Types, and Encodings
Layout, physical types, encodings, codecs, logical types, and nested schemas — the bytes of a Parquet file from the inside out.
17 ch · 4h 18m
1 free
Parquet — Part 2: Indexing, Encryption, and Engines
Indexing, predicate pushdown, encryption, the Variant type, and engine integrations.
13 ch · 3h 22m
1 free
Related topics
Start Storage & File Formats free
The first chapters of every course are free to read — no account needed.