Stratum 1

Storage & File Formats

A query that used to scan 200 GB now scans 2 TB, and the warehouse bill follows. Nothing changed in the SQL — but the table is now millions of tiny files with column statistics too coarse to skip anything, so every query reads everything.
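To make the "too coarse to skip anything" point concrete, here is a minimal pure-Python sketch of min/max pruning. It is not Parquet's implementation. The helper names (`row_group_stats`, `groups_to_read`), the 100-row group size, and the synthetic data are all assumptions chosen for illustration. The same 1,000 values are laid out twice: once sorted, and once scattered so that every group spans nearly the whole value range.

```python
def row_group_stats(values, rows_per_group):
    """Split values into fixed-size groups and record (min, max) per group."""
    groups = [values[i:i + rows_per_group]
              for i in range(0, len(values), rows_per_group)]
    return [(min(g), max(g)) for g in groups]


def groups_to_read(stats, lo, hi):
    """Count groups whose [min, max] range overlaps the predicate lo <= x <= hi."""
    return sum(1 for gmin, gmax in stats if gmax >= lo and gmin <= hi)


# Hypothetical data: the same 1,000 distinct values in two physical orders.
sorted_values = list(range(1000))
# Deterministic scatter: stepping by 137 (coprime with 1000) visits every value
# once, and each run of 100 rows wraps around the range many times.
scattered_values = [(i * 137) % 1000 for i in range(1000)]

sorted_stats = row_group_stats(sorted_values, 100)
scattered_stats = row_group_stats(scattered_values, 100)

# Predicate: 420 <= x <= 430
sorted_hits = groups_to_read(sorted_stats, 420, 430)        # 1 of 10 groups
scattered_hits = groups_to_read(scattered_stats, 420, 430)  # 10 of 10 groups

print(f"sorted layout reads {sorted_hits} group(s)")
print(f"scattered layout reads {scattered_hits} group(s)")
```

The query and the data are identical in both cases. Only the physical layout differs, and that alone decides whether the statistics can rule anything out.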

This stratum takes Parquet apart byte by byte: the file layout and footer, the encodings (dictionary, run-length, bit-packing) and compression trade-offs, row groups and page indexes, how predicate pushdown actually fires, and modular column encryption. You come out able to reason about exactly which bytes a query touches — and why.

What you'll learn

Read a Parquet footer, row groups, and column chunks — and predict which pages a query skips
Choose encodings and compression (Snappy vs Zstd) from the data instead of guessing
Use column statistics and page indexes so predicate pushdown actually fires
Reason about Parquet modular encryption and which engines support which features
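As a small taste of the first outcome, the sketch below locates the footer in a Parquet byte string using only the standard library. It relies on the documented trailer layout for plaintext-footer files: the file metadata, then a 4-byte little-endian metadata length, then the `PAR1` magic. The function name `footer_location` and the synthetic `blob` are assumptions for illustration. The placeholder bytes are not valid Thrift metadata, and decoding the metadata itself is out of scope here.

```python
import struct

MAGIC = b"PAR1"


def footer_location(data: bytes):
    """Return (offset, length) of the footer metadata in a Parquet byte string."""
    # Smallest possible file: leading magic + 4-byte length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a plaintext-footer Parquet file")
    # The 4 bytes before the trailing magic hold the metadata length.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    meta_start = len(data) - 8 - meta_len
    if meta_start < 4:
        raise ValueError("footer length exceeds file size")
    return meta_start, meta_len


# Synthetic stand-in for a real file: placeholder bytes, not real pages or Thrift.
fake_pages = b"\x00" * 32
fake_metadata = b"\x15\x02" * 10
blob = MAGIC + fake_pages + fake_metadata + struct.pack("<I", len(fake_metadata)) + MAGIC

offset, length = footer_location(blob)
print(offset, length)  # 36 20
```

A reader that starts from the end of the file this way can fetch the footer first and only then decide which column chunks are worth reading.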

Tracks & courses

Full navigation is in the sidebar. Here's what each track gives you and the courses inside it.

Columnar Storage

File formats, encodings, and the statistics that make queries fast.

Parquet, Part 1: Layout, Types, and Encodings

Layout, physical types, encodings, codecs, logical types, and nested schemas: the bytes of a Parquet file from the inside out.

17 ch · 4h 18m

1 free

Parquet, Part 2: Indexing, Encryption, and Engines

Indexing, predicate pushdown, encryption, the Variant type, and engine integrations.

13 ch · 3h 22m

1 free

Start Storage & File Formats free

The first chapters of every course are free to read — no account needed.