Parquet — Part 1: Layout, Types, and Encodings
Layout, physical types, encodings, codecs, logical types, and nested schemas — the bytes of a Parquet file from the inside out.
Part 1 of the Parquet deep-dive. You will learn the physical layout (row groups, column chunks, pages, and the footer), the eight physical types, every encoding (PLAIN, dictionary, the RLE/bit-packing hybrid, delta, delta-length byte arrays and delta strings, byte stream split), the compression codecs, logical-type annotations, and how nested schemas (lists, maps, structs) are mapped to columns through the Dremel model's definition and repetition levels. Includes Docker labs you can run locally.
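You can preview the layout and footer topics listed above without any library. Every Parquet file begins and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the little-endian length of the serialized file metadata. That is why readers start at the end of the file. The sketch below seeks backwards and returns those raw metadata bytes. The helper name `read_footer_bytes` is a hypothetical one chosen for illustration, not a library API. It assumes a seekable binary file object and does not decode the Thrift compact encoding the metadata is stored in.

```python
import io
import struct
from typing import BinaryIO

MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of every Parquet file
TRAILER_LEN = 8  # 4-byte little-endian metadata length + trailing MAGIC


def read_footer_bytes(f: BinaryIO) -> bytes:
    """Return the raw (still Thrift-encoded) FileMetaData bytes of a Parquet file.

    Hypothetical helper for illustration: it only locates the footer, it does not decode it.
    """
    size = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    if size < len(MAGIC) + TRAILER_LEN:
        raise ValueError("file too small to be Parquet")

    f.seek(0)
    if f.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing leading PAR1 magic")

    # Read the fixed-size trailer: <metadata length: uint32 LE><PAR1>
    f.seek(-TRAILER_LEN, io.SEEK_END)
    trailer = f.read(TRAILER_LEN)
    if trailer[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (meta_len,) = struct.unpack("<I", trailer[:4])

    # The metadata sits immediately before the trailer and after the leading magic.
    if meta_len > size - len(MAGIC) - TRAILER_LEN:
        raise ValueError("footer length exceeds file size")
    f.seek(-(TRAILER_LEN + meta_len), io.SEEK_END)
    return f.read(meta_len)
```

For a real file, you would pass `open(path, "rb")`. Turning the returned bytes into row groups, column chunks, and statistics requires a Thrift compact-protocol decoder. Libraries such as PyArrow handle that step for you.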
Course content
- 01 Why Parquet? The Case for Columnar Storage (free)
- 02 File Anatomy: Row Groups, Column Chunks, and Pages
- 03 The Footer: Reading a Parquet File Backwards
- 04 Physical Data Types: The 8 Primitives
- 05 Lab: Write and Inspect Your First Parquet File
- 06 PLAIN Encoding: The Uncompressed Baseline
- 07 Dictionary Encoding: Deduplicating Column Values
- 08 RLE / Bit-Packing Hybrid: Runs, Levels, and Repetition
- 09 Delta Encoding: Sequential Integer Compression
- 10 Delta-Length Byte Array and Delta Strings
- 11 Byte Stream Split: Better Compression for Floats
- 12 Lab: Benchmarking Encodings with PyArrow
- 13 Compression Codecs: Snappy, ZSTD, GZIP, and Brotli
- 14 Logical Types: Annotating Physical Types
- 15 The Dremel Model: Definition and Repetition Levels
- 16 Nested Schemas: Lists, Maps, and Structs in Parquet
- 17 Lab: Nested Parquet with PyArrow and DuckDB