Parquet — Part 1: Layout, Types, and Encodings

Layout, physical types, encodings, codecs, logical types, and nested schemas — the bytes of a Parquet file from the inside out.

Part 1 of the Parquet deep-dive. You will learn the physical layout (row groups, column chunks, pages, footer), the eight physical types, the main encodings (PLAIN, dictionary, RLE/bit-packing hybrid, delta, delta-length and delta strings, byte stream split), compression codecs, logical-type annotations, and how nested schemas (lists, maps, structs) are mapped to columns via the Dremel model. Includes Docker labs you can run locally.
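As a taste of the encodings listed above, here is a minimal pure-Python sketch of the idea behind dictionary encoding: each distinct value is stored once, and every row becomes a small integer index. The function names `dictionary_encode` and `dictionary_decode` are hypothetical and are not a PyArrow API. A real Parquet writer additionally PLAIN-encodes the dictionary into a dictionary page and writes the indices with the RLE/bit-packing hybrid, prefixed by their bit width.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int], int]:
    """Split values into (distinct-value dictionary, per-row indices, index bit width)."""
    dictionary: List[Hashable] = []
    index_of: Dict[Hashable, int] = {}
    indices: List[int] = []
    for v in values:
        if v not in index_of:
            # First occurrence: append to the dictionary and remember its position.
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    # Smallest number of bits that can hold the largest index
    # (0 when the dictionary has a single entry or is empty).
    bit_width = (len(dictionary) - 1).bit_length() if dictionary else 0
    return dictionary, indices, bit_width


def dictionary_decode(dictionary: Sequence[Hashable], indices: Sequence[int]) -> List[Hashable]:
    """Rebuild the original values by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]


column = ["US", "DE", "US", "FR", "US", "DE"]
encoded_dict, encoded_idx, width = dictionary_encode(column)
# Decoding must reproduce the column exactly.
assert dictionary_decode(encoded_dict, encoded_idx) == column
```

For this column the dictionary is `["US", "DE", "FR"]` and the indices are `[0, 1, 0, 2, 0, 1]`, so each row needs only 2 bits instead of a full string. The gain grows with the ratio of rows to distinct values.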

Foundations · 17 chapters · 4h 18m · in Storage & File Formats
Explore this course on a real file in the Parquet Viewer. Drop any .parquet file, or load the built-in sample, to see the schema, row groups, encodings, compression and statistics these lessons describe, 100% in your browser.

Course content

  1. Why Parquet? The Case for Columnar Storage (free)
  2. File Anatomy: Row Groups, Column Chunks, and Pages
  3. The Footer: Reading a Parquet File Backwards
  4. Physical Data Types: The 8 Primitives
  5. Lab: Write and Inspect Your First Parquet File
  6. PLAIN Encoding: The Uncompressed Baseline
  7. Dictionary Encoding: Deduplicating Column Values
  8. RLE / Bit-Packing Hybrid: Runs, Levels, and Repetition
  9. Delta Encoding: Sequential Integer Compression
  10. Delta-Length Byte Array and Delta Strings
  11. Byte Stream Split: Better Compression for Floats
  12. Lab: Benchmarking Encodings with PyArrow
  13. Compression Codecs: Snappy, ZSTD, GZIP, and Brotli
  14. Logical Types: Annotating Physical Types
  15. The Dremel Model: Definition and Repetition Levels
  16. Nested Schemas: Lists, Maps, and Structs in Parquet
  17. Lab: Nested Parquet with PyArrow and DuckDB
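To preview the Dremel model from chapters 15 to 17, here is a minimal sketch that shreds a nullable list of nullable integers into non-null leaf values plus repetition and definition levels. It assumes the standard three-level LIST layout written out in the comment; `shred` is a hypothetical helper for illustration, not part of PyArrow or any other library.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed schema (standard three-level LIST layout):
#   optional group tags (LIST) {
#     repeated group list {
#       optional int32 element;
#     }
#   }
# Definition levels: 0 = list is null, 1 = list is empty,
#                    2 = element is null, 3 = element has a value.
# Repetition levels: 0 = first entry of a record, 1 = next element of the same list.


def shred(records: Sequence[Optional[List[Optional[int]]]]) -> Tuple[List[int], List[int], List[int]]:
    """Return (non-null leaf values, repetition levels, definition levels)."""
    values: List[int] = []
    rep_levels: List[int] = []
    def_levels: List[int] = []
    for record in records:
        if record is None:
            # Whole list is null: one level entry, no value stored.
            rep_levels.append(0)
            def_levels.append(0)
        elif not record:
            # List exists but has no elements: one level entry, no value stored.
            rep_levels.append(0)
            def_levels.append(1)
        else:
            for i, item in enumerate(record):
                # The first element starts a new record; later ones repeat the list.
                rep_levels.append(0 if i == 0 else 1)
                if item is None:
                    def_levels.append(2)
                else:
                    def_levels.append(3)
                    values.append(item)
    return values, rep_levels, def_levels
```

For the records `[1, 2]`, `[]`, `None` and `[None, 3]` this yields values `[1, 2, 3]`, repetition levels `[0, 1, 0, 0, 0, 1]` and definition levels `[3, 3, 1, 0, 2, 3]`. An empty list and a null list each produce a single level entry with no stored value; only their definition level tells them apart.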


Read the first chapter free

Start reading now — no account required for the free chapters.