Parquet, Part 1: Layout, Types, and Encodings

Layout, physical types, encodings, codecs, logical types, and nested schemas: the bytes of a Parquet file from the inside out.

Part 1 of the Parquet deep-dive. You will learn the physical layout (row groups, column chunks, pages, footer), the eight physical types, every encoding (PLAIN, dictionary, RLE/bit-packing, delta, delta-strings, byte stream split), the compression codecs, logical-type annotations, and how nested schemas (lists, maps, structs) map to columns via the Dremel model. The course includes Docker labs you can run locally.
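As a small taste of the layout material, here is a minimal sketch (not taken from the course itself) of how the footer can be located using only the Python standard library. It assumes a file with a plaintext footer, which begins and ends with the 4-byte magic `PAR1` and stores the footer length as a 4-byte little-endian integer just before the trailing magic. The function name `footer_span` is hypothetical, chosen for illustration.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"  # magic bytes at both ends of a plaintext-footer Parquet file


def footer_span(data: bytes) -> Tuple[int, int]:
    """Return (start, end) byte offsets of the serialized footer metadata.

    Hypothetical helper for illustration; it only locates the footer,
    it does not decode the Thrift-encoded metadata inside it.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at start or end")
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end
```

On a real file you would pass the bytes read from disk, then hand `data[start:end]` to a Thrift decoder (or simply use a Parquet library) to read the schema and row-group metadata.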

Foundations · 17 chapters · 4h 18m · in Storage & File Formats

Explore this course on a real file in the Parquet Viewer. Drop in any .parquet file, or load the built-in sample, to see the schema, row groups, encodings, compression, and statistics these lessons describe, entirely in your browser.

What to learn next

Parquet, Part 2: Indexing, Encryption, and Engines (next)

The first chapter, "Why Parquet? The Case for Columnar Storage", is free to read; no account is required for the free chapters.