Parquet — Part 2: Indexing, Encryption, and Engines
Indexing, predicate pushdown, encryption, the Variant type, and engine integrations.
Part 2 of the Parquet deep-dive. Picks up where Part 1 ends: row-group and page statistics, the page index, bloom filters, end-to-end predicate pushdown, modular encryption (column- and footer-level), the Variant type and shredding, and how Parquet plugs into Spark, Iceberg, Delta Lake, DuckDB, and Arrow. Includes Docker labs.
Course content
- 01 Statistics: Min, Max, Null Count, and Distinct Count (Free)
- 02 Page Index: ColumnIndex and OffsetIndex
- 03 Bloom Filters: Probabilistic Predicate Pushdown
- 04 Predicate Pushdown End-to-End: How Engines Skip Data
- 05 Lab: Measure Predicate Pushdown Gains with DuckDB
- 06 Encryption: AES-GCM, Column-Level and Footer Keys
- 07 Variant Type: Semi-Structured Data in Typed Columns
- 08 Variant Shredding: Extracting Fields into Real Columns
- 09 Lab: Encrypted Parquet with Python
- 10 Parquet in Apache Spark: Reader and Writer Internals
- 11 Parquet in Iceberg and Delta Lake
- 12 Parquet in DuckDB and Apache Arrow
- 13 Format Evolution, Versioning, and Production Best Practices
Prerequisites
Read the first chapter free
Start reading now. No account is required for the free chapter.