PySpark PII Detection

Detect PII at scale with PySpark regex and Microsoft Presidio: patterns, UDF performance, confidence scoring, and a real scan job.

Build production-grade PII detection in PySpark using regex patterns and Presidio NLP. Scan terabytes of raw data, score confidence, and produce actionable per-column PII reports.
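As a rough illustration of what a per-column PII report might look like, here is a minimal plain-Python sketch, not the course's implementation. The pattern set, the `column_pii_report` helper, and the `min_rate` threshold are all hypothetical names chosen for this example. The match rate stands in for a very crude confidence score. In a Spark job the same patterns could instead be applied with `Column.rlike` and aggregated per column.

```python
import re

# Illustrative patterns only (assumption): production PII regexes need
# stricter validation and many more entity types.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def column_pii_report(rows, min_rate=0.5):
    """Return {column: {pii_type: match_rate}} for rates >= min_rate.

    rows is a list of dicts, standing in for a sample of a dataset.
    """
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        # Skip nulls so they do not dilute the match rate.
        values = [str(r[col]) for r in rows if r.get(col) is not None]
        if not values:
            continue
        for name, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in values if pattern.search(v))
            rate = hits / len(values)
            if rate >= min_rate:
                report.setdefault(col, {})[name] = rate
    return report


sample = [
    {"id": 1, "contact": "ana@example.com", "note": "call back"},
    {"id": 2, "contact": "bob@example.org", "note": "ssn 123-45-6789"},
]
report = column_pii_report(sample)
```

Here `report` flags `contact` as likely email and `note` as possibly containing SSNs, while `id` is left out because nothing in it matches.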

Advanced · 7 chapters · 2h 50m · in PII & Data Governance

Course content

  1. Lesson 1: The PII Detection Problem at Scale (Free)
  2. Lesson 2: Regex Patterns for PII
  3. Lesson 3: Microsoft Presidio
  4. Lesson 4: Spark UDFs for PII Scanning
  5. Lesson 5: Confidence Scoring
  6. Lesson 6: Building a PII Scan Job
  7. Lesson 7: Scan a Dataset for PII Using Presidio + PySpark

Read the first chapter free

Start reading now. No account is required for the free chapter.