PySpark PII Detection
Detect PII at scale with PySpark regex and Microsoft Presidio: patterns, UDF performance, confidence scoring, and a real scan job.
Build production-grade PII detection in PySpark by combining regex patterns with Microsoft Presidio's NLP-based recognizers. Scan terabytes of raw data, assign a confidence score to each match, and produce actionable per-column PII reports.
Course content
Prerequisites
What to learn next
Read the first chapter free
Start reading now. No account is required for the free chapters.