Petascale Labs
Topic

PII detection

PII detection shows up across 2 courses in 1 layer of the data platform stack. Here's where it's taught, a free way to practice it, and what to learn next.

🛡️ Generate masking rules for PII fields. Paste a schema and get a masking or anonymization strategy for each column (tokenization, hashing, or redaction), all generated in the browser. Open the PII Masking tool →

Where it's taught

🔐 PII & Data Governance

PySpark PII Detection

Detect PII at scale with PySpark regex and Microsoft Presidio: patterns, UDF performance, confidence scoring, and a real scan job.

7 chapters · 2h 50m · 1 free

Capstone: End-to-End PII Pipeline

Ship the full pipeline: raw -> detect -> mask -> govern -> store in Iceberg, then handle a complete GDPR erasure cycle end to end.

8 chapters · 3h 35m · 1 free

Related topics

apache spark · apache iceberg · data governance · data masking · data scanning · microsoft presidio · PII pipeline · pyspark · regex

Start learning PII detection free

The first chapter of every course is free to read — no account needed.

Start: PySpark PII Detection →
© 2026 Petascale Labs, Inc. All rights reserved.