PII Masking Policy Generator
A free, online tool: paste a sample, auto-detect the PII, and generate ready-to-run dynamic data masking policies for Snowflake, Databricks, and BigQuery — while you learn what hashing, tokenization, redaction, and generalization each actually protect.
Bring your data
Start from a safe synthetic sample, or switch to paste/upload your own — nothing leaves your browser.
See what's sensitive — and watch it get masked
Each record shows the raw value (Before) and what a non-privileged role sees (After).
| Row | user_id | full_name | email | phone | country | plan | ip_address | signup_date |
|---|---|---|---|---|---|---|---|---|
| 1 | usr_fb8673 | Maria Gonzalez | maria.gonzalez@outlook.com | +1-721-555-7513 | DE | team | 67.250.63.223 | 2024-03-17 |
| 2 | usr_48c1e8 | James Okoro | james.okoro@outlook.com | +1-696-555-9801 | DE | pro | 5.22.196.94 | 2023-12-14 |
| 3 | usr_bb5522 | Sofia Schmidt | sofia.schmidt@initech.com | +1-316-555-3069 | BR | enterprise | 147.235.29.135 | 2023-09-16 |
| 4 | usr_d9ff8d | Omar Schmidt | omar.schmidt@acme.com | +1-537-555-5795 | AU | pro | 144.189.141.80 | 2024-02-28 |
| 5 | usr_fda099 | Maria Gonzalez | maria.gonzalez@globex.co | +1-929-555-6868 | GB | enterprise | 114.145.41.101 | 2024-03-09 |
Detected columns
Pick a masking technique for each — click a row to see how it works.
Turn it on in your warehouse
Role-conditional dynamic data masking — privileged roles see raw values, everyone else is masked. Pick your platform and copy.
Try this first: warehouse syntax changes over time, so grab a prompt and ask your AI assistant for the latest Snowflake masking approach, then compare it with the DDL below. Use the DDL to understand how each system differs; switch platforms above to see the same plan implemented each way.
CREATE MASKING POLICY + ALTER TABLE … SET MASKING POLICY
The generated policy reveals raw values to PII_ADMIN and masks everyone else, the same role-conditional pattern taught in the capstone lab.
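To make the role-conditional pattern concrete, here is a minimal Python sketch of the logic a masking policy encodes. It emulates the behavior only and is not the DDL the tool generates. The `ANALYST` role, the helper names (`sha256_hex`, `keep_last4`, `view_row`), and the choice of technique per column are assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, Optional

# Assumption: a single privileged role, named as in the text above.
PRIVILEGED_ROLES = {"PII_ADMIN"}

Masker = Callable[[str], Optional[str]]


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 hex digest of the value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keep_last4(value: str) -> str:
    """Replace everything except the last four characters with '*'."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def view_row(
    row: Dict[str, Optional[str]],
    policies: Dict[str, Masker],
    current_role: str,
) -> Dict[str, Optional[str]]:
    """Return the row as the given role would see it."""
    if current_role in PRIVILEGED_ROLES:
        return dict(row)  # privileged roles see raw values
    masked: Dict[str, Optional[str]] = {}
    for column, value in row.items():
        masker = policies.get(column)
        # Columns without a policy, and NULLs, pass through unchanged.
        masked[column] = masker(value) if masker and value is not None else value
    return masked


row = {
    "user_id": "usr_fb8673",
    "email": "maria.gonzalez@outlook.com",
    "phone": "+1-721-555-7513",
}
policies = {"email": sha256_hex, "phone": keep_last4}

before = view_row(row, policies, "PII_ADMIN")  # raw values
after = view_row(row, policies, "ANALYST")     # email hashed, phone cut to last four
```

In a warehouse the same decision lives inside the policy body, so the mask is applied at query time and the stored data is never rewritten.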
Reference: which technique should I use?
Hashing vs tokenization vs redaction vs generalization — at a glance.
Compare all techniques
| Technique | Reversible? | Re-identifiable? | GDPR | HIPAA / Safe Harbor |
|---|---|---|---|---|
| Hashing | Irreversible | Yes, by brute force on low-entropy inputs unless the hash is keyed with a secret salt | Still personal data (pseudonymization, not anonymization): re-linkable, so GDPR rights still apply. | Not a Safe Harbor method on its own; a keyed hash can support Expert Determination. |
| Partial redaction | Irreversible | Partially — fragments can combine with other fields to re-identify | Reduces exposure but the remaining fragment is still personal data; treat as a control, not anonymization. | Last-4 alone is acceptable for display, but is NOT Safe Harbor de-identification. |
| Tokenization | Reversible with vault | Only with access to the token vault | Pseudonymization — still personal data while the vault exists; deleting the mapping approaches anonymization. | Common for PCI cardholder data; the token store must be isolated and access-logged. |
| Generalization | Irreversible | Lower risk, but quasi-identifiers can still combine | A key technique for moving toward true anonymization when combined across all quasi-identifiers. | Required by Safe Harbor: ZIP to 3 digits (000 for low-population prefixes), dates to year, ages 90+ grouped. |
| Nullification / suppression | Irreversible | No (the value is gone) | Removes the field as personal data downstream, but only if it isn't recoverable from a joined source. | Suppression is the core Safe Harbor move for those of the 18 identifiers that can't be generalized. |
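The five techniques in the table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's output: the function names, the in-memory `TokenVault` class, and the hard-coded `SECRET_KEY` are all hypothetical, and a real token vault would be an isolated, access-logged service rather than a dictionary.

```python
import hashlib
import hmac
import secrets
from typing import Dict

# Hypothetical secret for the keyed hash; a real one would come from a key manager.
SECRET_KEY = b"example-only-key"


def keyed_hash(value: str, key: bytes = SECRET_KEY) -> str:
    """Hashing: one-way and deterministic, so equal inputs still join."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_last4(value: str, mask_char: str = "*") -> str:
    """Partial redaction: keep only the last four characters."""
    if len(value) <= 4:
        return mask_char * len(value)
    return mask_char * (len(value) - 4) + value[-4:]


class TokenVault:
    """Tokenization: random tokens, reversible only through the vault mapping."""

    def __init__(self) -> None:
        self._token_by_value: Dict[str, str] = {}
        self._value_by_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


def generalize_date_to_year(iso_date: str) -> str:
    """Generalization: reduce an ISO date (YYYY-MM-DD) to its year."""
    return iso_date[:4]


def suppress(_value: str) -> None:
    """Nullification: drop the value entirely."""
    return None
```

The keyed hash and the token both preserve joins on equal values, but only the token can be mapped back, and only by something holding the vault.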
PII masking — FAQ
- Is this PII masking tool free, and is it online?
- Yes — it's a completely free online tool with no sign-up. The whole PII masking generator runs in your browser as client-side JavaScript: parsing, PII detection, the live masked preview, and all SQL generation happen on your device, so it works online instantly and keeps working offline once loaded. Nothing is uploaded.
- How do I create a dynamic data masking policy in Snowflake?
- Define a masking policy with CREATE MASKING POLICY, returning the raw value for privileged roles and a masked value (e.g. SHA2, last-4 redaction, NULL) otherwise, then attach it with ALTER TABLE … MODIFY COLUMN … SET MASKING POLICY. This tool generates that exact DDL from your sample — paste data, pick a technique per column, and copy the result.
- What's the difference between hashing, tokenization, and partial redaction?
- Hashing is a one-way transform you can still join on but not reverse (salt low-cardinality columns). Tokenization swaps the value for a random token whose mapping lives in a separate vault, so it is reversible only with vault access. Partial redaction keeps a low-risk fragment (e.g. card last-4). The tool explains each technique, its reversibility, and its GDPR/HIPAA standing as you apply it.
- Is my data uploaded anywhere?
- No. Parsing, PII detection, the live masked preview, and all SQL generation run entirely in your browser with JavaScript — nothing is sent to a server. Open DevTools → Network to confirm.
- Does masking de-identify my data under GDPR or HIPAA?
- Not by itself. Dynamic masking is an exposure control: it hides values at query time, but the raw data still exists and privileged roles still see it, so the data stays personal data under GDPR and PHI under HIPAA. True de-identification needs Safe Harbor (suppression + generalization of all 18 identifiers) or Expert Determination on top.
- Which platforms does this free online tool generate masking code for?
- Snowflake (CREATE MASKING POLICY + ALTER TABLE SET MASKING POLICY), Databricks Unity Catalog (column masks via CREATE FUNCTION + ALTER COLUMN SET MASK), BigQuery (policy-tag data policies plus a masked-view fallback), and portable Generic SQL UDFs with a PySpark snippet — all generated free, online, in your browser.
- How does the PII auto-detection work?
- It inspects both the column name and the values — a real email regex, a Luhn check for payment cards, SSN and IP patterns — and assigns a sensitivity tier (T1–T4) and a default technique. Value matches outrank name matches, and you can override every column.
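The detection approach described in the last answer can be sketched as follows. This is a simplified illustration, not the tool's actual detector: the regexes are deliberately loose, the `NAME_HINTS` table and function names are hypothetical, and the T1–T4 tier assignment is left out because the page does not spell out its rules.

```python
import ipaddress
import re
from collections import Counter
from typing import List, Optional

# Deliberately simple patterns for illustration; a production detector would be stricter.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Hypothetical column-name hints, matched against name tokens.
NAME_HINTS = {"email": "email", "ssn": "ssn", "ip": "ip_address",
              "card": "payment_card", "phone": "phone"}


def luhn_valid(number: str) -> bool:
    """Luhn checksum over a 13-19 digit string (spaces and hyphens allowed)."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit() and 13 <= len(cleaned) <= 19):
        return False
    total = 0
    for index, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def detect_value(value: str) -> Optional[str]:
    """Classify a single value by its content alone."""
    if EMAIL_RE.match(value):
        return "email"
    if SSN_RE.match(value):
        return "ssn"
    if is_ip(value):
        return "ip_address"
    if luhn_valid(value):
        return "payment_card"
    return None


def detect_column(name: str, values: List[str]) -> Optional[str]:
    """Value matches outrank name matches; the name is only a fallback."""
    value_labels = Counter(label for label in map(detect_value, values) if label)
    if value_labels:
        return value_labels.most_common(1)[0][0]
    tokens = set(re.split(r"[^a-z0-9]+", name.lower()))
    for hint, label in NAME_HINTS.items():
        if hint in tokens:
            return label
    return None
```

For example, `detect_column("email", ["67.250.63.223"])` returns `"ip_address"` here, because the value pattern wins over the column name.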
Right-to-erasure is an engineering problem
The PII & Data Governance stratum teaches classification, masking, de-identification, and how GDPR, CCPA, and HIPAA map to the controls you actually build.