Lesson 1: Why Regulations Exist: The Cost of Getting PII Wrong

Course: Regulations & Compliance | Duration: ~18 min | Lesson: 1 of 8


On September 5, 2018, British Airways discovered that something had gone wrong. For about two weeks, a 22-line JavaScript skimmer had been silently embedded in its checkout flow, harvesting the names, billing addresses, email addresses, and full payment card details of customers who made or changed a booking. By the time the attack was discovered, data belonging to roughly 500,000 customers (the ICO's initial estimate) had been stolen and forwarded to a server hosted in Romania. The skimmer was almost elegant in its simplicity: it intercepted form submissions and exfiltrated the data before it ever reached BA's legitimate payment processor.

The UK's Information Commissioner's Office (ICO) opened an investigation. Under GDPR, which had become enforceable less than four months earlier, British Airways faced a fine of up to 4% of global annual turnover. In July 2019, the ICO announced its intention to fine BA £183.39 million, the largest GDPR penalty ever proposed at the time. The final penalty, issued in October 2020 after the company made representations to the regulator and the ICO took the economic impact of COVID-19 into account, was reduced to £20 million. That is still the equivalent of roughly $26 million, for a JavaScript file that should never have been there.

The attack itself was not novel. Magecart-style skimmers had been plaguing e-commerce sites for years. What changed in 2018 was the consequence. Before GDPR, British Airways might have issued a press release, offered customers a year of credit monitoring, and moved on. The regulation changed the calculation entirely: now, failing to implement adequate technical controls was a financial catastrophe, not just a reputational inconvenience. And the people responsible for those technical controls, the engineers who build, deploy, and maintain data systems, were suddenly on the front lines of regulatory compliance.


2. Concept Explanation

The Pre-GDPR Landscape: Why Self-Regulation Failed

For most of the internet's commercial history, the model for data protection was self-regulation. Companies wrote privacy policies, made vague promises about data security, and faced limited legal exposure when things went wrong. The United States relied on a patchwork of sector-specific rules (HIPAA for health, COPPA for children, GLBA for financial data) with no comprehensive baseline. The European Union had the 1995 Data Protection Directive, but enforcement was fragmented across member states and rarely aggressive.

The results were predictable. Between 2005 and 2018, there were over 11,000 reported data breaches in the US alone. Equifax exposed 147 million people's Social Security numbers, dates of birth, and credit histories in 2017, and in 2019 agreed to a settlement of at least $575 million with the FTC, the CFPB, and US states, which amounted to roughly $4 per affected person. Yahoo disclosed that all 3 billion of its user accounts had been compromised in a breach dating back to 2013. The market did not punish these companies enough to change behavior. Self-regulation had demonstrably failed.

The economic problem was an externality: companies captured the value of collecting personal data but did not bear the full cost when that data was compromised. The harm fell on individuals, in the form of fraud, identity theft, and loss of privacy, while companies faced minimal financial consequence. Regulation is the mechanism by which society re-internalizes that externality.

The Economic Case for Regulation

From a pure incentives perspective, regulation works by making the cost of non-compliance exceed the cost of compliance. A company that spends $500K on access controls, encryption, and data minimization is spending money it would rather not spend, until the alternative is a $20M fine plus remediation costs plus class action exposure. GDPR fines of up to 4% of global annual turnover create meaningful stakes even for large enterprises.
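The incentive argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a legal calculation: it assumes GDPR's upper fine tier (the higher of €20 million or 4% of worldwide annual turnover), and the company turnover, breach probability, and compliance budget are all hypothetical numbers chosen for illustration.

```python
def gdpr_max_fine_eur(global_turnover_eur: float) -> float:
    """Upper-tier cap: the higher of EUR 20 million or 4% of worldwide annual turnover."""
    return max(20_000_000.0, 0.04 * global_turnover_eur)


def expected_annual_loss(annual_probability: float, cost_if_it_happens: float) -> float:
    """Expected loss per year = probability of the event times its cost."""
    return annual_probability * cost_if_it_happens


# Hypothetical company: EUR 2 billion turnover, an assumed 5% yearly chance of a
# finable breach, and an assumed EUR 500K yearly spend on controls.
turnover_eur = 2_000_000_000
fine_cap = gdpr_max_fine_eur(turnover_eur)
fine_exposure = expected_annual_loss(0.05, fine_cap)
compliance_spend = 500_000

print(f"Maximum fine: EUR {fine_cap:,.0f}")
print(f"Expected yearly fine exposure: EUR {fine_exposure:,.0f}")
print("Compliance spend is below the exposure:", compliance_spend < fine_exposure)
```

The comparison only covers the fine itself. Real regulators rarely impose the maximum, and the other breach costs discussed later in this lesson push the exposure side higher.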

This is why the regulations we study in this course have teeth. GDPR, CCPA, and HIPAA are not aspirational guidelines. They are legal frameworks with enforcement bodies, investigation powers, and financial penalties designed to make data protection economically rational.

The Full Cost Structure of a Data Breach

The headline fine is rarely the most expensive part of a breach. Data engineers who argue for security investment need to understand the full cost picture:

| Cost Category | Description | Typical Range |
| --- | --- | --- |
| Regulatory fines | GDPR up to 4% global turnover; HIPAA up to $1.9M per violation category per year | $50K – $200M+ |
| Forensic investigation | Incident response firms, log analysis, scope determination | $500K – $5M |
| Breach notification | Legal counsel, notification letters, call center staffing | $100K – $2M |
| Credit monitoring | Per-affected-person monitoring services, typically 1-3 years | $10 – $30/person/year |
| Legal defense & settlements | Class action litigation, regulatory defense counsel | $1M – $500M+ |
| Lost revenue / churn | Customer cancellations, reduced acquisition, brand damage | Often the largest cost |
| Remediation | Patching systems, rebuilding pipelines, implementing new controls | $500K – $10M |
| Increased insurance premiums | Cyber insurance re-pricing post-breach | 20-100% increase |

IBM's annual Cost of a Data Breach report has put the global average breach cost above $4 million in recent editions. For healthcare breaches specifically, the reported average has been around $10 million per incident. These numbers make a compelling business case for the engineering investment required to comply.
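To see how the line items add up, here is a rough sketch that sums the dollar ranges from the cost table for a breach of a given size. It is an illustration only: the ranges are the table's ballpark figures, the "+" on open-ended upper bounds is ignored, and lost revenue and insurance re-pricing are left out because the table gives no dollar range for them. The breach size and monitoring period are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    low: float
    high: float


# Fixed-cost ranges in USD, copied from the cost table in this lesson.
FIXED_COSTS = {
    "regulatory_fines": CostRange(50_000, 200_000_000),
    "forensic_investigation": CostRange(500_000, 5_000_000),
    "breach_notification": CostRange(100_000, 2_000_000),
    "legal_defense_settlements": CostRange(1_000_000, 500_000_000),
    "remediation": CostRange(500_000, 10_000_000),
}
# Credit monitoring is priced per affected person per year.
CREDIT_MONITORING = CostRange(10, 30)


def estimate_breach_cost(affected_people: int, monitoring_years: int) -> CostRange:
    """Sum the fixed ranges and add per-person credit monitoring."""
    person_years = affected_people * monitoring_years
    low = sum(c.low for c in FIXED_COSTS.values()) + person_years * CREDIT_MONITORING.low
    high = sum(c.high for c in FIXED_COSTS.values()) + person_years * CREDIT_MONITORING.high
    return CostRange(low, high)


# Hypothetical breach: 100,000 people, one year of monitoring.
estimate = estimate_breach_cost(affected_people=100_000, monitoring_years=1)
fine_low = FIXED_COSTS["regulatory_fines"].low
print(f"Estimated range: ${estimate.low:,.0f} to ${estimate.high:,.0f}")
print(f"Share of the low estimate that is the fine: {fine_low / estimate.low:.1%}")
```

Even at the low end, the fine is a small fraction of the total, which is the point of the table.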

Why Data Engineers Bear Implementation Responsibility

Lawyers and compliance officers can interpret regulations and document policies. Privacy officers can draft consent frameworks. But the actual technical mechanisms that make compliance possible, or impossible, are built by data engineers.

When GDPR gives a data subject the right to erasure, and expects a response without undue delay and generally within one month, whether that deadline is achievable depends on how your pipeline was designed. Did you build a data lineage system that tracks where each user's records flow? Did you design your data lake with partitioning that makes targeted deletion possible? Did you key your Kafka topics by user ID so that tombstone records and log compaction can remove a user's events? These are engineering decisions made at pipeline design time, and they determine whether your company can comply.
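One way to picture the dependency between lineage and erasure is a small in-memory sketch. Everything here is hypothetical: `LineageRegistry`, `plan_erasure`, and the store names are invented for illustration, and a real system would persist lineage and call actual deletion jobs. The idea it shows is that an erasure request can only be fulfilled for stores you both know about and have a deletion path for.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LineageRegistry:
    """Remembers which data stores have received records for each user."""

    stores_by_user: dict[str, set[str]] = field(default_factory=dict)

    def record_write(self, user_id: str, store: str) -> None:
        self.stores_by_user.setdefault(user_id, set()).add(store)

    def stores_for(self, user_id: str) -> set[str]:
        return set(self.stores_by_user.get(user_id, ()))


def plan_erasure(
    registry: LineageRegistry,
    user_id: str,
    deleters: dict[str, Callable[[str], None]],
) -> tuple[list[str], list[str]]:
    """Split a user's stores into those we can erase and those with no deletion path."""
    erasable: list[str] = []
    blocked: list[str] = []
    for store in sorted(registry.stores_for(user_id)):
        (erasable if store in deleters else blocked).append(store)
    return erasable, blocked


# Hypothetical pipeline: the user's data landed in three places.
registry = LineageRegistry()
for store in ("orders_db", "events_lake", "kafka:clickstream"):
    registry.record_write("user-42", store)

# Only two of the stores were built with a deletion capability.
deleted: list[tuple[str, str]] = []
deleters = {
    "orders_db": lambda uid: deleted.append(("orders_db", uid)),
    "events_lake": lambda uid: deleted.append(("events_lake", uid)),
}

erasable, blocked = plan_erasure(registry, "user-42", deleters)
for store in erasable:
    deleters[store]("user-42")

print("Erased from:", erasable)
print("No deletion path:", blocked)
```

Any store that lands in the second list is a compliance gap that was created at design time, not at request time.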

Three principles should guide how you think about regulations as an engineer:

  • Privacy by design: Build PII controls into your architecture from the start. Retrofitting controls onto a running pipeline typically costs far more than designing them in upfront.
  • Data minimization: Collect only what you need. Every field you don't collect is a field you can't lose in a breach and don't need to manage for compliance.
  • Data lineage: Know where every piece of PII flows. You cannot delete what you cannot find.
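Data minimization in particular translates directly into code. The sketch below assumes a hypothetical order-event schema and an allow-list named `ALLOWED_FIELDS`; both are invented for illustration. It keeps only the fields a pipeline has a documented need for and reports what was dropped so the gap can be reviewed.

```python
# Hypothetical allow-list: the only fields this pipeline has a documented need for.
ALLOWED_FIELDS = frozenset({"order_id", "user_id", "amount", "currency"})


def minimize(record: dict, allowed: frozenset = ALLOWED_FIELDS) -> dict:
    """Return a copy of the record containing only allow-listed fields."""
    return {key: value for key, value in record.items() if key in allowed}


def dropped_fields(record: dict, allowed: frozenset = ALLOWED_FIELDS) -> list:
    """List the fields that minimization would discard, for audit logging."""
    return sorted(key for key in record if key not in allowed)


# Hypothetical raw event containing more PII than the pipeline needs.
raw_event = {
    "order_id": "A-1001",
    "user_id": "user-42",
    "amount": 129.99,
    "currency": "EUR",
    "date_of_birth": "1990-04-01",
    "passport_number": "X1234567",
}

print(minimize(raw_event))
print("Dropped:", dropped_fields(raw_event))
```

An allow-list fails safe: a new upstream field is dropped by default until someone justifies collecting it.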

3. Worked Example

The British Airways Attack: A Technical Post-Mortem

The Magecart attack on British Airways worked by tampering with a JavaScript library that the site already loaded; public analyses identified a modified copy of the Modernizr library. Here is a simplified reconstruction of what the attacker did and what technical controls could have prevented it:

Attack surface:

User browser
  → loads ba.com checkout page
  → page loads legitimate payment script from payment-gateway.com
  → page ALSO loads attacker-controlled script (injected via compromised dependency)
  → attacker script intercepts form.submit() event
  → harvests: card number, CVV, expiry, name, billing address, email
  → POSTs data to: baways[.]com (attacker-controlled lookalike domain)

What was missing (technical controls that would have helped):

| Control | What it does | Would it have helped? |
| --- | --- | --- |
| Content Security Policy (CSP) | Restricts which origins a page can load scripts from and send data to | Yes, would have blocked the exfiltration domain |
| Subresource Integrity (SRI) | Hash-verifies scripts before the browser runs them | Yes, would have detected the tampered script |
| Third-party script inventory | Audits all JS dependencies quarterly | Yes, would have caught the rogue script |
| PCI DSS segmentation | Isolates the payment page from the marketing stack | Yes, would have reduced the attack surface |
| Real-time anomaly detection | Alerts on new outbound domains | Yes, would have detected the exfiltration |
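The first two controls in the table are cheap to sketch. The example below is an illustration under stated assumptions, not BA's actual configuration: it computes a Subresource Integrity value (a base64-encoded SHA-384 digest, the format browsers expect in a script tag's `integrity` attribute) and assembles a Content-Security-Policy header value. The helper names and script contents are hypothetical; `payment-gateway.com` is the placeholder domain from the reconstruction above.

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI value such as 'sha384-...' for a script's exact bytes."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def build_csp(script_sources: list[str], connect_sources: list[str]) -> str:
    """Build a CSP header value that allow-lists script and data destinations."""
    directives = {
        "default-src": ["'self'"],
        "script-src": ["'self'", *script_sources],
        "connect-src": ["'self'", *connect_sources],
        # form-action does not fall back to default-src, so set it explicitly.
        "form-action": ["'self'", *connect_sources],
    }
    return "; ".join(f"{name} {' '.join(values)}" for name, values in directives.items())


# Hypothetical script contents: any change to the bytes changes the hash,
# so a browser enforcing SRI would refuse to run the tampered copy.
trusted = b"console.log('checkout');"
tampered = trusted + b" /* skimmer appended here */"
print("Trusted: ", sri_hash(trusted))
print("Tampered:", sri_hash(tampered))

# Only the payment gateway is allow-listed; a lookalike domain is not.
csp = build_csp(
    script_sources=["https://payment-gateway.com"],
    connect_sources=["https://payment-gateway.com"],
)
print("Content-Security-Policy:", csp)
```

Both controls have limits. SRI only helps if the page that carries the `integrity` attribute is itself untampered, and a CSP is only as strict as its allow-list, so they work best together with monitoring.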

The breach was not the result of a sophisticated attack against BA's core systems. It was the result of inadequate controls around a basic web security concern that had been well-understood for a decade. Regulation created the financial consequence that made ignoring those controls untenable.
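A script inventory, one of the controls listed in the table, can be as simple as comparing what is deployed against an approved baseline of content hashes. The sketch below is hypothetical throughout: the file paths, script contents, and the `audit_scripts` helper are invented for illustration, and a real audit would read the files actually served to browsers.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest used to identify an exact script version."""
    return hashlib.sha256(content).hexdigest()


def audit_scripts(baseline: dict[str, str], deployed: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare deployed scripts against approved hashes and report differences."""
    report: dict[str, list[str]] = {"modified": [], "unexpected": [], "missing": []}
    for path, content in sorted(deployed.items()):
        if path not in baseline:
            report["unexpected"].append(path)
        elif fingerprint(content) != baseline[path]:
            report["modified"].append(path)
    report["missing"] = sorted(set(baseline) - set(deployed))
    return report


# Approved versions, recorded at review time (hypothetical contents).
approved = {
    "js/checkout.js": b"submitPayment();",
    "js/vendor-lib.js": b"detectFeatures();",
}
baseline = {path: fingerprint(content) for path, content in approved.items()}

# What is actually deployed: one library was altered and one file was added.
deployed = dict(approved)
deployed["js/vendor-lib.js"] += b" sendCardDataElsewhere();"
deployed["js/tracker.js"] = b"track();"

report = audit_scripts(baseline, deployed)
print(report)
```

Run on a schedule or in a deployment pipeline, a check like this turns "audit quarterly" into an alert that fires when a script changes without review.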


4. Your Turn

Exercise 1: A mid-sized e-commerce company has annual global revenue of €800 million. They suffer a breach affecting 50,000 EU customers, and the lead supervisory authority determines they had inadequate technical controls. Calculate the maximum possible GDPR fine. Then calculate what a lower, 3% fine would be. Given those figures, how much compliance investment would be justified to avoid the risk?

Exercise 2: You're joining a new data engineering team. In your first week, you discover the following: (a) the production database contains 14 fields that appear to be PII but no one can tell you why they're being collected; (b) the data warehouse has no deletion capability built in; (c) customer data is shared with three analytics vendors, but there are no Data Processing Agreements in place. Rank these three issues by severity and explain your reasoning.


5. Real-World Application

British Airways is not alone in paying a steep price for technical failures that were entirely preventable. In October 2020, the ICO fined Marriott International £18.4 million over a breach that originated in a Starwood Hotels reservation database, which Marriott took on when it acquired Starwood in 2016. The breach exposed an estimated 339 million guest records, including passport numbers and payment card data. The root cause: Marriott's security team had not conducted adequate due diligence on the acquired system's security posture before merging it into their environment. GDPR requires organizations to maintain appropriate security not just for systems they build, but for data they acquire through corporate transactions.

Equifax's settlement with the FTC, the CFPB, and US states (at least $575 million, with up to $700 million in total penalties) tells a different story about the cost of technical debt. The breach stemmed from an unpatched Apache Struts vulnerability that had a known CVE and an available patch for two months before the attack. Equifax had a formal patching process; it simply wasn't followed. The consequences were not just financial: the company's CEO resigned, Congress held multiple hearings, and Equifax spent over $1.4 billion on security improvements in the two years following the breach. The cost of patching one vulnerability would have been measured in hours of engineer time.

The pattern across these cases is consistent: the companies that face catastrophic regulatory consequences are rarely victimized by attacks that couldn't have been anticipated. They are caught by attackers exploiting known weaknesses (missing patches, unmonitored third-party scripts, inadequate access controls, data retained longer than necessary) that competent data engineering practices would have addressed. Regulations exist because the market alone does not create sufficient incentive to invest in those practices. Your job, as a data engineer, is to build systems where the technical implementation of privacy controls is as rigorous as the implementation of performance or reliability.

Aha: The fine in the headline is often one of the smaller line items. Forensics, breach notification, churn, and class actions can dwarf it. That's why "we'll just pay the fine" is never the rational plan a finance team imagines it is.


6. Recap + Bridge

Regulations like GDPR, CCPA, and HIPAA exist because self-regulation produced a predictable outcome: companies optimized for data collection and underinvested in data protection, externalizing the costs of breaches onto individuals. The full cost of a breach (fines, forensics, notification, litigation, and churn) makes compliance investment economically rational. Data engineers are the implementers of compliance, not passive recipients of legal requirements.

The key takeaway: privacy controls are not an afterthought or a legal checkbox. They are engineering requirements with financial consequences attached, and they must be designed into your systems from day one.

In the next lesson, we go deep on GDPR specifics: the six lawful bases for processing, the eight subject rights, and exactly what technical capabilities your systems must support to fulfill them.