Module: Setup | Duration: ~8 min | Lesson: 0 of 8

1. What You'll Build

The same DuckDB + Docker lab from Course 1.1, with one addition: a change-log CSV (customer_changes.csv) that records the real changes customers underwent over a year — region moves, segment promotions, name corrections. You'll replay the change log against a starting dim_customer snapshot to demonstrate every SCD type.
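
As a rough preview of what "replaying" one change event means, here is a minimal sketch that applies the region move for customer 42 in two ways: Type 1 (overwrite) and Type 2 (new version row). The column names valid_from, valid_to, and is_current, the 9999-12-31 end marker, and the 2025-01-01 starting date are assumptions made for illustration; the course's actual dim_customer layout may differ, and the other SCD types are not shown.

```python
from datetime import date

# Assumed "open-ended" marker for the current version of a row.
HIGH_DATE = date(9999, 12, 31)


def apply_type1(row: dict, column: str, new_value: str) -> None:
    """Type 1: overwrite the attribute in place; the old value is lost."""
    row[column] = new_value


def apply_type2(versions: list, column: str, new_value: str, changed_at: date) -> None:
    """Type 2: close the current version and append a new current version."""
    current = versions[-1]
    current["valid_to"] = changed_at
    current["is_current"] = False
    versions.append({
        **current,
        column: new_value,
        "valid_from": changed_at,
        "valid_to": HIGH_DATE,
        "is_current": True,
    })


# Hypothetical starting snapshot row for customer 42.
snapshot = {"customer_id": 42, "region": "EU", "segment": "silver"}

type1_row = dict(snapshot)
apply_type1(type1_row, "region", "APAC")

type2_versions = [
    {**snapshot, "valid_from": date(2025, 1, 1), "valid_to": HIGH_DATE, "is_current": True}
]
apply_type2(type2_versions, "region", "APAC", date(2026, 2, 14))

print(type1_row)
for version in type2_versions:
    print(version)
```

After the Type 1 update the row only knows about APAC, while the Type 2 list keeps the EU version with a closed validity window next to the new APAC version.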

If you completed Course 1.1's lab, you already have everything except one additional seed file, which you download in the Installation step.

2. Prerequisites

Course 1.1 lab is set up (DuckDB Docker image pulled, seed/ directory exists).
~50 MB additional disk for the change-log CSV.
The four base tables (raw_orders, raw_customers, raw_products, raw_dates) loadable.

If you skipped Course 1.1, complete its Lesson 0 first.

3. Installation

All OSes

cd ~/s7-lab
# -f makes curl fail on an HTTP error instead of saving the error page as the CSV
curl -fL -o seed/customer_changes.csv \
  https://github.com/data-learning-course/s7-seed/releases/download/v1/customer_changes.csv
ls -l seed/customer_changes.csv

The CSV has the shape:

customer_id,changed_at,column_name,old_value,new_value
42,2026-02-14,region,EU,APAC
42,2026-04-01,segment,silver,gold
107,2026-03-22,customer_email,old@x.com,new@x.com
...
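
To make that shape concrete, the following sketch parses the three sample rows shown above into typed records and rejects a file whose header does not match. The ChangeEvent class and parse_changes function are hypothetical names, not part of the lab, and the sketch assumes changed_at is always an ISO date (YYYY-MM-DD) as in the sample.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

EXPECTED_COLUMNS = ["customer_id", "changed_at", "column_name", "old_value", "new_value"]


@dataclass(frozen=True)
class ChangeEvent:
    customer_id: int
    changed_at: date
    column_name: str
    old_value: str  # values stay as text because they belong to different columns
    new_value: str


def parse_changes(text: str) -> list:
    """Parse change-log CSV text into ChangeEvent records."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [
        ChangeEvent(
            customer_id=int(row["customer_id"]),
            changed_at=date.fromisoformat(row["changed_at"]),
            column_name=row["column_name"],
            old_value=row["old_value"],
            new_value=row["new_value"],
        )
        for row in reader
    ]


# The three sample rows from this lesson.
SAMPLE = """customer_id,changed_at,column_name,old_value,new_value
42,2026-02-14,region,EU,APAC
42,2026-04-01,segment,silver,gold
107,2026-03-22,customer_email,old@x.com,new@x.com
"""

events = parse_changes(SAMPLE)
for event in events:
    print(event)
```

Each row is one attribute change for one customer, so a customer who changes twice (like 42) appears on two rows.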

If the URL is unreachable, run the included seed/gen_changes.py to synthesize equivalent data from raw_customers.
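
If you want to see what such a generator might do, here is a minimal sketch. It is not the contents of seed/gen_changes.py. It assumes raw_customers has customer_id, region, and segment columns, and the value lists, the 2026-01-01 start date, the random seed, and the function names are all invented for illustration.

```python
import csv
import io
import random
from datetime import date, timedelta

# Assumed value domains; the real raw_customers values may differ.
DOMAINS = {
    "region": ["EU", "APAC", "NA", "LATAM"],
    "segment": ["bronze", "silver", "gold"],
}
FIELDS = ["customer_id", "changed_at", "column_name", "old_value", "new_value"]


def synthesize_changes(customers, n_events, start, days=365, seed=7):
    """Return change rows in date order, keeping each old_value consistent."""
    rng = random.Random(seed)
    # Track current attribute values on copies so the input stays untouched.
    state = {c["customer_id"]: dict(c) for c in customers}
    ids = sorted(state)
    # Sort the day offsets first so events are applied chronologically.
    offsets = sorted(rng.randrange(days) for _ in range(n_events))
    rows = []
    for offset in offsets:
        cid = rng.choice(ids)
        column = rng.choice(sorted(DOMAINS))
        old = state[cid][column]
        new = rng.choice([v for v in DOMAINS[column] if v != old])
        state[cid][column] = new
        rows.append({
            "customer_id": cid,
            "changed_at": (start + timedelta(days=offset)).isoformat(),
            "column_name": column,
            "old_value": old,
            "new_value": new,
        })
    return rows


def to_csv_text(rows) -> str:
    """Serialize rows with the same header as customer_changes.csv."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Stand-in for raw_customers: 100 hypothetical customers.
customers = [
    {"customer_id": i, "region": "EU", "segment": "silver"} for i in range(1, 101)
]
rows = synthesize_changes(customers, n_events=342, start=date(2026, 1, 1))
csv_text = to_csv_text(rows)
print(csv_text[:200])
```

Writing csv_text to seed/customer_changes.csv would give a file with the same header as the real change log. Because each event's old_value is taken from the customer's tracked state, the log stays replayable in date order.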

4. Verify Your Setup

From the lab shell:

docker run --rm -it -v "$(pwd)/seed:/seed" datacatering/duckdb:latest \
  -c "CREATE TABLE customer_changes AS SELECT * FROM read_csv('/seed/customer_changes.csv', header=true); SELECT COUNT(*) AS n FROM customer_changes;"

Expected output:

┌──────┐
│  n   │
├──────┤
│  342 │
└──────┘

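If n is close to 342 (the exact number may vary by seed version), the change log loaded correctly and you're ready. Depending on your DuckDB version, the result box may also show the column type (int64) beneath the header.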
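
If Docker is not handy, the row count can also be cross-checked without DuckDB. The sketch below counts data rows with Python's csv module and compares the result against the expected figure. The 10% tolerance and both function names are arbitrary choices for illustration, not something the course defines.

```python
import csv
import io


def count_data_rows(lines) -> int:
    """Count CSV records after the header row, ignoring blank lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header
    return sum(1 for row in reader if row)


def roughly(n: int, expected: int = 342, tolerance: float = 0.10) -> bool:
    """True when n is within the assumed tolerance of the expected count."""
    return abs(n - expected) <= expected * tolerance


# In-memory demo using the sample rows from this lesson.
sample = io.StringIO(
    "customer_id,changed_at,column_name,old_value,new_value\n"
    "42,2026-02-14,region,EU,APAC\n"
    "42,2026-04-01,segment,silver,gold\n"
    "107,2026-03-22,customer_email,old@x.com,new@x.com\n"
)
n = count_data_rows(sample)
print(n, roughly(n))

# Against the real file, from the lab directory:
#   with open("seed/customer_changes.csv", newline="") as f:
#       print(count_data_rows(f))
```

The three-row sample fails the tolerance check, as it should. A full seed file should pass it.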

5. Copy Prompt

I'm setting up the lab environment for the "Slowly Changing Dimensions" course on data-learning. I've already completed the lab setup for "Dimensional Modeling Fundamentals" so DuckDB-in-Docker and the base seed CSVs are working.

This course adds one file: customer_changes.csv in the seed/ directory. The file has columns: customer_id, changed_at, column_name, old_value, new_value, and contains ~342 row-level change events.

Verification: SELECT COUNT(*) FROM customer_changes should return ~342.

My machine:
- OS: <I will fill in>
- Existing tools: <I will fill in>

Walk me through downloading the CSV and verifying the load. If the download URL fails, help me synthesize equivalent data from raw_customers.csv using SQL — describe the SQL approach to generate ~340 random change events distributed across ~100 customers over a 12-month window.