Lab Setup

Module: Setup | Duration: ~6 min | Lesson: 0 of 8


1. What You'll Build

The same DuckDB-in-Docker sandbox from earlier Strata 7 courses, plus a starter OBT (obt_orders_v1) — a wide, denormalized table joining the Track 7.1 customers, orders, products, and dim_date data into a single 18-column row.

You'll use this to benchmark queries against the equivalent star schema, and to feel firsthand where OBT wins and where it costs.


2. Prerequisites

  • Docker Desktop ≥ 4.x.
  • ~100 MB disk.
  • Track 7.1 lab CSVs (raw_customers.csv, raw_orders.csv, raw_products.csv if you have it; otherwise the included synthesizer).

3. Installation

mkdir -p ~/s7-obt-lab/seed ~/s7-obt-lab/sql
cd ~/s7-obt-lab

curl -L -o seed/raw_customers.csv  https://github.com/data-learning-course/s7-seed/releases/download/v1/raw_customers.csv
curl -L -o seed/raw_orders.csv     https://github.com/data-learning-course/s7-seed/releases/download/v1/raw_orders.csv
curl -L -o seed/raw_products.csv   https://github.com/data-learning-course/s7-seed/releases/download/v1/raw_products.csv

Fallback if downloads fail: the Copy Prompt below describes how to synthesize the three CSVs from a small Python script.


4. Verify Your Setup

docker run --rm -it -v $(pwd)/seed:/seed datacatering/duckdb:latest -c \
  "CREATE TABLE raw_customers AS SELECT * FROM read_csv('/seed/raw_customers.csv', header=true);
   CREATE TABLE raw_orders    AS SELECT * FROM read_csv('/seed/raw_orders.csv', header=true);
   CREATE TABLE raw_products  AS SELECT * FROM read_csv('/seed/raw_products.csv', header=true);
   SELECT 'customers' AS t, COUNT(*) FROM raw_customers
   UNION ALL
   SELECT 'orders',         COUNT(*) FROM raw_orders
   UNION ALL
   SELECT 'products',       COUNT(*) FROM raw_products;"

Expected output: ~500 customers, ~4200 orders, ~120 products.


5. Copy Prompt

I'm setting up the lab for "One Big Table & Wide-Table Design" on data-learning. I need DuckDB-in-Docker with three seed CSVs in seed/: raw_customers.csv (~500 rows, id/name/email/region), raw_orders.csv (~4200 rows, with order_id, customer_id, product_id, order_date, amount, status), raw_products.csv (~120 rows, with product_id, name, category, price).

Verification: load the three tables in DuckDB-in-Docker and confirm the row counts roughly match (500 / 4200 / 120).

My machine:
- OS: <I will fill in>
- RAM: <I will fill in>
- Existing tools: <I will fill in>

Walk me through downloading the CSVs and running the verification. If the downloads fail, help me synthesize the three CSVs with a short Python script: generate 500 customers with diverse regions, 120 products across 6 categories, and 4200 orders that join valid customer_id and product_id with order_dates spanning the last 18 months and amounts varying by product category. Do not include secrets.