Data engineering, by concept
272 concepts from across the stack: storage, ingestion, modeling, query engines, and governance. Each one links to the courses that teach it, a free tool to practice it, and the related ideas to learn next.
All topics
A
B
C
capacity planning, cardinality estimation, catalyst optimizer, CCPA, CDC, change data capture, change data feed, checkpointing, clickhouse, column statistics, column tagging, column-level lineage, columnar storage, compaction, compliance, compression codecs, concurrency control, conformed dimensions, connectors, consumer groups, consumer lag, copy-on-write, cost vs latency, cost-based optimization, cumulative metrics
D
dag deployment, DAGs, dagster, data anomalies, data classification, data contracts, data federation, data freshness, data governance, data incidents, data inventory, data lineage, data masking, data migration, data modeling, data observability, data privacy regulations, data purge, data quality, data retention, data scanning, data skew, data skipping, data vault, dataframes, dbt semantic layer, de-identification, dead letter queue, debezium, deletion vectors, delivery semantics, delta lake, denormalization, deprecation, dictionary encoding, differential privacy, dimension tables, dimensional modeling, disaster recovery, distributed execution, dremel encoding, drift detection, drill-across queries, DStreams, dual-write problem, duckdb, dynamic data masking
F
I
K
M
P
page index, parquet, parquet encryption, partitioning, partitions, performance tuning, PHI, PII, PII detection, PII governance, PII lifecycle, PII pipeline, pipeline design, pipeline failures, pipeline migrations, pipeline state, point-in-time tables, postgres replication, postmortems, pre-aggregation, predicate pushdown, prefect, producers and consumers, pyspark
Q
R
S
SCD, schedule drift, scheduling, schema changes, schema drift, schema evolution, schema registry, secrets management, segments, semantic graph, semantic layer, sensors, sharding, shuffle, shuffles, singer, single source of truth, skip indexes, SLAs, SLO SLI, slowly changing dimensions, snapshot expiration, snapshots, snowflake schema, spark on kubernetes, spark sql, spark structured streaming, sparse primary index, star schema, star-tree index, stateful streaming, stream joins, stream processing, stream processing tradeoffs, surrogate keys, symmetric aggregates, synthetic data
T
Start with the fundamentals
The first chapter of every course is free to read, with no account needed.