Data Engineering Roadmap: Build the pipelines that make data useful at scale

160h total · 9 courses · 3 stages

What you'll be able to do

Build batch and streaming data pipelines
Model and warehouse data for analytics
Orchestrate workflows with tools like Airflow
Operate reliable, monitored data systems

Before you start

Python fundamentals
Basic SQL
Comfort with the command line

Level 1 · Programming & SQL Mastery

Python for Data Engineering

beginner · 18h

Python beyond basics: file I/O, subprocess, requests, and writing production scripts.

Automate the Boring Stuff with Python (doc, free)
Kaggle: Python Course (course, free)

Parse CSV & JSON from disk and APIs
Context managers & error handling
Write a file processing pipeline script
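
The three exercises above can be combined into one small script. What follows is a minimal sketch using only the standard library, not a prescribed solution: the function names `load_records` and `process_directory`, the JSON Lines output format, and the `.jsonl` naming are all assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def load_records(path: Path) -> list:
    """Read a .csv or .json file into a list of records."""
    suffix = path.suffix.lower()
    # The with-statement closes the file even if parsing raises.
    with path.open(newline="", encoding="utf-8") as handle:
        if suffix == ".csv":
            return list(csv.DictReader(handle))
        if suffix == ".json":
            data = json.load(handle)
            return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported file type: {path.name}")


def process_directory(source: Path, target: Path) -> dict:
    """Convert every readable file in `source` to JSON Lines in `target`."""
    target.mkdir(parents=True, exist_ok=True)
    summary = {"files_ok": 0, "files_failed": 0, "rows": 0}
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue
        try:
            records = load_records(path)
        except (ValueError, csv.Error, OSError) as exc:
            # json.JSONDecodeError is a ValueError, so bad JSON lands here too.
            # Report and continue so one bad file does not stop the batch.
            print(f"skipping {path.name}: {exc}")
            summary["files_failed"] += 1
            continue
        # Keep the full source name so a.csv and a.json do not collide.
        out_path = target / f"{path.name}.jsonl"
        with out_path.open("w", encoding="utf-8") as out:
            for record in records:
                out.write(json.dumps(record) + "\n")
        summary["files_ok"] += 1
        summary["rows"] += len(records)
    return summary
```

Calling `process_directory(Path("incoming"), Path("processed"))` returns a count of converted files, skipped files, and rows written. Reading from an API instead of disk would replace `load_records` with a call through `requests`, which is left out here to keep the sketch dependency-free.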

SQL: Advanced Querying & Data Modelling

beginner · 20h

CTEs, window functions, dimensional modelling, and query optimisation.

Mode SQL Tutorial (course, free)
SQLBolt: Interactive SQL lessons (course, free)
PostgreSQL Tutorial (doc, free)

Window functions: ROW_NUMBER, LAG, LEAD
CTEs & recursive queries
Star schema design for a sales dataset
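
To make the first two exercises concrete, here is a small sketch that runs the queries through Python's built-in `sqlite3` module, so it needs no database server. It assumes a SQLite build of 3.25 or newer (the first release with window functions); the `sales` table and its rows are invented for illustration, and the same SQL should carry over to PostgreSQL with little or no change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer TEXT, sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('ana', '2024-01-05', 100),
        ('ana', '2024-01-20', 150),
        ('ana', '2024-02-02', 90),
        ('ben', '2024-01-11', 200),
        ('ben', '2024-02-15', 260);
""")

# Window functions: number each customer's sales in date order and
# look one row back (LAG) and one row ahead (LEAD) within the partition.
window_rows = conn.execute("""
    SELECT customer,
           sale_date,
           amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY sale_date) AS sale_no,
           LAG(amount)  OVER (PARTITION BY customer ORDER BY sale_date) AS prev_amount,
           LEAD(amount) OVER (PARTITION BY customer ORDER BY sale_date) AS next_amount
    FROM sales
    ORDER BY customer, sale_date
""").fetchall()

# Recursive CTE: the anchor row (1) is extended by the recursive member
# until the WHERE clause stops producing new rows.
counter_rows = conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter ORDER BY n
""").fetchall()

for row in window_rows:
    print(row)
print([n for (n,) in counter_rows])
```

The first and last sale of each customer get `None` for `prev_amount` and `next_amount` respectively, because there is no neighbouring row inside the partition.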

dbt: Data Build Tool

intermediate · 14h

Transform data in your warehouse with version-controlled, tested SQL models.

dbt Fundamentals Course (course, free)
dbt Docs (doc, free)

Staging → intermediate → mart model layers
dbt tests: unique, not_null, relationships
Generate dbt docs site
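
It can help to see what the three generic tests named above actually check. A dbt test is a query that selects failing rows, and the test passes when that query returns nothing. The sketch below imitates that idea with `sqlite3` in Python; it is not dbt's own implementation, and the table names `stg_customers` and `stg_orders` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customers (customer_id INTEGER);
    CREATE TABLE stg_orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO stg_customers VALUES (1), (2);
    -- Deliberately dirty data: a duplicate id, an unknown customer, a NULL id.
    INSERT INTO stg_orders VALUES (10, 1), (11, 2), (11, 2), (12, 3), (NULL, 1);
""")


def failing_rows(sql: str) -> list:
    """Return the rows that violate a check; an empty list means 'pass'."""
    return conn.execute(sql).fetchall()


# unique: no non-null value may appear more than once.
unique_failures = failing_rows("""
    SELECT order_id FROM stg_orders
    WHERE order_id IS NOT NULL
    GROUP BY order_id
    HAVING COUNT(*) > 1
""")

# not_null: the column may not contain NULL.
not_null_failures = failing_rows("""
    SELECT order_id, customer_id FROM stg_orders
    WHERE order_id IS NULL
""")

# relationships: every non-null foreign key must exist in the parent table.
relationship_failures = failing_rows("""
    SELECT o.customer_id
    FROM stg_orders AS o
    LEFT JOIN stg_customers AS c ON o.customer_id = c.customer_id
    WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL
""")

print("unique:", unique_failures)
print("not_null:", not_null_failures)
print("relationships:", relationship_failures)
```

In a real project these checks are declared in a model's YAML properties file rather than written by hand; the queries here only show the shape of the logic.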

Level 2 · Data Pipeline Tools

Apache Airflow: Workflow Orchestration

intermediate · 18h

DAGs, operators, sensors, XComs, and managing complex pipeline dependencies.

Apache Airflow Official Tutorial (doc, free)
Astronomer: Airflow Guides (doc, free)

ETL DAG: extract from API → load to Postgres
Sensor that waits for a file to land
TaskGroup & dynamic task mapping
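
Before writing a real DAG, it may help to see the core idea in isolation: tasks declare what they depend on, and a scheduler derives a valid run order. The sketch below uses the standard library's `graphlib` rather than Airflow, so none of it is Airflow's API. The task functions are placeholders, the hard-coded rows stand in for an API response, and the shared `results` dict plays a role loosely similar to XComs.

```python
from graphlib import TopologicalSorter


def extract(results: dict) -> list:
    # Placeholder for an API call; these rows are invented.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4"}]


def transform(results: dict) -> list:
    # Read the upstream task's output and cast amounts to float.
    return [{**row, "amount": float(row["amount"])} for row in results["extract"]]


def load(results: dict) -> int:
    # Placeholder for an INSERT into Postgres; returns the row count.
    return len(results["transform"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of tasks that must finish before it starts.
UPSTREAM = {"transform": {"extract"}, "load": {"transform"}}


def run_pipeline(tasks: dict, upstream: dict) -> tuple:
    """Run tasks in dependency order, sharing outputs through `results`."""
    order = list(TopologicalSorter(upstream).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name](results)
    return order, results


order, results = run_pipeline(TASKS, UPSTREAM)
print(order)
print(results["load"])
```

A cycle in `UPSTREAM` makes `static_order()` raise `graphlib.CycleError`, which is the same constraint the "acyclic" in DAG refers to. Airflow adds scheduling, retries, and sensors on top of this ordering step.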

Kafka: Event Streaming

intermediate · 16h

Producers, consumers, topics, partitions, and stream processing with Kafka Streams.

Confluent Developer: Kafka Tutorials (course, free)
Kafka Docs (doc, free)

Producer & consumer in Python
Topic partitioning & consumer groups
Stream a real-time clickstream
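
Partitioning and consumer groups are easier to reason about with a toy model. The sketch below runs without a broker and uses no Kafka client: it hashes each event key to a partition and splits partitions across the members of one consumer group. Two simplifications are assumed: `zlib.crc32` stands in for the murmur2 hash used by the Java client's default partitioner, and the round-robin split is only one of several assignment strategies. The event data and consumer names are invented.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always map to the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Give each partition to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


NUM_PARTITIONS = 6
clicks = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/pricing"},
    {"user_id": "u1", "page": "/docs"},
    {"user_id": "u3", "page": "/home"},
    {"user_id": "u1", "page": "/signup"},
]

# "Produce": key every click by user_id so one user's events stay ordered
# within a single partition.
partitions = defaultdict(list)
for event in clicks:
    partitions[partition_for(event["user_id"], NUM_PARTITIONS)].append(event)

# "Consume": two members of one group share the six partitions.
assignment = assign_partitions(NUM_PARTITIONS, ["consumer-a", "consumer-b"])
for consumer, owned in assignment.items():
    seen = [e["page"] for p in owned for e in partitions.get(p, [])]
    print(consumer, owned, seen)
```

Because each partition has a single owner within the group, all of one user's clicks are read by the same consumer and in the order they were produced.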

Apache Spark: Distributed Processing

advanced · 18h

PySpark DataFrames, SQL, UDFs, and processing large datasets at scale.

Spark by Examples: PySpark Tutorial (article, free)
Frank Kane: Taming Big Data with Spark (Udemy course, paid)

Load & transform 1M-row CSV with PySpark
Spark SQL join & aggregation
Write partitioned Parquet to S3

Level 3 · Cloud Data Platforms & Capstone

BigQuery & Snowflake Data Warehousing

advanced · 16h

Cloud DWH architecture, cost management, partitioning, clustering, and BI integration.

Google BigQuery Quickstart (doc, free)
Snowflake Getting Started (doc, free trial)

Load data from GCS to BigQuery
Partition by date, cluster by user_id
Connect Looker Studio for visualisation

Data Quality & Great Expectations

advanced · 10h

Validate, document, and profile your data with Great Expectations.

Great Expectations Docs (doc, free)

Expectation suite for a pipeline output
Integrate GX into Airflow DAG

Capstone: End-to-End Data Platform

advanced · 30h

Ingest → transform → model → visualise: a complete modern data stack.

DataTalks.Club: Data Engineering Zoomcamp (course, free)

Kafka → Spark → BigQuery pipeline
dbt models on BigQuery
Airflow orchestrating the full flow
Dashboard in Looker Studio or Metabase

Frequently asked

Is the Data Engineering roadmap free?

Yes. The roadmap is free to follow on Commit, and nearly every curated resource is free; the exceptions listed above are the paid Udemy Spark course and Snowflake, which offers a free trial. You can track your progress, keep a daily streak, and earn a shareable certificate at no cost, with no paywall.

How long does the Data Engineering roadmap take to complete?

About 160 hours of focused study across 9 courses and 3 stages. At roughly one hour a day that is about 160 days, or a little over five months; you can move faster by studying more each day.

Do I get a certificate for finishing the Data Engineering roadmap?

Yes. When you complete the roadmap on Commit you receive a verifiable certificate of completion that you can add to LinkedIn and your public Commit profile as proof of what you finished.

Make it stick

Copy this roadmap into Commit and turn it into a tracked program with a streak graph, study logging, and a shareable certificate when you finish. Free forever.

Related roadmaps

Backend Engineering: Node.js

Frontend React Developer

Full-Stack Next.js Engineer