5 Free Data Engineering Projects to Build a Strong Portfolio in 2023
🤖Transform Your Big Data Skills with These Proven Projects with Source Code
Data Engineering is a rewarding career that is booming right now.
The need for skilled professionals in this field is increasing day by day. DE is critical in a data-driven world to build, design, and manage data science pipelines and maintain data infrastructure for powering data applications.
To summarize in a phrase: “Data Engineering is the Future of AI”💥🚀
But the problem is —
Finding Data Engineering projects is very hard! That’s why I made this list of 5 projects you can practice NOW🔥🔥🔥
If you are interested in Data Engineering or want to take your skills to a higher level, you have clicked on the right blog. It will also help you build a great portfolio/resume for your job search.
Reminder —
However you do it, hit the ‘Referral Link’ to join Medium and follow me. Your support means the world! Thank you🙏

Let’s get started then😄!
1- Twitter Data Pipeline using Airflow
When you hear the words “data pipeline” in the Big Data community, you only need to think of one thing: a Data Engineer, because it’s their job to build, design, and manage pipelines.
So, if you want to become a pro, you need to master the skills and tools related to data pipelines. This project will teach you the basics of Airflow and how to build a data pipeline.
You will be learning —
Python for DE, basic Airflow, working with Tweepy (the Twitter data package), and writing ETL jobs that store data on Amazon S3.
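To give you a feel for the "transform" part of such an ETL job, here is a minimal sketch in plain Python. It is not the project's actual code. I am assuming the tweet fields (id, user.screen_name, text, created_at, favorite_count) look like the classic Twitter v1.1 JSON, and the sample tweet is made up. In the real project, a function like this would run inside an Airflow task and the resulting CSV would be uploaded to S3.

```python
import csv
import io

# Column order for the output file (an assumption for this sketch).
FIELDS = ["id", "user", "text", "created_at", "likes"]


def flatten_tweet(tweet: dict) -> dict:
    """Keep only the fields we want and flatten the nested user object."""
    return {
        "id": tweet["id"],
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"].replace("\n", " "),  # keep one tweet per CSV line
        "created_at": tweet["created_at"],
        "likes": tweet.get("favorite_count", 0),
    }


def tweets_to_csv(tweets: list[dict]) -> str:
    """Transform step: raw tweet dicts -> CSV text ready to be uploaded."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten_tweet(t) for t in tweets)
    return buffer.getvalue()


# Hypothetical record standing in for what the extract step would return.
sample_tweets = [
    {
        "id": 1,
        "user": {"screen_name": "example_user"},
        "text": "Hello\ndata engineering",
        "created_at": "Mon Jan 02 10:00:00 +0000 2023",
        "favorite_count": 3,
    }
]

print(tweets_to_csv(sample_tweets))
```

Keeping the transform as a pure function (dicts in, text out) makes it easy to test without touching Twitter, Airflow, or S3.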
2- YouTube End-To-End Data Engineering Project
This is a 3-hour-long project where you will execute a complete Data Engineering project.
The speaker will guide you through each step and share every detail. I encourage you to do this project, especially if you are a beginner. It teaches you how to understand the business problem and think like a DE.
In this project, you will be learning —
Python and PySpark, SQL, and Amazon Web Services (AWS): Athena, Glue, Redshift, S3, IAM, Lambda, and QuickSight.

3- Surfline Dashboard
In this project, you will collect data from the Surfline API through a pipeline and export it as a CSV file to Amazon S3.
After that, the most recent file in S3 is downloaded and ingested into a Postgres data warehouse. In the end, you get a beautiful dashboard showing the data.
You will be learning —
AWS S3, Airflow, Pandas, Postgres, and Plotly.
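The "download the most recent file in S3" step is a nice small problem on its own. Below is a rough sketch of how picking that file could look, not the project's real code. I am assuming you already have a bucket listing shaped like the "Contents" entries that boto3's list_objects_v2 returns (each with a "Key" and a "LastModified" datetime). The keys and dates are invented for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def latest_key(objects: list[dict], prefix: str = "", suffix: str = ".csv") -> Optional[str]:
    """Return the key of the newest object matching prefix/suffix, or None."""
    candidates = [
        obj for obj in objects
        if obj["Key"].startswith(prefix) and obj["Key"].endswith(suffix)
    ]
    if not candidates:
        return None
    # The newest upload is the one with the greatest LastModified timestamp.
    return max(candidates, key=lambda obj: obj["LastModified"])["Key"]


# Hypothetical listing; in practice this would come from the S3 API.
listing = [
    {"Key": "surf/2022-09-01.csv", "LastModified": datetime(2022, 9, 1, 6, 0, tzinfo=timezone.utc)},
    {"Key": "surf/2022-09-03.csv", "LastModified": datetime(2022, 9, 3, 6, 0, tzinfo=timezone.utc)},
    {"Key": "surf/2022-09-02.csv", "LastModified": datetime(2022, 9, 2, 6, 0, tzinfo=timezone.utc)},
    {"Key": "surf/readme.txt", "LastModified": datetime(2022, 9, 4, 6, 0, tzinfo=timezone.utc)},
]

print(latest_key(listing, prefix="surf/"))
```

Sorting on the upload timestamp instead of the file name means the pipeline keeps working even if the naming scheme changes later.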

4- The FinnHub Streaming Data Pipeline
This project aims to provide real-time financial data to your users through a robust architecture.
You will build a streaming data pipeline based on the FinnHub.io API, which offers a WebSocket for real-time trading data. You will also design and implement a data architecture that can handle large volumes of data in real time.
You will be learning —
Apache Kafka, Spark, Cassandra, Kubernetes, Grafana, and more.
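Before the data reaches Kafka, each WebSocket message has to be turned into clean records. Here is a small sketch of that parsing step, assuming the trade messages are JSON with a "type" of "trade" and a "data" list whose items use "s" (symbol), "p" (price), "v" (volume), and "t" (timestamp in milliseconds). The message shape is my assumption and the sample values are made up, so check the API docs before relying on it.

```python
import json
from datetime import datetime, timezone


def parse_trades(raw: str) -> list[dict]:
    """Turn one raw WebSocket message into a list of flat trade records."""
    message = json.loads(raw)
    if message.get("type") != "trade":
        return []  # ignore anything that is not a trade update
    return [
        {
            "symbol": trade["s"],
            "price": trade["p"],
            "volume": trade["v"],
            # Millisecond epoch -> timezone-aware UTC datetime.
            "traded_at": datetime.fromtimestamp(trade["t"] / 1000, tz=timezone.utc),
        }
        for trade in message.get("data", [])
    ]


# Hypothetical message for illustration only.
sample = (
    '{"type": "trade", "data": '
    '[{"s": "BINANCE:BTCUSDT", "p": 7296.89, "v": 0.011467, "t": 1575526691134}]}'
)

for record in parse_trades(sample):
    print(record)
```

In the full pipeline, each of these records would be published to a Kafka topic, with Spark and Cassandra taking over from there.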

5- Audiophile End-To-End ELT Pipeline
In this project, you will build, design, and manage a data pipeline that extracts data from Crinacle’s Headphone and InEarMonitor databases and finalizes it for a Metabase dashboard.
You will also perform all the DAG tasks, which include scraping, loading, and transforming data into a warehouse.
You will be learning —
AWS S3, Redshift, RDS, dbt (data transformation tool), and Airflow.
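If the word "DAG" is new to you, it simply means the tasks run in dependency order. Here is a toy sketch of the scrape, load, transform chain using only Python's standard library (graphlib, Python 3.9+) as a stand-in for Airflow. The task bodies and the sample row are placeholders I made up; in the real project these steps are Airflow tasks, and the transform is handled by dbt.

```python
from graphlib import TopologicalSorter


def scrape(state: dict) -> None:
    # Placeholder: pretend we scraped one row with messy whitespace.
    state["raw"] = [{"model": "Example IEM", "rank": " A+ "}]


def load(state: dict) -> None:
    # Placeholder: copy raw rows into a staging area.
    state["staged"] = list(state["raw"])


def transform(state: dict) -> None:
    # Placeholder: clean the staged rows.
    state["clean"] = [{**row, "rank": row["rank"].strip()} for row in state["staged"]]


TASKS = {"scrape": scrape, "load": load, "transform": transform}

# Each task maps to the set of tasks that must finish before it.
DEPENDENCIES = {"scrape": set(), "load": {"scrape"}, "transform": {"load"}}


def run_pipeline() -> tuple[list[str], dict]:
    """Run every task in an order that respects the dependencies."""
    state: dict = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](state)
    return order, state


order, state = run_pipeline()
print(order)
print(state["clean"])
```

Airflow expresses the same idea with `scrape >> load >> transform`, and adds scheduling, retries, and monitoring on top.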

Big shoutout to all the authors 🚀
That’s a wrap. Enjoy ❤️
I am Uzman Ali, and I talk about data science for Tech/non-Tech folks.
If you LIKE my article, kindly SHARE it with your peers, CLAP (up to 50!), follow me on Medium, and connect with me on LinkedIn and X (aka Twitter) to stay updated on my new articles 🤩.
Join Medium using my Referral Link to support me. Thanks🙏






