Uzman Ali

Summary

The website content provides an overview of five free data engineering projects that can help build a strong portfolio in 2023, emphasizing the importance of hands-on experience in the booming field of data engineering.

Abstract

The article "5 Free Data Engineering Projects to Build Your Strong Portfolio in 2023" outlines a curated list of projects aimed at enhancing the practical skills of data engineering professionals. It underscores the growing demand for data engineers in managing data infrastructure and science pipelines. The projects range from creating a Twitter data pipeline using Airflow to building a YouTube end-to-end data engineering project, covering a variety of tools and platforms such as Python, PySpark, AWS services, Apache Kafka, and more. The article emphasizes the educational value of these projects for both beginners and those looking to upskill, and it encourages readers to follow the author on Medium and LinkedIn for further insights into data science.

Opinions

  • The author believes that hands-on projects are crucial for becoming a proficient data engineer, as they provide practical experience in building and managing data pipelines.
  • Data Engineering is portrayed as a critical and increasingly sought-after skill set in the current job market, referred to as "the Future of AI."
  • The article suggests that completing these projects can significantly contribute to building a robust portfolio and resume, which is essential for job seekers in the data engineering field.
  • The author expresses enthusiasm and support for the Medium platform, encouraging readers to use their referral link to join Medium and follow their work, indicating a sense of community and mutual support among content creators and readers.
  • There is an appreciation for the accessibility of free resources and projects that can help democratize learning and skill development in the realm of data engineering.

5 Free Data Engineering projects to build your strong portfolio in 2023

🤖Transform Your Big Data Skills with These Proven Projects with Source Code

Data Engineering is a rewarding career that is booming right now.

The need for skilled professionals in this field is increasing day by day. In a data-driven world, data engineers (DEs) are critical: they design, build, and manage data science pipelines and maintain the data infrastructure that powers data applications.

To summarize in one phrase: “Data Engineering is the Future of AI”💥🚀

But the problem is —

Finding Data Engineering projects is very hard! That’s why I made this list of 5 projects you can practice NOW🔥🔥🔥

If you are interested in Data Engineering or want to take your expertise to a higher level, you have clicked on the right blog. This blog will also help you build a great portfolio/resume for your job search.

Reminder —

However you do it, please use my Referral Link to join Medium and follow me. Your support means the world! Thank you🙏


Let’s get started then😄!

1- Twitter Data Pipeline using Airflow

When you hear the phrase "data pipeline" in the big data community, you only need to think of one thing: a Data Engineer, because it is their job to design, build, and manage pipelines.

So, if you want to become a pro, you need to master the skills and tools related to data pipelines. This project will teach you the basics of Airflow and how to build a data pipeline.

You will be learning —

Python for DE, Basic Airflow, Working with Twitter Data Package — Tweepy, and Writing ETL jobs — storing data on Amazon S3.
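To give a feel for the "transform" part of such an ETL job, here is a minimal sketch in plain Python. It is not the project's code: the tweet fields (`id`, `user`, `text`, `likes`) and the function name `tweets_to_csv` are my own assumptions for illustration, and the real project pulls tweets with Tweepy and uploads the result to Amazon S3, which this sketch leaves out.

```python
import csv
import io


def tweets_to_csv(tweets):
    """Flatten tweet-like dicts into CSV text (the 'transform' step).

    The field names are hypothetical; adapt them to whatever your
    extraction step actually returns.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["id", "user", "text", "likes"],
        lineterminator="\n",
    )
    writer.writeheader()
    for tweet in tweets:
        writer.writerow({
            "id": tweet["id"],
            "user": tweet["user"],
            # Collapse newlines and repeated spaces so each tweet stays on one row.
            "text": " ".join(tweet["text"].split()),
            # Default to 0 when the like count is missing.
            "likes": tweet.get("likes", 0),
        })
    return buffer.getvalue()


# Made-up sample records standing in for extracted tweets.
sample = [
    {"id": 1, "user": "alice", "text": "Hello\nAirflow", "likes": 3},
    {"id": 2, "user": "bob", "text": "ETL   time"},
]
csv_text = tweets_to_csv(sample)
print(csv_text)
```

In an Airflow pipeline, a function like this would typically sit between the task that fetches the tweets and the task that writes the CSV to S3.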

Here you go — Link🚀🚀

2- YouTube End-To-End Data Engineering Project

This is a 3-hour project in which you will carry out a complete Data Engineering project from start to finish.

The speaker will guide you through each step and explain every detail. I encourage you to do this project, especially if you are a beginner. It teaches you how to understand the business problem and think like a DE.

In this project, you will be learning —

Python and PySpark, SQL, and Amazon Web Services (AWS): Athena, Glue, Redshift, S3, IAM, Lambda, and QuickSight.

Here you go — 🚀🚀

3- Surfline Dashboard

In this project, you will collect data from the Surfline API through a pipeline and export it as a CSV file to Amazon S3.

After that, you will download the most recent file from S3 and ingest it into a Postgres data warehouse. In the end, you get a beautiful dashboard showing the data.
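The "most recent file" step can be sketched like this. It is only an illustration, not the project's code: I am assuming the bucket listing is a list of dicts with `Key` and `LastModified` fields, which is how entries appear under `Contents` in a boto3 `list_objects_v2` response. The keys below and the function name `latest_key` are made up.

```python
from datetime import datetime, timezone


def latest_key(objects):
    """Return the key of the most recently modified object, or None if empty."""
    if not objects:
        return None
    newest = max(objects, key=lambda obj: obj["LastModified"])
    return newest["Key"]


# Hypothetical listing; in practice this would come from the S3 API.
listing = [
    {"Key": "surf/2023-01-01.csv", "LastModified": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"Key": "surf/2023-01-03.csv", "LastModified": datetime(2023, 1, 3, tzinfo=timezone.utc)},
    {"Key": "surf/2023-01-02.csv", "LastModified": datetime(2023, 1, 2, tzinfo=timezone.utc)},
]
print(latest_key(listing))
```

Once you have the key, the download and the load into Postgres are separate steps that the project walks you through.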

You will be learning —

AWS S3, Airflow, Pandas, Postgres, and Plotly.

Here you go — 🚀🚀

Author- Dashboard

4- The FinnHub Streaming Data Pipeline

This project aims to provide real-time financial data to your users through a robust architecture.

You will build a streaming data pipeline based on the FinnHub.io API, which offers a WebSocket for real-time trading data. You will also design and implement a data architecture that can handle large volumes of data in real time.
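Before trades can flow into Kafka, each WebSocket message has to be turned into clean records. Here is a rough sketch of that parsing step. I am assuming messages shaped like `{"type": "trade", "data": [...]}` with the short field names `s` (symbol), `p` (price), `v` (volume), and `t` (timestamp in milliseconds); check the FinnHub documentation before relying on this. The function name `parse_trades` and the sample message are made up.

```python
import json
from datetime import datetime, timezone


def parse_trades(raw_message):
    """Turn one raw WebSocket message into a list of flat trade records.

    Assumes the message layout described above; anything that is not a
    trade message (for example a keep-alive) yields an empty list.
    """
    message = json.loads(raw_message)
    if message.get("type") != "trade":
        return []
    return [
        {
            "symbol": trade["s"],
            "price": trade["p"],
            "volume": trade["v"],
            # Convert the millisecond timestamp to a timezone-aware datetime.
            "traded_at": datetime.fromtimestamp(trade["t"] / 1000, tz=timezone.utc),
        }
        for trade in message.get("data", [])
    ]


# Made-up sample message for illustration.
raw = '{"type": "trade", "data": [{"s": "AAPL", "p": 150.25, "v": 10, "t": 1672531200000}]}'
trades = parse_trades(raw)
print(trades)
```

In the real project, records like these would be published to a Kafka topic and processed downstream with Spark.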

You will be learning —

Apache Kafka, Spark, Cassandra, Kubernetes, Grafana, and more.

Here you go — 🚀🚀

Author — Dashboard

5- Audiophile End-To-End ELT Pipeline

In this project, you will design, build, and manage a data pipeline that extracts data from Crinacle’s Headphone and InEarMonitor databases and prepares it for a Metabase dashboard.

You will also implement all the DAG tasks, which include scraping the data, loading it, and transforming it in a warehouse.
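If the idea of a DAG is new to you, this tiny sketch shows how task dependencies decide the run order. It uses Python's standard-library `graphlib` (Python 3.9+) rather than Airflow, and the three task names are simply taken from the description above, so treat it as an illustration of the concept and not as the project's actual DAG.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "scrape": set(),
    "load": {"scrape"},
    "transform": {"load"},
}

# static_order() yields the tasks in an order that respects every dependency.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

In Airflow, you would express the same chain between task objects with the `>>` operator, and the scheduler would work out the order for you.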

You will be learning —

AWS S3, Redshift, RDS, dbt (data transformation tool), Airflow

Here you go — 🚀🚀

Dashboard link

Big shoutout to all Authors 🚀

That’s a wrap. Enjoy ❤️

I am Uzman Ali, and I talk about data science for Tech/non-Tech folks.

If you LIKE my article, kindly SHARE it with your peers and CLAP (up to 50!). Follow me on Medium and connect with me on LinkedIn and X (aka Twitter) to stay updated on my new articles 🤩.

Join Medium using my Referral Link to support me. Thanks🙏

