Day 29 of 30 days of Data Engineering Series with Projects

Welcome back peeps to Day 29 of the Data Engineering Series with Projects!

In this post, we will cover:

Data Engineering on cloud

AWS

AWS Services

Google Cloud Platform

GCP services

The prerequisite for Day 29 is to complete Days 1–28 (links below):

Day 1 : What’s Data Engineering, Why Data Engineering, Data Engineers — ML Engineers — Data Scientists, Purpose and Scope

Day 2 : Complete Python for Data Engineering — Part 1

Day 3 : Complete Advanced Python for Data Engineering — Part 2

Day 4: Techniques to write efficient and Optimized Code

Day 5 : SQL

Day 6 : Advanced SQL

Day 7 : BigQuery and SQL vs NoSQL databases

Day 8 : Advanced Functions

Day 9 : Query Optimizations

Day 10 : MySQL and PostgreSQL

Day 11: Shell scripting and Linux “touch” command

Day 12 : Map Reduce, Data Warehouse, Data Lakes

Day 13: Pandas, Data Cleaning and processing, Outlier Detection, Noisy Data, Missing Data, Pandas Functions, Aggregate Functions, Joins

Day 14 : Numpy

Day 15 : Advanced Pandas Techniques

Day 16 : Data Pre-processing, Handling missing values, Data Cleaning, Mean/mode/median Imputation, Hot Deck Imputation, Rescale Data, Binarize Data, Regression Imputation, Stochastic regression imputation, Feature Scaling

Day 17 : Data Augmentation, Read and Process Large Datasets

Day 18 : Data Visualization basics, Data Visualization Projects, Data Visualization using Plotly and Bokeh, Data Profiling, Summary Functions, Indexing, Grouping, Linear Regression, Multi Linear Regression, Polynomial Regression, Regression, Support Vector Regression, Decision Tree Regression, Random Forest Regression, Feature Engineering, GroupBy Features, Categorical and Numerical Features, Missing Value Analysis, Fill the missing Values, Unique Value Analysis, Univariate Analysis, Bivariate Analysis, Multivariate Analysis, Correlation Analysis, Spearman’s ρ, Pearson’s r, Kendall’s τ, Cramér’s V (φc), Phik (φk)

Day 19 : MySQL and PostgreSQL

Day 20 : ETL (Extract, Transform and Load) basics, Why is ETL important?, How ETL works, ETL Tools

Day 21 : Structured Data, Semi Structured Data, Unstructured Data, Data Warehouse, Data Mart, Data Lake

Day 22 : Big Data, Types of Big Data, Big Data tools, SQL and NoSQL Databases, Hadoop, Hadoop HDFS, Hadoop YARN

Day 23: Batch Processing, Stream Processing, Apache Spark, Apache Spark Commands, Apache Kafka, How Apache Kafka works

Day 24 : Hive, ZooKeeper, Pig, Cassandra, Sqoop

Day 25: Docker, Docker vs Virtual Machines, Most important Docker commands, Kubernetes, Snowflake

Day 26 : Data Pipelines, Transformation, Processing, Workflow, Monitoring, Airflow, DAG

Day 27 : Power BI, Which chart to use and When?, Power BI — Data Analysis Expressions, Joins, Data Profiling

Day 28 : REST API, Postman, Data API

Day 29 : Data Engineering on cloud, AWS, AWS Services, Google Cloud Platform, GCP services

Project Videos:

Subscribe today!

Ignito

Excited to share that we have launched our YouTube channel, Ignito, to cover all the projects and coding exercises for …

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Ignito:

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication with hundreds of…

System Design Case Studies — In Depth

Design Instagram

Design Netflix

Design Reddit

Design Amazon

Design Messenger App

Design Twitter

Design URL Shortener

Design Dropbox

Design YouTube

Design API Rate Limiter

Design Web Crawler

Design Amazon Prime Video

Design Facebook’s Newsfeed

Design Yelp

Design Uber

Design Tinder

Design TikTok

Design WhatsApp

Mega Compilation : Solved System Design Case studies

Let’s get started!

Data Engineering on cloud refers to the process of building, deploying, and maintaining systems for collecting, storing, processing, and analyzing data on cloud platforms such as AWS (Amazon Web Services) and GCP (Google Cloud Platform).

AWS is a cloud computing platform that offers a wide range of services for data engineering, including:

S3: Simple Storage Service, an object storage service for storing and retrieving large amounts of data.
Glue: A fully managed, serverless extract, transform, and load (ETL) service for discovering, moving, and transforming data.
Redshift: A columnar data warehouse service for storing and querying large amounts of data with SQL.
DynamoDB: A fully managed key-value and document NoSQL database that scales to high request volumes with low latency.
Kinesis: A real-time streaming data service for collecting, processing, and analyzing streaming data.
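To make the Kinesis idea concrete: AWS documents that Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below mimics that routing in plain Python. It assumes, for illustration only, that the hash space is split into equal-width ranges and that the digest is read as a big-endian integer; a real stream's ranges come from its shard list. The helper names are hypothetical and not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is 128 bits, so hash values fall in [0, 2**128)


def partition_key_to_hash(partition_key: str) -> int:
    """Hash a partition key with MD5 and read the digest as a big-endian 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumes equal-width hash ranges (illustrative only); real ranges come
    from the stream's shard list.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    range_width = HASH_SPACE // num_shards
    # min() keeps the last shard responsible for any leftover part of the hash space
    return min(partition_key_to_hash(partition_key) // range_width, num_shards - 1)


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-3", "user-1"]:
        print(f"{key} -> shard {shard_for_key(key, num_shards=4)}")
```

Records with the same partition key always land on the same shard, which is why a high-cardinality partition key (for example, a user ID) helps spread load evenly across shards.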

Google Cloud Platform (GCP) also offers a wide range of services for data engineering, including:

BigQuery: A fully managed, serverless data warehouse for storing and querying large amounts of data with SQL.
Cloud Storage: An object storage service similar to Amazon S3.
Cloud Dataflow: A fully managed service for running Apache Beam pipelines, used for batch and streaming ETL.
Cloud Dataproc: A fully managed service for running Apache Hadoop and Apache Spark workloads.
Cloud SQL: A fully managed relational database service (MySQL, PostgreSQL, and SQL Server) for storing and querying relational data.
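Since Cloud Storage follows the same bucket-plus-object-key model as S3, pipelines often refer to data with URIs such as s3://bucket/key or gs://bucket/key. Here is a minimal sketch, using only Python's standard library, that splits such a URI into its parts. The bucket and path names are made up, and this simple parser assumes object keys contain no "?" or "#" characters, which urlparse would treat as query or fragment delimiters.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ObjectLocation(NamedTuple):
    scheme: str  # "s3" for Amazon S3, "gs" for Google Cloud Storage
    bucket: str  # top-level container for objects
    key: str     # full object name; "folders" are just prefixes within the key


def parse_object_uri(uri: str) -> ObjectLocation:
    """Split an s3:// or gs:// URI into scheme, bucket, and object key."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("URI must include a bucket name")
    # urlparse keeps the "/" that separates bucket and key on the path; drop just that one
    key = parsed.path[1:] if parsed.path.startswith("/") else parsed.path
    return ObjectLocation(parsed.scheme, parsed.netloc, key)


if __name__ == "__main__":
    print(parse_object_uri("s3://example-raw-data/events/2024/01/05/events.json"))
    print(parse_object_uri("gs://example-raw-data/events/2024/01/05/events.json"))
```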

Data Engineering on cloud

It’s the process of building, testing, and deploying data processing systems and ML models in the cloud. Companies and organizations have large amounts of data that need to be stored, processed, and analyzed accurately.

Cloud services for each step —

Data Ingestion — ETL and Stream Processing

Data Storage — RDBMS, Object Store, Key Value Store

Data Processing — ETL Services, Distributed processing, Data Pipeline, Workflow

Data Analysis and Visualization — Data Warehouse, Machine Learning, Streaming Analytics and Business Intelligence

Security — Authentication, Data Encryption, Access and Policies

Deployment — Containerization, Orchestration, CI/CD
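The data stages above (ingestion, storage, processing, analysis) can be sketched end to end in a few lines of Python. This is a toy, local illustration only: a plain dict stands in for an object store such as S3 or Cloud Storage, the raw events are made up, and the function names are hypothetical rather than any cloud SDK's API. Security and deployment are out of scope here.

```python
import json
from collections import defaultdict

# Made-up raw events standing in for an ingestion source (stream, API, files)
RAW_EVENTS = [
    '{"user": "a", "amount": 10.0}',
    '{"user": "b", "amount": 5.5}',
    'not valid json',
    '{"user": "a", "amount": 2.5}',
]


def ingest(lines):
    """Ingestion: parse raw JSON lines, skipping malformed records."""
    records = []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # A real pipeline might send these to a dead-letter location instead
            continue
    return records


def store(records, object_store, key):
    """Storage: write records as newline-delimited JSON under an object key.

    A plain dict stands in for an object store such as S3 or Cloud Storage.
    """
    object_store[key] = "\n".join(json.dumps(r) for r in records)


def process(object_store, key):
    """Processing: read the stored object back and total the amount per user."""
    totals = defaultdict(float)
    for line in object_store[key].splitlines():
        record = json.loads(line)
        totals[record["user"]] += record["amount"]
    return dict(totals)


def analyze(totals):
    """Analysis: return the (user, total) pair with the highest total."""
    return max(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    bucket = {}
    store(ingest(RAW_EVENTS), bucket, "raw/events.jsonl")
    totals = process(bucket, "raw/events.jsonl")
    print(totals)
    print(analyze(totals))
```

Swapping the dict for a real object store, and these functions for managed services (for example, Kinesis or Dataflow for ingestion and processing), is where the cloud services in this post come in.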

AWS

AWS (Amazon Web Services) is a cloud computing platform that provides cost-effective, scalable services to businesses all over the world.

The advantages of using AWS —

Automated Management

Speed and Agility

Security

Automated backups

Third-party API integration

Multi-tenant architecture

Highly scalable according to business needs

Affordable pay-as-you-go pricing

Predictable costs (for example, with Savings Plans or Reserved Instances)

For data engineering, big data management, and analytics, AWS offers:

Amazon EMR: A managed big data platform for running Hadoop, Spark, and related frameworks to process large amounts of data

Amazon Kinesis: To collect, process, and analyze streaming data in real time

AWS Glue: To run extract, transform, and load (ETL) jobs

Amazon Athena: To query data stored in S3 using standard SQL

Amazon QuickSight: To visualize data in business intelligence dashboards

Amazon OpenSearch Service (formerly Amazon Elasticsearch Service): To perform search, monitoring, log analysis, etc.
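Glue and Athena commonly work with data laid out in S3 using Hive-style partitions, where object keys contain key=value segments such as year=2024/month=01. When a query filters on partition columns, Athena can skip non-matching partitions, which reduces the amount of data scanned. The sketch below illustrates that layout and the pruning idea in plain Python. The table prefix, file names, and helper functions are hypothetical, and real pruning happens inside the query engine using partition metadata (for example, from the Glue Data Catalog), not by filtering key strings like this.

```python
from datetime import date, timedelta


def partition_prefix(table_prefix: str, day: date) -> str:
    """Build a Hive-style prefix, e.g. 'sales/year=2024/month=01/day=05/'."""
    return f"{table_prefix}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def parse_partition(key: str) -> dict:
    """Extract key=value partition columns from an object key."""
    columns = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            columns[name] = value
    return columns


def prune(keys, **filters):
    """Keep only keys whose partition values match every filter (partition pruning)."""
    return [
        key for key in keys
        if all(parse_partition(key).get(col) == val for col, val in filters.items())
    ]


if __name__ == "__main__":
    # Hypothetical objects for three consecutive days of a 'sales' table
    keys = [
        partition_prefix("sales", date(2024, 1, 1) + timedelta(days=i)) + "part-0000.parquet"
        for i in range(3)
    ]
    print(prune(keys, day="02"))
    # ['sales/year=2024/month=01/day=02/part-0000.parquet']
```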

For Machine Learning and AI, AWS offers:

Amazon Lex: To build chatbots and conversational interfaces

Amazon Forecast: To build end-to-end time-series forecasting models

Amazon Polly: To convert text to speech

Amazon Textract: To extract text and data from scanned documents

We will cover each service in detail in upcoming posts.

Google Cloud Platform

Google Cloud Platform (GCP) is a cloud computing platform that offers easy-to-use tools and services, with resources available at global, regional, and zonal scope.

It offers:

Cloud Storage: To store unstructured object data such as images, media, etc.

Cloud SQL: A managed relational database service that lets users set up and use databases in the cloud

Cloud Bigtable: A high-performance, wide-column NoSQL database for large, high-throughput workloads

Cloud Datastore: A NoSQL document database that stores schemaless entities (now offered as Firestore in Datastore mode)

Cloud Spanner: A globally distributed relational database

BigQuery: A serverless data warehouse used for data analysis

Cloud Dataflow: To run batch and stream processing pipelines

Cloud Datalab: To explore, analyze, and visualize data

Cloud Dataproc: To process big datasets with managed Apache Spark and Hadoop clusters
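Cloud Dataflow runs Apache Beam pipelines, and a common streaming pattern there is grouping events into fixed (tumbling) windows by event time, then aggregating per window. The sketch below shows only the windowing arithmetic in plain Python; it is not the Beam API, and the timestamps (in seconds), event types, and 60-second window size are made-up assumptions. Beam itself also handles late data, watermarks, and triggers, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def window_start(timestamp: int, size: int) -> int:
    """Start of the fixed window containing timestamp; windows cover [start, start + size)."""
    return timestamp - (timestamp % size)


def count_per_window(events: Iterable[Tuple[int, str]], size: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key), similar in spirit to a windowed count."""
    if size <= 0:
        raise ValueError("window size must be positive")
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        counts[(window_start(timestamp, size), key)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up (event time in seconds, event type) pairs
    events = [(0, "click"), (30, "click"), (59, "view"), (60, "click"), (125, "click")]
    print(count_per_window(events, size=60))
    # {(0, 'click'): 2, (0, 'view'): 1, (60, 'click'): 1, (120, 'click'): 1}
```

Fixed windows never overlap, so each event is counted exactly once; sliding or session windows would need different assignment logic.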

For Machine Learning and AI:

Cloud AutoML: To train high-quality custom models with minimal ML expertise

Cloud Speech-to-Text API and Cloud Vision API: For speech recognition, image recognition, and classification

Cloud Natural Language API: For NLP tasks such as sentiment analysis and text classification

A project video covering data engineering on the cloud, AWS, and Google Cloud Platform is coming soon (subscribe today):

Ignito

Excited to share that we have launched our YouTube channel, Ignito, to cover all the projects and coding exercises for …

That’s it for now. Day 30: Coming soon!

Let me know in the comment section below if you have any questions. Subscribe/Follow and Like/Clap, as it encourages me to write more in my free time.

Stay Tuned!!

All the Complete System Design Series Parts —

1. System design basics

2. Horizontal and vertical scaling

3. Load balancing and Message queues

4. High level design and low level design, Consistent Hashing, Monolithic and Microservices architecture

5. Caching, Indexing, Proxies

6. Networking, How Browsers work, Content Delivery Network (CDN)

7. Database Sharding, CAP Theorem, Database schema Design

8. Concurrency, API, Components + OOP + Abstraction

9. Estimation and Planning, Performance

10. Map Reduce, Patterns and Microservices

11. SQL vs NoSQL and Cloud

12. Most Popular System Design Questions

GitHub:

Complete-System-Design/README.md at main · Coder-World04/Complete-System-Design

This repository contains everything you need to become proficient in System Design Topics you should know in System…

For Python Projects —

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

Analyzing Video using Python, OpenCV and NumPy

With Code Implementation…

For complete 60 days of Data Science and ML : Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

Follow for more updates. Stay tuned and keep coding!

For other projects, tune in to:

Build Machine Learning Pipelines (With Code): Part 1

Complete implementation…

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

Facial Expression Recognition using Keras

Project Implementation…

Hyperparameter Tuning with Keras Tuner

Project Implementation….

Custom Layers in Keras

Code implementation …
