Day 8 of 30 days of Data Engineering Series with Projects

Welcome back peeps to Day 8 of Data Engineering Series with Projects!

Day 1 : What’s Data Engineering, Why Data Engineering, Data Engineers — ML Engineers — Data Scientists, Purpose and Scope

Day 2 : Complete Python for Data Engineering — Part 1

Day 3 : Complete Advanced Python for Data Engineering — Part 2

Day 4: Techniques to write efficient and Optimized Code

Day 5 : SQL

Day 6 : Advanced SQL

Day 7 : BigQuery and SQL vs NOSQL databases

Day 8 : Advanced Functions

Projects Videos —

Subscribe today!

Ignito

Excited to share that we have launched our Youtube channel — Ignito to cover all the projects and coding exercise for …

www.youtube.com

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Ignito:

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication with hundreds of…

naina0405.substack.com

System Design Case Studies — In Depth

Design Instagram

Design Netflix

Design Reddit

Design Amazon

Design Messenger App

Design Twitter

Design URL Shortener

Design Dropbox

Design Youtube

Design API Rate Limiter

Design Web Crawler

Design Amazon Prime Video

Design Facebook’s Newsfeed

Design Yelp

Design Uber

Design Tinder

Design Tiktok

Design Whatsapp

Mega Compilation : Solved System Design Case studies

Pre-requisite to Day 8 is to complete Day 1–7( link below):

Day 1 of 30 days of Data Engineering can be found below —

Day 1 of 30 days of Data Engineering

With examples and projects…

medium.com

Day 2 of 30 days of Data Engineering can be found below —

Day 2 of 30 days of Data Engineering

With examples and projects…

medium.com

Day 3 of 30 days of Data Engineering can be found below —

Day 3 of 30 days of Data Engineering Series

With examples and projects …

medium.com

This is Day 8 of 30 days of Data Engineering Series where we will be covering —

Advanced Functions

Let’s get started!

Some of the most important Advanced Functions —

Aggregate functions: These functions are used to summarize data, such as counting the number of rows, calculating the sum or average of a column, or finding the minimum or maximum value. Examples include COUNT(), SUM(), AVG(), MIN(), and MAX().
Window functions: These functions are used to perform calculations across rows within a specific window or frame of a query result set. Examples include ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE().
String functions: These functions are used to manipulate and extract information from strings. Examples include SUBSTRING(), LENGTH(), CONCAT(), UPPER(), and LOWER().
Date and time functions: These functions are used to work with date and time data. Examples include CURRENT_DATE(), CURRENT_TIME(), CURRENT_TIMESTAMP(), and EXTRACT().
Conversion functions: These functions are used to convert data from one type to another. Examples include CAST() and CONVERT().
Conditional functions: These functions are used to perform conditional calculations. Examples include CASE(), COALESCE() and NULLIF().
Analytic functions: These functions are used to perform complex calculations over a group of rows. Examples include RANK(), DENSE_RANK(), ROW_NUMBER(), and LEAD() and LAG().

Code Implementation —

-- Aggregate functions
SELECT COUNT(column_name) AS count_rows
FROM table_name;

SELECT SUM(column_name) AS sum_value
FROM table_name;

SELECT AVG(column_name) AS average_value
FROM table_name;

SELECT MIN(column_name) AS min_value
FROM table_name;

SELECT MAX(column_name) AS max_value
FROM table_name;

-- Window functions
SELECT column_name, ROW_NUMBER() OVER (ORDER BY column_name) AS row_num
FROM table_name;

SELECT column_name, RANK() OVER (PARTITION BY partition_column ORDER BY order_column) AS rank_num
FROM table_name;

SELECT column_name, DENSE_RANK() OVER (PARTITION BY partition_column ORDER BY order_column) AS dense_rank_num
FROM table_name;

SELECT column_name, NTILE(4) OVER (ORDER BY column_name) AS ntile_num
FROM table_name;

-- String functions
SELECT SUBSTRING(column_name, start_index, length) AS substring_value
FROM table_name;

SELECT LENGTH(column_name) AS length_value
FROM table_name;

SELECT CONCAT(string1, string2) AS concatenated_string
FROM table_name;

SELECT UPPER(column_name) AS uppercase_value
FROM table_name;

SELECT LOWER(column_name) AS lowercase_value
FROM table_name;

-- Date and time functions
SELECT CURRENT_DATE() AS current_date;

SELECT CURRENT_TIME() AS current_time;

SELECT CURRENT_TIMESTAMP() AS current_timestamp;

SELECT EXTRACT(YEAR FROM date_column) AS year_value
FROM table_name;

-- Conversion functions
SELECT CAST(column_name AS data_type) AS converted_value
FROM table_name;

SELECT CONVERT(column_name, data_type) AS converted_value
FROM table_name;

-- Conditional functions
SELECT column_name, CASE WHEN condition THEN value1 ELSE value2 END AS conditional_value
FROM table_name;

SELECT column_name, COALESCE(column_name1, column_name2) AS coalesced_value
FROM table_name;

SELECT column_name, NULLIF(value1, value2) AS null_if_value
FROM table_name;

-- Analytic functions
SELECT column_name, RANK() OVER (PARTITION BY partition_column ORDER BY order_column) AS rank_num
FROM table_name;

SELECT column_name, DENSE_RANK() OVER (PARTITION BY partition_column ORDER BY order_column) AS dense_rank_num
FROM table_name;

SELECT column_name, ROW_NUMBER() OVER (PARTITION BY partition_column ORDER BY order_column) AS row_num
FROM table_name;

SELECT column_name, LEAD(column_name, offset) OVER (ORDER BY order_column) AS lead_value
FROM table_name;

SELECT column_name, LAG(column_name, offset) OVER (ORDER BY order_column) AS lag_value
FROM table_name;

Views in SQL

Views are important constructs in SQL. A view is nothing but a temporary virtual table created from one or more tables, based on the result returned by the SQL statement.

Views are helpful in creating complex queries and implementing readability in the SQL statements. Views also help in maintaining data integrity as it shows an accurate and consistent views of data in the database. Most importantly, this helps in providing better security to the data.

Syntax —

CREATE VIEW view_name AS SELECT t1, t2, t3…tn ( column_name) FROM table WHERE condition

To update a view —

CREATE OR REPLACE VIEW view_name AS SELECT t1, t2, t3…tn ( column_name) FROM table WHERE condition

Example —

CREATE VIEW [Salary_sum] AS SELECT employee_name, employee_salary FROM Employee WHERE employee_salary > (SELECT SUM(employee_salary) FROM Employee)

To query from the view created —

SELECT * FROM Salary_sum

It will show all the results from the view Salary_sum.

To Delete a View —

DROP VIEW Salary_sum

Create a view from multiple Tables —

CREATE VIEW view_name AS

SELECT Employee.employee_name, Company. employee_address, Company.employee_salary

FROM Employee INNER JOIN Company

WHERE Employee.employee_name = Company.employee_name

Code Implementation —

Indexes in SQL

Indexes are very useful as they help speed up the query execution and helps faster retrieval of the data.

Syntax —

CREATE INDEX index_name

ON table (column_1, column_2, …column_n)

There are two types of indexes —

Implicit indexes — indexes that are created by databases internally to store, retrieve faster and efficiently.
Composite indexes — indexes that are created by using multiple columns to uniquely identify the data points.

Syntax —

CREATE UNIQUE INDEX index_name

ON tableName (column1, column2, …)

Example —

CREATE INDEX emp_indx

ON employee(emp_name, salary, city)

To drop the indexes —

ALTER TABLE table

DROP INDEX index_name

Auto Increment in SQL

Autoincrement helps to identify the records very easily once set on a particular row(s). These are assigned to the primary key to make sure the data has a unique identification.

Syntax —

CREATE TABLE tableName (

column_1 NOT NULL AUTO_INCREMENT,

t1, t2, t3 ( column_names)……

PRIMARY KEY(col1))

Example —

CREATE TABLE Employee(

employee_id int NOT NULL auto_increment,

employee_name varchar(60),

city varchar(10),

PRIMARY KEY(employee_id))

Triggers in SQL

Triggers are an important concept in SQL. These are nothing but event driven SQL queries which are stored in the memory and fired when required.

The benefits of using triggers are reduce computational times, reuse of the code, helps check integrity of the database and detect any error or issues with the code.

There are two types of Triggers —

Statement Level Triggers

Triggers of this kind run only once irrespective of how many rows are involved.

Row level Triggers

In this the triggers run multiple times and changes are done in the rows on which SQL queries are triggered.

Syntax —

CREATE TRIGGER trigger_name

BEFORE/AFTER

SQL Statement (INSERT/UPDATE/DELETE/ALTER)

ON table_name

FOR EACH ROW SET operation

Triggers are set before and after the SQL statements.

Example —

CREATE TRIGGER experience_years AFTER UPDATE ON Employee

FOR EACH ROW SET @exp_years = @exp_years + 3

UPDATE Employee

SET exp_years = 20

WHERE experience = 2

Pivot in SQL

Pivot is an important function in SQL which allows us to aggregate the results and rotate rows into columns which is also called as cross tabulation.

Syntax —

SELECT t1, t2,

pivot_value1, pivot_value2, pivot_value3 … pivot_value_n

FROM

src_table

PIVOT

(aggregate_function(column_names)

FOR pivot_column

IN (pivot_value1, pivot_value2, pivot_value3 … pivot_value_n)

Example —

SELECT * FROM (

SELECT salary, year(date) year, month(date) month,

FROM employee_salary

WHERE date BETWEEN DATE ‘2021–01–01’ AND DATE ‘2019–08–31’)

PIVOT (

CAST(avg(salary) AS DECIMAL(2,1))

FOR month in ( months_name))

ORDER BY year ASC

The aggregate function are used to perform calculation on multiple rows/values and return a single result as the answer. It ignores NULL values when performing aggregation.

Some of the most important aggregate functions are —

COUNT() — To count the number of rows
SUM() — To return sum of the column
AVG () — To return average of the column
MIN() — To return min value in the column
MAX() — To return max value in the column

Format —

Count(column_name)

Sum(column_name)

Avg(column_name)

Min(column_name)

Max(column_name)

Example —

SELECT COUNT(employee_id)

FROM employee_table

WHERE salary > 40000

SELECT MAX(salary)

FROM employee_table

SELECT MIN(salary)

FROM employee_table

SELECT AVG(salary)

FROM employee_table

SELECT SUM(salary)

FROM employee_table

Cursors in SQL

Cursors allow to store the db tables and points as a context pointer to the memory area which stores the SQL statements which are required to be executed.

There are two types of cursors —

Implicit cursors — used internally by the SQL when running any Data Manipulation Query statement.
Explicit cursors — declared and used by the user externally.

Syntax —

DECLARE cursor_name CURSOR FOR sql_statement

Example —

CREATE PROCEDURE employee_db()

BEGIN

Declare employee_Id varchar(10)

Declare cursor_name CURSOR FOR SELECT emp_id FROM Employee

SELECT * FROM Employee

OPEN cursor_name

FETCH cursor_name INTO employee_Id;

CLOSE cursor_name

END

That’s it for now. Day 9 Coming soon!

Let me know if you have questions in the comment section below. Subscribe/ Follow, Like/Clap as it would encourage me to write more in my free time

Stay Tuned!!

All the Complete System Design Series Parts —

1. System design basics

2. Horizontal and vertical scaling

3. Load balancing and Message queues

4. High level design and low level design, Consistent Hashing, Monolithic and Microservices architecture

5. Caching, Indexing, Proxies

6. Networking, How Browsers work, Content Network Delivery ( CDN)

7. Database Sharding, CAP Theorem, Database schema Design

8. Concurrency, API, Components + OOP + Abstraction

9. Estimation and Planning, Performance

10. Map Reduce, Patterns and Microservices

11. SQL vs NoSQL and Cloud

12. Most Popular System Design Questions

Github —

Complete-System-Design/README.md at main · Coder-World04/Complete-System-Design

This repository contains everything you need to become proficient in System Design Topics you should know in System…

github.com

Keep learning and coding ;)

Day 5 coming soon!

For Python Projects —

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

medium.com

Analyzing Video using Python, OpenCV and NumPy

With Code Implementation…

medium.datadriveninvestor.com

For complete 60 days of Data Science and ML : Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

medium.com

Follow for more updates. Stay tuned and keep coding! Disclosure: Some of the links are affiliates.

For other projects, tune to —

Build Machine Learning Pipelines( With Code)

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

medium.datadriveninvestor.com

Recurrent Neural Network with Keras

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

medium.datadriveninvestor.com

Clustering Geolocation Data in Python using DBSCAN and K-Means

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

medium.datadriveninvestor.com

Facial Expression Recognition using Keras

Facial Expression Recognition using Keras

Project Implementation…

medium.datadriveninvestor.com

Hyperparameter Tuning with Keras Tuner

Hyperparameter Tuning with Keras Tuner

Project Implementation….

medium.datadriveninvestor.com

Custom Layers in Keras

Custom Layers in Keras

Code implementation …

medium.datadriveninvestor.com

Day 8 of 30 days of Data Engineering Series with Projects

Ignito

Excited to share that we have launched our Youtube channel — Ignito to cover all the projects and coding exercise for …

Tech Newsletter —

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication with hundreds of…

System Design Case Studies — In Depth

Design Instagram

Design Netflix

Design Reddit

Design Amazon

Design Messenger App

Design Twitter

Design URL Shortener

Design Dropbox

Design Youtube

Design API Rate Limiter

Design Web Crawler

Design Amazon Prime Video

Design Facebook’s Newsfeed

Design Yelp

Design Uber

Design Tinder

Design Tiktok

Design Whatsapp

Most Popular System Design Questions

Mega Compilation : Solved System Design Case studies

Day 1 of 30 days of Data Engineering

With examples and projects…

Day 2 of 30 days of Data Engineering

With examples and projects…

Day 3 of 30 days of Data Engineering Series

With examples and projects …

Advanced Functions

Views in SQL

To Delete a View —

Indexes in SQL

Auto Increment in SQL

Triggers in SQL

Pivot in SQL

Cursors in SQL

That’s it for now. Day 9 Coming soon!

Read more —

All the Complete System Design Series Parts —

Github —

Complete-System-Design/README.md at main · Coder-World04/Complete-System-Design

This repository contains everything you need to become proficient in System Design Topics you should know in System…

For Python Projects —

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

Analyzing Video using Python, OpenCV and NumPy

With Code Implementation…

Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

For other projects, tune to —

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

Facial Expression Recognition using Keras

Project Implementation…

Hyperparameter Tuning with Keras Tuner

Project Implementation….

Custom Layers in Keras

Code implementation …