Day 9 of 30 days of Data Engineering Series with Projects

Query Optimizations…

Welcome back peeps to Day 9 of Data Engineering Series with Projects!

Day 1 : What’s Data Engineering, Why Data Engineering, Data Engineers — ML Engineers — Data Scientists, Purpose and Scope

Day 2 : Complete Python for Data Engineering — Part 1

Day 3 : Complete Advanced Python for Data Engineering — Part 2

Day 4: Techniques to write efficient and Optimized Code

Day 5 : SQL

Day 6 : Advanced SQL

Day 7 : BigQuery and SQL vs NOSQL databases

Day 8 : Advanced Functions

Day 9 : Query Optimizations

Projects Videos —

Subscribe today!

Ignito

Excited to share that we have launched our Youtube channel — Ignito to cover all the projects and coding exercise for …

www.youtube.com

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Ignito:

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication with hundreds of…

naina0405.substack.com

System Design Case Studies — In Depth

Design Instagram

Design Netflix

Design Reddit

Design Amazon

Design Messenger App

Design Twitter

Design URL Shortener

Design Dropbox

Design Youtube

Design API Rate Limiter

Design Web Crawler

Design Amazon Prime Video

Design Facebook’s Newsfeed

Design Yelp

Design Uber

Design Tinder

Design Tiktok

Design Whatsapp

Mega Compilation : Solved System Design Case studies

Pre-requisite to Day 9 is to complete Day 1–8( link below):

Day 1 of 30 days of Data Engineering can be found below —

Day 1 of 30 days of Data Engineering

With examples and projects…

medium.com

Day 2 of 30 days of Data Engineering can be found below —

Day 2 of 30 days of Data Engineering

With examples and projects…

medium.com

Day 3 of 30 days of Data Engineering can be found below —

Day 3 of 30 days of Data Engineering Series

With examples and projects …

medium.com

This is Day 9 of 30 days of Data Engineering Series where we will be covering —

Query Optimizations in SQL

Let’s get started!

Some of the most Query Optimizations techniques are —

Use indexes: Indexes can greatly improve the performance of your queries by allowing the database to quickly locate the data it needs.
Limit the number of rows returned: Use the “LIMIT” clause to limit the number of rows returned by a query. This can improve performance and reduce memory usage.
Use the right join type: Use the appropriate join type (e.g. inner join, left join, right join) based on the specific needs of your query.
Use subqueries wisely: Subqueries can be useful, but they can also slow down a query if used excessively. Use subqueries sparingly and make sure they are optimized.
Avoid using “SELECT *”: Instead of using “SELECT *”, specify the specific columns you need in your query. This can improve performance and reduce memory usage.
Avoid using functions on indexed columns: Avoid using functions on indexed columns in the WHERE clause.
Use Temporary table: Use temporary tables to store intermediate results and then join them with other tables.
Use Explain plan: Use the “EXPLAIN PLAN” statement to analyze the execution plan of a query and identify any potential performance bottlenecks.
Use Partitioning: Use partitioning to divide large tables into smaller, more manageable pieces.
Use caching: Use caching to store the results of frequently-run queries in memory, so they can be retrieved quickly without the need to re-run the query.

-- Use indexes
CREATE INDEX index_name ON table_name (column1, column2, ...);

-- Limit the number of rows returned
SELECT column1, column2, ...
FROM table_name
LIMIT 100;

-- Use the right join type
SELECT column1, column2, ...
FROM table1
INNER JOIN table2 ON table1.column = table2.column;

-- Use subqueries wisely
SELECT column1, column2, ...
FROM table1
WHERE column1 IN (SELECT column2 FROM table2 WHERE condition);

-- Avoid using "SELECT *"
SELECT column1, column2, ...
FROM table_name;

-- Avoid using functions on indexed columns
SELECT column1, column2, ...
FROM table_name
WHERE indexed_column = some_value;  -- Avoid using functions on indexed_column

-- Use Temporary table
CREATE TEMPORARY TABLE temp_table AS
SELECT column1, column2, ...
FROM table1
JOIN table2 ON table1.column = table2.column;

-- Use Explain plan
EXPLAIN PLAN FOR
SELECT column1, column2, ...
FROM table_name
WHERE condition;

-- Use Partitioning
CREATE TABLE table_name (
  column1 datatype,
  column2 datatype,
  ...
)
PARTITION BY RANGE (column1) (
  PARTITION partition_name1 VALUES LESS THAN (value1),
  PARTITION partition_name2 VALUES LESS THAN (value2),
  ...
);

-- Use caching (specific to the database system)
-- Example in MySQL: Use query cache
SET GLOBAL query_cache_size = 1000000;

Let’s deep dive in!

A SQL statement can be written in various ways. An optimized SQL query depends on how the data is stored.

An optimized SQL query helps develop a better execution plan, used to analyze the runtime duration of the queries, Indexes which help fasten the query execution and organize the tables in most optimized way to make read and write operations faster and efficient.

Query optimization is the process of implementing the most efficient and optimal techniques that can be used to improve query performance and the runtime ( as well as other metrics).

This also includes optimal utilization of the resources, use of indexes to make the execution faster and better analysis of the performance metrics.

In order to optimize queries, there are many techniques that one can use —

1. Don’t use logical operators like multiple OR

SQL server evaluates OR after analyzing each component/condition that OR is used with and thus results into low performance.

Instead spilt the query into various components/parts and use UNION operator.

Example —

SELECT * FROM employee

WHERE employee_name = ‘Siya’

UNION

SELECT * FROM employee

WHERE emp_city = ‘Newyork’

2. Use Indexes

Indexes are very useful as they help speed up the query execution and helps faster retrieval of the data.

Syntax —

CREATE INDEX index_name

ON table (column_1, column_2, …column_n)

There are two types of indexes —

Implicit indexes — indexes that are created by databases internally to store, retrieve faster and efficiently.
Composite indexes — indexes that are created by using multiple columns to uniquely identify the data points.

Syntax —

CREATE UNIQUE INDEX index_name

ON tableName (column1, column2, …)

Example —

CREATE INDEX emp_indx

ON employee(emp_name, salary, city)

To drop the indexes —

ALTER TABLE table

DROP INDEX index_name

3. Dont use to many JOINS

JOINS make two or more tables join together which leads to additional overload which in turn affects the efficiency of the SQL query.

Instead of using many joins in the same SQL statement, divide the SQL query into separate subqueries.

4. Avoid using DISTINCT

DISTINCT helps to remove the duplicates and select unique values of the columns. In order to remove duplicates, the processing time needed to process large amount of data is slow which makes the SQL query very inefficient.

5. Avoid Using * with SELECT

Instead of using * which retrieves all the data from the rows, use specific columns especially when dealing with large amount of data. Along with it use indexes which will help fasten the data retrieval process.

6. Use Constraints

In order to avoid ambiguity in the data, we use Constraints which helps us better manage our data. These are nothing but rules which help us maintain integrity of the data and are applied at two levels —

Table level — applied to the whole table
Column level — applied to the data which is stored in the columns

Format —

CREATE TABLE table_name

(

column1 datatype constraint1,

column2 datatype constraint2,

….

)

Example —

CREATE TABLE employee

(

emp_Id int(10) NOT NULL UNIQUE,

employee_name varchar(10),

email varchar(20)

)

In the above example we are using, NOT NULL and UNIQUE constraint.

7. Use Views

Views are helpful in creating complex queries and implementing readability in the SQL statements. Views also help in maintaining data integrity as it shows an accurate and consistent views of data in the database. Most importantly, this helps in providing better security to the data.

Syntax —

CREATE VIEW view_name AS SELECT t1, t2, t3…tn ( column_name) FROM table WHERE condition

To update a view —

CREATE OR REPLACE VIEW view_name AS SELECT t1, t2, t3…tn ( column_name) FROM table WHERE condition

Example —

CREATE VIEW [Salary_sum] AS SELECT employee_name, employee_salary FROM Employee WHERE employee_salary > (SELECT SUM(employee_salary) FROM Employee)

8. Use TOP and Limit

TOP sets the limit on the no of results returned upon running the query. To make sure you get what you want to retrieve, use Top and Limit to see the snapshot/sample of the results.

SELECT TOP 10

employee_name, employee_salary

FROM Employee

-- Enable query optimization techniques
SET OPTIMIZE_FOR_KNOWN_CONSTRAINTS = ON;
SET QUERY_OPTIMIZATION_COMPATIBILITY_LEVEL = 150;

-- Create necessary indexes
CREATE INDEX idx_column1 ON table_name (column1);
CREATE INDEX idx_column2 ON table_name (column2);

-- Limit the result set
SELECT column1, column2, ...
FROM table_name
WHERE condition
LIMIT 100;

-- Minimize data retrieval
SELECT column1, column2, ...
FROM table_name;

-- Avoid unnecessary subqueries
SELECT column1, column2, ...
FROM table_name
WHERE column1 IN (SELECT column1 FROM another_table);

-- Use appropriate join types
SELECT column1, column2, ...
FROM table1
INNER JOIN table2 ON table1.column1 = table2.column1;

-- Optimize query logic
SELECT column1, column2, ...
FROM table_name
WHERE column1 = value AND column2 = value;

-- Analyze and update statistics
EXECUTE sp_updatestats;

-- Use query hints or directives
SELECT column1, column2, ...
FROM table_name
WITH (INDEX(idx_column1));

-- Use temporary tables or table variables
DECLARE @tempTable TABLE (column1 datatype, column2 datatype, ...);
INSERT INTO @tempTable (column1, column2, ...)
SELECT column1, column2, ...
FROM table_name;

-- Monitor query performance
EXECUTE sp_whoisactive;

Code Implementation —

Performance tuning in SQL

It’s the process which is used to improve the performance of the SQL queries i.e reduce the amount of time needed to run the query.

Three things that severly affect the query run time —

Aggregations — which requires a lot of computational overhead
Joins — Increases query run time
Size of the table — If the table size is large ( say ~ 1 million rows) then performance of the SQL query worsens.

So in order to optimize the performance —

Reduce the table size — Filter the data and save it in a temp table and use it to run the queries.
Use Limit — It helps limit the result samples.
Don’t use multiple joins

4. SET STATISTICS TIME ON to track which query is performing poorly during both compilation and execution time.

Code Implementation —

That’s it for now. Day 10 Coming soon!

Let me know if you have questions in the comment section below. Subscribe/ Follow, Like/Clap as it would encourage me to write more in my free time

Stay Tuned!!

All the Complete System Design Series Parts —

1. System design basics

2. Horizontal and vertical scaling

3. Load balancing and Message queues

4. High level design and low level design, Consistent Hashing, Monolithic and Microservices architecture

5. Caching, Indexing, Proxies

6. Networking, How Browsers work, Content Network Delivery ( CDN)

7. Database Sharding, CAP Theorem, Database schema Design

8. Concurrency, API, Components + OOP + Abstraction

9. Estimation and Planning, Performance

10. Map Reduce, Patterns and Microservices

11. SQL vs NoSQL and Cloud

12. Most Popular System Design Questions

Github —

Complete-System-Design/README.md at main · Coder-World04/Complete-System-Design

This repository contains everything you need to become proficient in System Design Topics you should know in System…

github.com

Keep learning and coding ;)

Day 5 coming soon!

For Python Projects —

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

medium.com

Analyzing Video using Python, OpenCV and NumPy

With Code Implementation…

medium.datadriveninvestor.com

For complete 60 days of Data Science and ML : Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

medium.com

Follow for more updates. Stay tuned and keep coding! Disclosure: Some of the links are affiliates.

For other projects, tune to —

Build Machine Learning Pipelines( With Code)

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

medium.datadriveninvestor.com

Recurrent Neural Network with Keras

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

medium.datadriveninvestor.com

Clustering Geolocation Data in Python using DBSCAN and K-Means

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

medium.datadriveninvestor.com

Facial Expression Recognition using Keras

Facial Expression Recognition using Keras

Project Implementation…

medium.datadriveninvestor.com

Hyperparameter Tuning with Keras Tuner

Hyperparameter Tuning with Keras Tuner

Project Implementation….

medium.datadriveninvestor.com

Custom Layers in Keras

Custom Layers in Keras

Code implementation …

medium.datadriveninvestor.com

Day 9 of 30 days of Data Engineering Series with Projects

Query Optimizations…

Ignito

Excited to share that we have launched our Youtube channel — Ignito to cover all the projects and coding exercise for …

Tech Newsletter —

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication with hundreds of…

System Design Case Studies — In Depth

Design Instagram

Design Netflix

Design Reddit

Design Amazon

Design Messenger App

Design Twitter

Design URL Shortener

Design Dropbox

Design Youtube

Design API Rate Limiter

Design Web Crawler

Design Amazon Prime Video

Design Facebook’s Newsfeed

Design Yelp

Design Uber

Design Tinder

Design Tiktok

Design Whatsapp

Most Popular System Design Questions

Mega Compilation : Solved System Design Case studies

Day 1 of 30 days of Data Engineering

With examples and projects…

Day 2 of 30 days of Data Engineering

With examples and projects…

Day 3 of 30 days of Data Engineering Series

With examples and projects …

Query Optimizations in SQL

Let’s deep dive in!

1. Don’t use logical operators like multiple OR

2. Use Indexes

3. Dont use to many JOINS

4. Avoid using DISTINCT

5. Avoid Using * with SELECT

6. Use Constraints

7. Use Views

8. Use TOP and Limit

Performance tuning in SQL

That’s it for now. Day 10 Coming soon!

Read more —

All the Complete System Design Series Parts —

Github —

Complete-System-Design/README.md at main · Coder-World04/Complete-System-Design

This repository contains everything you need to become proficient in System Design Topics you should know in System…

For Python Projects —

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

Analyzing Video using Python, OpenCV and NumPy

With Code Implementation…

Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

For other projects, tune to —

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

Facial Expression Recognition using Keras

Project Implementation…

Hyperparameter Tuning with Keras Tuner

Project Implementation….

Custom Layers in Keras

Code implementation …