avatarSarah Floris

Summary

Common table expressions (CTEs) are a powerful SQL tool for creating temporary result sets that simplify complex queries and enhance data manipulation efficiency.

Abstract

Common table expressions (CTEs) serve as a temporary storage for intermediate results in SQL queries, enabling developers to write more readable and maintainable code. They are particularly useful for complex queries involving joins, unions, and subqueries. CTEs allow for the creation of a named temporary result set that can be referenced multiple times within a single query, thus reducing redundancy and improving performance. The article provides a step-by-step guide on how to write CTE queries, emphasizing their ability to filter large datasets before joining and their support for recursive operations. By using CTEs, SQL developers can streamline their code, making it more efficient and easier to understand, which can be a significant differentiator in their professional skill set.

Opinions

  • CTEs are praised for their ability to make complex SQL queries more readable and efficient.
  • The author suggests that CTEs can significantly improve the manageability of queries, especially when dealing with large datasets.
  • Recursive CTEs are highlighted as a unique feature that allows for hierarchical or tree-structured data processing within SQL.
  • The article implies that mastery of CTEs can elevate a SQL developer's capabilities and potentially their career prospects.
  • The use of CTEs for pre-filtering data is presented as a best practice to optimize query performance.
  • The author encourages sharing knowledge about CTEs, indicating a community-driven approach to learning and improving SQL practices.

A Simple Guide to Common Table Expressions

Creating cleaner and simpler SQL code

Photo by Caspar Camille Rubin on Unsplash

A common table expression (CTE) is a temporary result set that stores intermediate results. They are an incredibly powerful tool that SQL developers can use to solve complex problems and simplify complicated queries.

These queries make it possible for developers to work with data more efficiently and effectively. CTEs are also helpful when performing multiple operations on the same set of data.

In this article, you will learn what these common table expressions are, how to create them, and the advantages of using CTEs.

What are common table expressions?

Common table expressions (CTEs) store data in a single table using the WITH keyword. The data is stored as columns and rows, and the column names can be different from the table name.

A CTE is a simple way of creating a temporary table that can be joined within an SQL query. CTEs are often used for more complex queries, need to be efficient with repetition, or need the ability to use JOINs, UNIONs, aggregating functions, or subqueries.

How to write excellent CTE queries

The first step is to figure out how to write these beautiful queries.

I will be using the sample DVD rental database found here; you will find both the schema and the sample data.

  1. The first section is to start the first CTE with "With"
with customers_cte 
as 
(
select * from [Customers]
where City = "Lyon"
) 

2. Add more CTEs using an "AS" clause.

with customers_cte 
as 
(
select * from [Customers]
where City = 'Lyon'
),
orders_cte as 
(
SELECT * from [Orders] where YEAR(OrderDate) = 1996
) 

In this example, I only used two CTEs, but I could have added more if I wanted to.

3. The CTE must be followed by a single select, insert, update, or delete statement. Here's the final result.

with customers_cte 
as 
(
select * from [Customers]
where City = 'Lyon'
),
orders_cte as 
(
SELECT * from [Orders] where YEAR(OrderDate) = 1996
) 
select * from customers_cte
join orders_cte on customers_cte.CustomerID = orders_cte.CustomerID;

Now that you know how to write a CTE let's look at the advantages of these CTEs.

Advantages of using CTEs

  1. Simplicity and readability

We have all seen excellent SQL queries that are difficult to follow because they have unions, joins, and everything in between. Adding CTEs makes these SQL queries a lot more readable. Let's look at an example highlighting how simple a CTE can make a query.

SELECT * FROM [Orders] o join [Customers] c on c.CustomerID = o.CustomerID where c.City = "Lyon";

to

with customers_cte as
(
select * from [Customers]
where c.City = "Lyon"
)
select * from [Orders] o 
join customers_cte c
on c.CustomerID = o.CustomerID

Separating the joins allows me to quickly see the customers' table filtering. The longer these joins become, the easier it will be when you separate them.

2. Filtering large datasets before a join

Using common table expressions, you can filter the tables before joining. This change might not be as significant when dealing with smaller datasets. However, when the datasets get much larger, eliminating the number of joined rows can limit the number of resources needed.

Let's take a look at the example we have been using.

with customers_cte as
(
select * from [Customers]
where c.City = "Lyon"
)
select * from [Orders] o 
join customers_cte c
on c.CustomerID = o.CustomerID

In the customers_cte, filtering by "Lyon" limits the number of rows to just 2. If this dataset had 100,000 rows and filtered down to 2 rows, running this query has become a lot more manageable for whatever system we are running it on.

3. Recursive CTEs

SQL recursive is slightly different than most programming languages. In most programming languages, a recursive function is a routine that gets called directly or indirectly. Here's a python example.

# Function for nth Fibonacci number
def Fibonacci(n):
# Check if input is 0 then it will
 # print incorrect input
 if n < 0:
  print("Incorrect input")
# Check if n is 0
 # then it will return 0
 elif n == 0:
  return 0
# Check if n is 1,2
 # it will return 1
 elif n == 1 or n == 2:
  return 1
else:
  return Fibonacci(n-1) + Fibonacci(n-2)

As you can see, python continues until the n value becomes 0.

Similar to Python, SQL only has several layers. The first layer would be the "anchor member," and the second layer is the recursive member that references that anchor member.

Here is the SQL code.

WITH cte_name (column_list)
AS
(
-- Anchor member
initial_query
UNION ALL
-- Recursive member that references cte_name.
recursive_query
)
-- references cte name
SELECT *
FROM cte_name

And lastly, let's take a look at an example provided by the wonderful website GeeksforGeeks.com. First, we need to create the table and insert some values.

CREATE TABLE Employees
(
  EmployeeID int NOT NULL PRIMARY KEY,
  FirstName varchar(50) NOT NULL,
  LastName varchar(50) NOT NULL,
  ManagerID int NULL
)

INSERT INTO Employees VALUES (1, 'Ken', 'Thompson', NULL)
INSERT INTO Employees VALUES (2, 'Terri', 'Ryan', 1)
INSERT INTO Employees VALUES (3, 'Robert', 'Durello', 1)
INSERT INTO Employees VALUES (4, 'Rob', 'Bailey', 2)
INSERT INTO Employees VALUES (5, 'Kent', 'Erickson', 2)
INSERT INTO Employees VALUES (6, 'Bill', 'Goldberg', 3)
INSERT INTO Employees VALUES (7, 'Ryan', 'Miller', 3)
INSERT INTO Employees VALUES (8, 'Dane', 'Mark', 5)
INSERT INTO Employees VALUES (9, 'Charles', 'Matthew', 6)
INSERT INTO Employees VALUES (10, 'Michael', 'Jhonson', 6)

After the Employees table is created, we can set up the recursive CTE.

WITH
  cteReports (EmpID, FirstName, LastName, MgrID, EmpLevel)
  AS
  (
    SELECT EmployeeID, FirstName, LastName, coalesce(ManagerID,0), 1
    FROM Employees
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID, 
      r.EmpLevel + 1
    FROM Employees e
      INNER JOIN cteReports r
        ON e.ManagerID = r.EmpID
  )
SELECT
  FirstName + ' ' + LastName AS FullName, 
  EmpLevel,
  (SELECT FirstName + ' ' + LastName FROM Employees 
    WHERE EmployeeID = cteReports.MgrID) AS Manager
FROM cteReports 
ORDER BY EmpLevel, MgrID

The cteReports is made up of two sections. The first section, shown below, is the anchor member.

SELECT EmployeeID, FirstName, LastName, coalesce(ManagerID,0), 1
    FROM Employees
    WHERE ManagerID IS NULL

The value "1" at the end represents the employee's current level. The anchor member is then joined to the recursive member.

SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID, 
      r.EmpLevel + 1
    FROM Employees e
      INNER JOIN cteReports r
        ON e.ManagerID = r.EmpID

to get the final result

1|Ken|Thompson|0|1
2|Terri|Ryan|1|2
3|Robert|Durello|1|2
4|Rob|Bailey|2|3
5|Kent|Erickson|2|3
6|Bill|Goldberg|3|3
7|Ryan|Miller|3|3
8|Dane|Mark|5|4
9|Charles|Matthew|6|4
10|Michael|Jhonson|6|4

Final Thoughts

Implementing these CTEs can be a powerful tool and differentiate you as a SQL developer.

Thank you for reading. Let me know how you have used them, and as always, share the knowledge!

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

Data
Data Science
Technology
Recommended from ReadMedium