A Simple Guide to Common Table Expressions
Creating cleaner and simpler SQL code
A common table expression (CTE) is a temporary result set that stores intermediate results. They are an incredibly powerful tool that SQL developers can use to solve complex problems and simplify complicated queries.
These queries make it possible for developers to work with data more efficiently and effectively. CTEs are also helpful when performing multiple operations on the same set of data.
In this article, you will learn what these common table expressions are, how to create them, and the advantages of using CTEs.
What are common table expressions?
Common table expressions (CTEs) store data in a single table using the WITH keyword. The data is stored as columns and rows, and the column names can be different from the table name.
A CTE is a simple way of creating a temporary table that can be joined within an SQL query. CTEs are often used for more complex queries, need to be efficient with repetition, or need the ability to use JOINs, UNIONs, aggregating functions, or subqueries.
How to write excellent CTE queries
The first step is to figure out how to write these beautiful queries.
I will be using the sample DVD rental database found here; you will find both the schema and the sample data.
- The first section is to start the first CTE with "With"
with customers_cte
as
(
select * from [Customers]
where City = "Lyon"
) 2. Add more CTEs using an "AS" clause.
with customers_cte
as
(
select * from [Customers]
where City = 'Lyon'
),
orders_cte as
(
SELECT * from [Orders] where YEAR(OrderDate) = 1996
) In this example, I only used two CTEs, but I could have added more if I wanted to.
3. The CTE must be followed by a single select, insert, update, or delete statement. Here's the final result.
with customers_cte
as
(
select * from [Customers]
where City = 'Lyon'
),
orders_cte as
(
SELECT * from [Orders] where YEAR(OrderDate) = 1996
)
select * from customers_cte
join orders_cte on customers_cte.CustomerID = orders_cte.CustomerID;Now that you know how to write a CTE let's look at the advantages of these CTEs.
Advantages of using CTEs
- Simplicity and readability
We have all seen excellent SQL queries that are difficult to follow because they have unions, joins, and everything in between. Adding CTEs makes these SQL queries a lot more readable. Let's look at an example highlighting how simple a CTE can make a query.
SELECT * FROM [Orders] o join [Customers] c on c.CustomerID = o.CustomerID where c.City = "Lyon";to
with customers_cte as
(
select * from [Customers]
where c.City = "Lyon")
select * from [Orders] o
join customers_cte c
on c.CustomerID = o.CustomerIDSeparating the joins allows me to quickly see the customers' table filtering. The longer these joins become, the easier it will be when you separate them.
2. Filtering large datasets before a join
Using common table expressions, you can filter the tables before joining. This change might not be as significant when dealing with smaller datasets. However, when the datasets get much larger, eliminating the number of joined rows can limit the number of resources needed.
Let's take a look at the example we have been using.
with customers_cte as
(
select * from [Customers]
where c.City = "Lyon"
)
select * from [Orders] o
join customers_cte c
on c.CustomerID = o.CustomerIDIn the customers_cte, filtering by "Lyon" limits the number of rows to just 2. If this dataset had 100,000 rows and filtered down to 2 rows, running this query has become a lot more manageable for whatever system we are running it on.
3. Recursive CTEs
SQL recursive is slightly different than most programming languages. In most programming languages, a recursive function is a routine that gets called directly or indirectly. Here's a python example.
# Function for nth Fibonacci number
def Fibonacci(n):# Check if input is 0 then it will
# print incorrect input
if n < 0:
print("Incorrect input")# Check if n is 0
# then it will return 0
elif n == 0:
return 0# Check if n is 1,2
# it will return 1
elif n == 1 or n == 2:
return 1else:
return Fibonacci(n-1) + Fibonacci(n-2)As you can see, python continues until the n value becomes 0.
Similar to Python, SQL only has several layers. The first layer would be the "anchor member," and the second layer is the recursive member that references that anchor member.
Here is the SQL code.
WITH cte_name (column_list)AS(
-- Anchor memberinitial_queryUNION ALL-- Recursive member that references cte_name.recursive_query)
-- references cte nameSELECT *FROM cte_nameAnd lastly, let's take a look at an example provided by the wonderful website GeeksforGeeks.com. First, we need to create the table and insert some values.
CREATE TABLE Employees
(
EmployeeID int NOT NULL PRIMARY KEY,
FirstName varchar(50) NOT NULL,
LastName varchar(50) NOT NULL,
ManagerID int NULL
)
INSERT INTO Employees VALUES (1, 'Ken', 'Thompson', NULL)
INSERT INTO Employees VALUES (2, 'Terri', 'Ryan', 1)
INSERT INTO Employees VALUES (3, 'Robert', 'Durello', 1)
INSERT INTO Employees VALUES (4, 'Rob', 'Bailey', 2)
INSERT INTO Employees VALUES (5, 'Kent', 'Erickson', 2)
INSERT INTO Employees VALUES (6, 'Bill', 'Goldberg', 3)
INSERT INTO Employees VALUES (7, 'Ryan', 'Miller', 3)
INSERT INTO Employees VALUES (8, 'Dane', 'Mark', 5)
INSERT INTO Employees VALUES (9, 'Charles', 'Matthew', 6)
INSERT INTO Employees VALUES (10, 'Michael', 'Jhonson', 6)After the Employees table is created, we can set up the recursive CTE.
WITH
cteReports (EmpID, FirstName, LastName, MgrID, EmpLevel)
AS
(
SELECT EmployeeID, FirstName, LastName, coalesce(ManagerID,0), 1
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID,
r.EmpLevel + 1
FROM Employees e
INNER JOIN cteReports r
ON e.ManagerID = r.EmpID
)
SELECT
FirstName + ' ' + LastName AS FullName,
EmpLevel,
(SELECT FirstName + ' ' + LastName FROM Employees
WHERE EmployeeID = cteReports.MgrID) AS Manager
FROM cteReports
ORDER BY EmpLevel, MgrIDThe cteReports is made up of two sections. The first section, shown below, is the anchor member.
SELECT EmployeeID, FirstName, LastName, coalesce(ManagerID,0), 1
FROM Employees
WHERE ManagerID IS NULLThe value "1" at the end represents the employee's current level. The anchor member is then joined to the recursive member.
SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID,
r.EmpLevel + 1
FROM Employees e
INNER JOIN cteReports r
ON e.ManagerID = r.EmpIDto get the final result
1|Ken|Thompson|0|1
2|Terri|Ryan|1|2
3|Robert|Durello|1|2
4|Rob|Bailey|2|3
5|Kent|Erickson|2|3
6|Bill|Goldberg|3|3
7|Ryan|Miller|3|3
8|Dane|Mark|5|4
9|Charles|Matthew|6|4
10|Michael|Jhonson|6|4Final Thoughts
Implementing these CTEs can be a powerful tool and differentiate you as a SQL developer.
Thank you for reading. Let me know how you have used them, and as always, share the knowledge!
Go from SELECT * to interview-worthy project. Get our free 5-page guide.






