avatarChristopher Tao

Summary

The article discusses the nuances of using filtering conditions in the ON clause of LEFT JOIN and RIGHT JOIN operations in SQL, as opposed to the common practice of placing them in the WHERE clause.

Abstract

The article delves into the SQL LEFT JOIN and RIGHT JOIN operations, emphasizing the importance of understanding where to place filtering conditions. It explains that placing conditions in the ON clause, as opposed to the WHERE clause, can lead to different outcomes. Specifically, when using LEFT JOIN, all rows from the left table are retained, and the right table is filtered based on the joining conditions. The article illustrates this with examples, showing that conditions in the ON clause do not filter out rows from the result set but determine which rows from the right table are joined. This is crucial in scenarios where all records from one table must be preserved, such as in a campaign sales report where one might want to list all campaigns regardless of whether there were associated sales. The article concludes with a practical use case and a summary of the key distinction between using conditions in the ON clause versus the WHERE clause.

Opinions

  • The author suggests that it is uncommon to place conditions in the ON clause, but it can be useful in specific scenarios.
  • The article implies that understanding the behavior of LEFT JOIN and RIGHT JOIN with conditions is essential for accurate SQL querying.
  • It is the author's view that using conditions in the ON clause is a valuable technique for retaining all records from one table, which might otherwise be excluded if the conditions were placed in the WHERE clause.
  • The author believes that the distinction between placing conditions in the ON clause versus the WHERE clause is a "small syntax pitfall" that can significantly impact the results of a query.
  • The author emphasizes the practicality of using conditions in the ON clause by providing a real-world example of a campaign sales report, highlighting the importance of seeing all campaigns, even those without sales.
Image by NoName_13 from Pixabay

Do Not Put Filtering Conditions In the “WHERE” Clause in Some Scenarios

Do you what happens if we put conditions after LEFT JOIN ON?

As long as you click into this article, I’m pretty sure you should understand SQL. You must also understand that we should put our conditions in the WHERE clause in a SELECT query. However, let me ask you a question and see if you can answer it immediately.

What will happen if we put a filter condition with LEFT JOIN … ON … clause?

SELECT *
FROM Employee e LEFT JOIN Department d
  ON e.dept_id = d.id
  AND e.name = 'Chris'

If you are unsure about the behaviour of the above query or think it is equivalent to the following one, please read my article and I’ll tell you why they are different.

SELECT *
FROM Employee e LEFT JOIN Department d
  ON e.dept_id = d.id
WHERE e.name = 'Chris'

1. Verify the Results

Image by NoName_13 from Pixabay

For demonstration purposes, I have created two tables with simple dummy data as follows.

The Employee Table

The Department Table

Now, let’s run the first query in the introduction. For your reference, I’ll put the query here again.

SELECT *
FROM Employee e LEFT JOIN Department d
  ON e.dept_id = d.id
  AND e.name = 'Chris'

That was the answer. Were you right? :)

Just for comparison purposes, I’ll also put the 2nd query and its results here. Of course, there is nothing to be surprised about, the filter condition will filter the results eventually.

SELECT *
FROM Employee e LEFT JOIN Department d
  ON e.dept_id = d.id
WHERE e.name = 'Chris';

Why that’s the case? Let’s have a deep dive in the next section.

2. The Logic of Joins

Image by Gerd Altmann from Pixabay

Before we can understand why the behaviour is like that, we need to review what are LEFT JOIN and RIGHT JOIN exactly and to make sure we fully understand it.

Let me show you what RIGHT JOIN with condition looks like. The same query as above but using RIGHT JOIN leads to the following results.

SELECT *
FROM Employee e RIGHT JOIN Department d
  ON e.dept_id = d.id
  AND e.name = 'Chris'

Now, let’s have a look at the workflow so we will understand why that happens.

LEFT JOIN with Conditions

The JOIN action in SQL is used to combine two tables, the left table and the right table. When we use LEFT JOIN, all the rows in the left table will be guaranteed to be retained. Only the right table will be filtered based on the joining conditions.

The following example is for the scenario that we only have e.dept_id = d.id as the joining condition.

SELECT *
FROM Employee e LEFT JOIN Department d
  ON e.dept_id = d.id

In the above query, we can think it like this. The left table Employee is written on the paper already, then we look at it row by row. For each row on the left table, we will loop the right table Department row by row, to see which row matches the condition. If it matches, we will write down the row on the right following the row on the left.

For example, when we look at “Alice”, the dept_id=1. The first row on the right “Sales” department has the same id. So, this row will be written following the “Alice” row on the left.

Now, let’s have a look at the query with the extra condition e.name = 'Chris'.

SELECT *
FROM Employee e LEFT JOIN Department d
  ON e.dept_id = d.id
  AND e.name = 'Chris'

I still use the same arrows. However, this time the dashed lines indicate that only the first condition e.dept_id = d.id is met but not the second condition e.name = 'Chris'. For example, the figure below shows that although both the IDs are 1, the e.name is Alice, not Chris.

Of course, the other two rows on the right cannot satisfy the two conditions either. Therefore:

The row on the left will be retained and no row on the right will be selected to join with it.

The only matched pair in this example is as follows.

Therefore, we will see that only the row with “Chris” has a valid joined row from the right table. Because the id and name columns from the right tables are needed, all the other rows will show null values to fill the gaps in these two columns.

RIGHT JOIN with Conditions

Now, let’s have a look at the RIGHT JOIN scenario. It will be exactly the opposite. The right table will be guaranteed to be retained, and it will try to find the matched rows from the left table based on the conditions.

Therefore, the RIGHT JOIN with the above condition will be as follows.

Since the only row on the left that satisfies both conditions is the “Chris” one, the result will be as follows.

3. When to Use It?

Image by Rudy and Peter Skitterians from Pixabay

Basically, it will be rare cases that we want to put conditions in the ON clause because it doesn’t actually filter the results. However, that also means we can use it when we want to retain all the records from the left table (or right table) regardless of the conditions.

A Practical Use Case: Campaign Sales Report

Suppose we have two tables: Campaigns and Sales.

The Campaigns table has all the campaign activities and each campaign only has one product. Of course, each campaign also has start and end dates.

CREATE TABLE Campaigns (
    campaign_id INT PRIMARY KEY,
    product_id INT,
    start_date DATE,
    end_date DATE
);

The Sales table has all the transactions. Each transaction has one or more products and along with a sale date.

CREATE TABLE Sales (
    sale_id INT,
    product_id INT,
    quantity INT,
    sale_date DATE,
    PRIMARY KEY (sale_id, product_id)
);

Now, the problem we need to resolve is to list all the campaign activities, as well as the number of products sold during these activities. Also, we don’t want to miss any campaigns because even if no product was sold during the activities, we still want to know it.

The SQL query is as follows. We can put the sale date filtering conditions into the LEFT JOIN … ON clause. This makes sure that the joining is valid, as well as we don’t lose any campaigns even if there was no product sold.

SELECT C.campaign_id,
       C.product_id,
       C.start_date, 
       C.end_date,
       SUM(S.quantity) AS TotalCampaignSales
FROM Campaigns C LEFT JOIN Sales S 
  ON C.product_id = S.product_id 
  AND S.sale_date >= C.start_date 
  AND S.sale_date <= C.end_date 
GROUP BY C.campaign_id, 
         C.product_id, 
         C.start_date, 
         C.end_date  
ORDER BY SUM(S.quantity) DESC;

The result shows that the campaign #1 is the most successful one, while the campaign #4 was not effective at all.

What if we put the conditions in a WHERE clause?

The result is still valid, but we won’t be able to see the campaign #4.

SELECT C.campaign_id,
       C.product_id,
       C.start_date, 
       C.end_date,
       SUM(S.quantity) AS TotalCampaignSales
FROM Campaigns C LEFT JOIN Sales S 
  ON C.product_id = S.product_id 
WHERE S.sale_date >= C.start_date 
  AND S.sale_date <= C.end_date 
GROUP BY C.campaign_id, 
         C.product_id, 
         C.start_date, 
         C.end_date  
ORDER BY SUM(S.quantity) DESC;

So, we won’t be able to see campaign #4 at all, because there was no product sold during the campaign. Suppose we have more columns in the Campaigns table such as the campaign budget, it will be even more useful if we can return all the campaigns. For example, how much budget have we wasted on campaigns that were not sold with any products?

Summary

Image by SplitShire from Pixabay

In this article, I have focused on a very small syntax pitfall in SQL, which is the conditions in the LEFT/RIGHT JOIN … ON clause. Normally we will put the filter conditions in the WHERE clause. However, the conditions in the ON clause will not reduce the number of rows, but it affect if the table on the other side will be joined or not. If the conditions are not met, they will be left as null values.

I have explained this behaviour with my best efforts. Additionally, the practical use case was also given. Hope it will be helpful.

Unless otherwise noted all images are by the author

Sql
Data Science
Technology
Programming
Artificial Intelligence
Recommended from ReadMedium