avatarSuraj Gurav

Summary

The provided content outlines essential SQL JOIN interview questions and their explanations, aimed at enhancing the reader's understanding and preparation for data science job interviews.

Abstract

The article "10 Ultimate SQL JOIN Questions and Answers For Data Science Interview" serves as a comprehensive guide for aspiring data scientists to master SQL JOIN concepts. It emphasizes the importance of SQL JOINs in data retrieval from multiple tables and provides insights into the types of JOINs, including INNER, LEFT, RIGHT, FULL, CROSS, and SELF JOINs. The author illustrates how to write SQL queries to join tables, explains the difference between Equi and Non-Equi JOINs, and clarifies the distinction between JOINs and UNIONs. The content is enriched with examples and encourages readers to understand the conceptual underpinnings of JOIN operations rather than just memorizing query syntax. The article aims to equip readers with the knowledge to confidently tackle SQL JOIN questions during interviews, drawing from the author's personal experience with FAANG interviews.

Opinions

  • The author believes that knowing only the basics of SQL is insufficient for data science roles; one must be able to apply SQL knowledge efficiently, especially in the context of JOINs.
  • The article suggests that interviewers favor SQL JOIN questions to assess a candidate's ability to handle real-world data scenarios where data is distributed across multiple tables.
  • It is the author's view that providing detailed explanations during interviews can demonstrate a candidate's depth of understanding of SQL concepts.
  • The author posits that understanding the theoretical aspects of SQL JOINs, such as Equi and Non-Equi JOINs, is crucial for solving complex data problems.
  • The author asserts that RIGHT JOINs are useful in certain scenarios, particularly when joining multiple tables, despite their apparent interchangeability with LEFT JOINs.
  • The article implies that mastering SQL JOINs is a key differentiator in data science interviews and can significantly impact a candidate's chances of success.
  • The author encourages readers to engage with the content actively, suggesting that readers can benefit from reading additional resources and practicing SQL queries to solidify their understanding.

Data Science

10 Ultimate SQL JOIN Questions and Answers For Data Science Interview

Know the most-frequently asked SQL JOIN interview questions and how to answer them!

Antwort Vektor erstellt von pch.vector de.freepik.com

Master SQL! 🏆

You need to prove your SQL skills for getting job into data science domain. However, knowing only basics of SQL is not enough, but you should be able to use your knowledge efficiently. 💯

In real world, companies usually store data into multiple tables rather than storing in a single big table. Therefore, the required data is always retrieved from multiple tables, using JOINs. That’s why in the job interview, SQL JOINs are the favorite topic of interviewers.

I listed here 10 most commonly asked interview questions on SQL JOINs and how you can answer them. I found these questions 80% of time in my data science interviews.

I kept this article short, but you can still jump to your favorite question using the index below.

· What is SQL JOIN and its use?
· What are different types of SQL JOINs?
· How will you write a SQL query to join two tables and get all the rows of the first table?
· What is CROSS JOIN in SQL?
· What is SELF JOIN in SQL and when do you use it?
· How do you JOIN more than two tables in SQL?
· What is Equi and Non-Equi JOIN?
· How many rows will you get using LEFT JOIN and INNER JOIN on two tables?
· Can you generate same results using LEFT and RIGHT JOINs?
· How do the UNION is different from JOIN in SQL?

Let’s get started! 🚀

Starting with the simplest one..

What is a SQL JOIN and its use?

The JOIN command in SQL is used to combine data from two or more tables based on the common columns between these tables.

When the data you need resides in different tables, SQL JOIN allows you to combine these tables in on one or more common columns. And then you can select the records, which may or may not have columns from both the tables.

What are the different types of SQL JOINs?

Based on how you would like to combine two or more tables and select records, there are 4 fundamental types of JOINs in SQL.

  • INNER JOIN: It combines two tables in such a way that the common columns on which join operation is performed has matching values in both the tables. Therefore it is used to retrieve records which are common in both the tables.
  • LEFT JOIN: This is used to retrieve all records from the left table, and the matching records from the right table
  • RIGHT JOIN: It is used to retrieve all records from the right table, and the matching records from the left table
  • FULL JOIN: As its name suggests, it is used to retrieve all the records from both the tables.

Here instead of stating only the types of JOINs, you should try to explain it in 1–2 sentences. This certainly gives interviewer an idea that, you know more about the joins. ✅

How will you write a SQL query to join two tables and get all the rows of the first table?

Here the interviewer essentially gives you two tables and usually they have one or more column names in common as shown below.

Dummy data tables | Image by Author

As you can see above, the OrderID is the common column in both tables, so the JOIN operation need to be performed on this column.

As question asks you to get all the rows of the first table, the simplest solution is to use LEFT JOIN as below.

SELECT t1.OrderID, 
       t1.orderDate, 
       t1.orderStatus,
       t2.salesManager,
       t2.salesUSD
FROM orders AS t1
LEFT JOIN sales AS t2
ON t1.OrderID = t2.OrderID
Get all rows of first table | Image by Author

Here interviewer checks how you mention individual column names in SELECT statement and whether you can use table aliases or not. ✅

What is a CROSS JOIN in SQL?

This returns all the possible combinations of two tables. So each row of the first table makes combination with each row of second table.

Ultimately, total number of records returned by CROSS JOIN is the multiplication of number of records in both the tables, if no WHERE clause is used.

The result generated from this type of JOIN is also called as Cartesian Product of two tables. ✅

For an instance, suppose you have two tables as below,

Dummy sales and orders data | Image by Author

And you want to find all the possible combinations of orders and sales and eventually select all the columns from both tables.

SELECT orders.*, sales.*
FROM orders 
CROSS JOIN sales

Unlike other types of joins, you do not need to mention ON clause in CROSS JOIN.

CROSS JOIN in SQL | Image by Author

You can notice, for each orderID in the orders table, there are two rows from sales table.

What is a SELF JOIN in SQL and when do you use it?

This is used to combine a table with itself. It might sound confusing if you are beginner, however, it is extremely useful for comparing rows within the same table.

Let’s try to understand this with an example.

Suppose, you have dataset as shown below containing names of students, their country and some other information. And you want to find out which students belong to same country.

Dummy students data | Image by Author

Clearly, you need to look for students and their countries in the same table. Let’s try to write a simple SELF JOIN as below —

SELECT t1.country, 
       t1.name AS student1,
       t2.name AS student2
FROM students AS t1
JOIN students AS t2
ON t1.studentID > t2.studentID
AND t1.country = t2.country

And the table is joined with itself on the studentID and country columns as —

Self JOIN in SQL | Image by Author

It quickly returns the pairs of students who come from same country. The JOIN condition here t1.studentID > t2.studentID ensures that you always compare two different students from the same table. This type of join is called Non-Equi Join. You will read more about it in the subsequent questions. ✅

And finally the condition t1.country = t2.country ensures that students belong to same country.

This example is the classic use-case of the 3 concepts — SELF JOIN, Equi and Non-Equi JOINs.

You can learn more about SELF JOIN in my one of the most-read article —

Also, you can learn more about Self JOINs in — Best SQL Examples To Understand Self Join.

How do you JOIN more than two tables in SQL?

Combining multiple tables together is exactly same as using JOIN between two tables.

This type of join requires consecutive JOIN operations → first, you join the first and second table to get an intermediate result set. Then you join another table to this intermediate table. ✅

Let’s see how this actually works using the example below.

Suppose you want to retrieve orderID, sales in USD, and quantity which are present in three different tables as shown below.

Dummy tables | Image by Author

And orderID is the common column between them. So, JOIN operation will be performed on this column as shown below.

SELECT orders.orderID,
       sales.salesUSD,
       products.quantity
FROM orders
INNER JOIN sales
        ON orders.orderID = sales.orderID
INNER JOIN products 
        ON orders.orderID = products.orderID

This gives only a single row output, as you used INNER JOIN and only orderID = A1234 is present in all the tables.

Join multiple tables | Image by Author

Likewise, you can use other types of JOINs as well!

What is an Equi and Non-Equi JOIN?

In Equi Join, the tables are combined based on matching values in common columns of both the tables. So, as its name suggests, it uses equality operator (=) in ON clause of SQL join.

Again consider the previous example,

SELECT orders.orderID,
       sales.salesUSD,
       products.quantity
FROM orders
INNER JOIN sales
        ON orders.orderID = sales.orderID
INNER JOIN products 
        ON orders.orderID = products.orderID

Look closely on how does the ON clause is defined → it uses equality = operator. This means you are retrieving all the records from tables where an orderID is present in all the tables.

However, the join condition need not be always equality i.e. you can use any other operator such as < , > ,,, <> in ON clause of SQL JOIN. And the type of join resulting from such non-equality operators is called as Non-Equi JOIN. ✅

A classic use case of Non-Equi joins is Rising Temperature problem from LeetCode, which I discussed in my article — Top 5 SQL Date Functions You Should Know

How many rows will you get using LEFT JOIN and INNER JOIN on two tables?

In such questions you will be usually provided with two tables with same or different size. This is the best question about conceptual understanding of different types of JOINs in SQL.

The interviewer doesn’t want you to write query and check number of rows. Instead, he wants to know how you can apply your knowledge and tell the number of rows only by looking the the tables. ✅

Let’s try to answer this using the example below.

Suppose you are given below two tables — orders and sales — and sometimes the queries as well.

Dummy tables | Image by Author
-- Query for INNER JOIN
SELECT orders.orderID,
       orders.orderStatus,
       orders.orderDate,
       sales.salesManager,
       sales.salesUSD
FROM orders
INNER JOIN sales 
        ON orders.orderID = sales.orderID
-- Query for LEFT JOIN
SELECT orders.orderID,
       orders.orderStatus,
       orders.orderDate,
       sales.salesManager,
       sales.salesUSD
FROM orders
LEFT JOIN sales 
        ON orders.orderID = sales.orderID

The ideal answer in this question would be — you first state the definition of INNER and LEFT JOIN and then give the answer. This gives the interviewer idea that how good you have conceptual understanding of the question and you are not randomly guessing the answer. ✅

As INNER JOIN is used to retrieve the records which have matching values in common columns orderID, this will result in only 1 row output as only orderID present in both the tables is A1234

whereas, LEFT JOIN is used to retrieve all the records of first table and matching records of second table, so this will return 3 rows output, with NULL in salesManager and salesUSD columns for orderIDs B1234 and C1234

Can you generate same results using LEFT and RIGHT JOINs?

The simple answer here is YES!!!

But, only one word answer is not expected in the interview. You are expected to tell more about it and explain with example.

Suppose you have the same tables as given in the example above, and you want to retrieve orderID and salesUSD columns for all the orderIDs in orders table.

So the first simple solution is using LEFT JOIN.

SELECT orders.orderID, 
       sales.salesUSD
FROM orders LEFT JOIN sales
ON orders.orderID = sales.orderID

which essentially generates output as,

LEFT JOIN | Image by Author

However, to get the same output using RIGHT JOIN, you need to exchange the positions of tables in JOIN clause as mentioned below. ✅

SELECT orders.orderID, 
  sales.salesUSD
FROM sales RIGHT JOIN orders
ON orders.orderID = sales.orderID

So ultimately, the query which uses RIGHT JOIN can be re-written as LEFT JOIN. 💯

This explanation leads to a quick follow up question —

Then why does the RIGHT JOIN even exist❓

RIGHT JOIN can be useful when you are joining more than two tables. Depending on the output you want, it can be easier to use combination of LEFT and RIGHT JOIN instead of using any one type of join.

For example, when you want to retrieve data from 3 tables and you have already used some type of JOIN with first 2 tables to get a intermediate table.

Now, you want all the records of third table, along with matching rows from intermediate table. Certainly you can’t put the 3rd table on the left side of LEFT JOIN in the series of JOINs. So, you have to use RIGHT JOIN. ✅

How do the UNION is different from JOIN in SQL?

Both UNION and JOIN allow you to combine the data from two tables, but there is fundamental difference between how it is done.

In case of JOIN, you can combine two tables horizontally based on a common column between them. And you can have different names for this common column in different tables.

For example, orderID from one table can be joined with column named order_ID or ID or whatever as long as values contained in these columns have same data type. ✅

Suppose you have two tables as below,

Dummy tables | Image by Author

To combine all columns of both the tables and retrieve only the rows which have matching value in common column orderID, all you need to do is —

SELECT orders.*, sales.*
FROM orders
INNER JOIN sales
ON orders.orderID = sales.orderID
How SQL JOIN works | Image by Author

As you can see, both the tables — orders and sales — are combined side by side.

Whereas, UNION is used to combine the results of two SELECT statements. It combines two resulting tables vertically one-below-another. So, logically both tables must have same column names.

So to combine two datasets using UNION, all the SELECT statements must have same name, number and order of columns. ✅

That’s all!

I hope you found this article super-useful to know and understand majority of interview questions about SQL JOINs. And certainly the best ways to answer them.

I gathered these questions through my personal interview experiences including interviews at FAANG. Giving the detailed explanation of answers during interview highlights your conceptual understanding of the concepts. 💯

Interested in reading unlimited stories on Medium??

💡 Consider Becoming a Medium Member to access unlimited stories on medium and daily interesting Medium digest. I will get a small portion of your fee and No additional cost to you.

💡 Be sure to Sign-up to my Email list to never miss another article on data science guides, tricks and tips, SQL and Python.

Thank you for reading!

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

Data Science
Programming
Technology
Sql
Interview
Recommended from ReadMedium