The One SQL Function You Need to Ace the Amazon Interview Question: Highest-Grossing Items
How to Use ROW_NUMBER, DENSE_RANK, and RANK to Identify Top Sellers Quickly and Easily
Besides working on personal projects, a good idea is to practice programming skills and develop problem-solving skills through coding games.
Learning SQL, Python, and algorithms is essential for anyone who wants to pursue a career in Data engineering. There are a considerable amount of platforms that help practise and work on programming skills.
In this article, I will present the three most used Windows functions (ROW_NUMBER, DENSE_RANK, and RANK) and how I used one of them to solve the Amazon highest-grossing items question.
I used the Datalemur platform to practice SQL interview questions. I chose a medium-difficulty question for the company Amazon.
The problem was :
“Assume you’re given a table containing data on Amazon customers and their spending on products in different category, write a query to identify the top two highest-grossing products within each category in the year 2022. The output should include the category, product, and total spend.”
This problem is relatively simple, and the part that could be problematic is to find a way to get the top two highest per category based on the spend. It made me think immediately about ranking functions in SQL, and for me, it was the key to this problem.
There are three ranking functions in SQL: rank(), dense_rank(), and row_number().
Knowing which one to use came from the understanding of each of them:
- row_number(): The row_number() function always generates a unique ranking for every row, even if they have the same values;
SELECT id, name, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
+----+-----------+---------+----+
| id | name | salary | rank |
+----+-----------+---------+----+
| 1 | Alice | 1000 | 1 |
| 2 | Bob | 1000 | 1 |
| 3 | Charlie | 900 | 2 |
| 4 | David | 800 | 3 |
+----+-----------+---------+----+- Rank (): The rank() function will generate a ranking but assign the same rank to rows with the same values. The rank() function will create a gap. And the next different rank will not have the immediate following number.
SELECT id, name, salary, rank() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
+----+-----------+---------+----+
| id | name | salary | rank |
+----+-----------+---------+----+
| 1 | Alice | 1000 | 1 |
| 2 | Bob | 1000 | 1 |
| 3 | Charlie | 900 | 3 |
| 4 | David | 800 | 4 |
+----+-----------+---------+----+- dense_rank(): The dense_rank() is similar to rank() but will create a contiguous sequence of ranks. Given two rows ranking the same, the next ranking value will be increment by 1.
SELECT id, name, salary, dense_rank() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
+----+-----------+---------+----+
| id | name | salary | rank |
+----+-----------+---------+----+
| 1 | Alice | 1000 | 1 |
| 2 | Bob | 1000 | 1 |
| 3 | Charlie | 900 | 2 |
| 4 | David | 800 | 4 |
+----+-----------+---------+----+I used in this exercise the row_number() function. Here is my solution :
SELECT category, product, total_spend
FROM
(SELECT
p_s.category as category,
p_s.product as product,
SUM(p_s.spend) as total_spend,
row_number() over (PARTITION BY category order by SUM(p_s.spend) desc) row_number
FROM product_spend as p_s
WHERE
EXTRACT(YEAR FROM p_s.transaction_date)= '2022'
GROUP BY category, product) subquery
WHERE row_number<=2
ORDER BY category, row_number;This script can be improved by using CTE for more readability and performance.
WITH subquery AS (
SELECT
p_s.category AS category,
p_s.product AS product,
SUM(p_s.spend) AS total_spend,
row_number() OVER (PARTITION BY category ORDER BY SUM(p_s.spend) DESC) AS row_number
FROM product_spend AS p_s
WHERE
EXTRACT(YEAR FROM p_s.transaction_date) = '2022'
GROUP BY category, product
)
SELECT
category,
product,
total_spend
FROM subquery
WHERE row_number <= 2
ORDER BY category, row_number;Conclusion
Row_number, Dense_rank and rank are powerful SQL functions that can be used to solve various problems, including the Amazon interview question on highest-grossing items.
Here are some tips :
- Use the PARTITION BY clause to group the rows in the dataset by category.
- Use the ORDER BY clause to order the rows in each category by sales in descending order.
- Use the ROW_NUMBER function to rank each row in each category.
- Filter the results only to include the top two highest rankings.
I hope this article has been helpful.
Go from SELECT * to interview-worthy project. Get our free 5-page guide.





