avatarMatt Chapman

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

2377

Abstract

?</h1><p id="b4c8">If you’re interested in understanding <i>why</i> this works, keep reading.</p><p id="466e">All window functions, including the one above, follow the same basic pattern:</p><div id="3b60"><pre>WINDOW_FUNCTION(<span class="hljs-keyword">column</span>) <span class="hljs-keyword">OVER</span>( [<span class="hljs-keyword">PARTITION</span> <span class="hljs-keyword">BY</span> columns_to_group_by] [<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> columns_to_order_by] )</pre></div><p id="5228">Let’s unpack this.</p><p id="d971">The first component of our window function is the <code>WINDOW_FUNCTION(column)</code> segment. The<code>WINDOW_FUNCTION</code> phrase is the analytic/aggregation function that we want to apply — e.g., <code>SUM</code>, <code>COUNT</code>, or <code>AVG</code> — and <code>column</code> refers to the column that we want to aggregate. In our example, we wrote <code>SUM(Views)</code>, which tells our database that we want to calculate the sum of values in the <code>Views</code> column.</p><p id="e1d6">The next component of our window function is the <code>OVER()</code> command, which is how we define the “window” for which we want to calculate the <code>SUM</code> of <code>Views</code>. For now, don’t worry too much about what goes inside the <code>OVER()</code> command. Just focus on the big picture: this command enables us to specify a subset of rows (i.e. a “window”) over which we want to make our calculation.</p><p id="89c7">To illustrate this, consider the following example. Let’s say we had another table <code>video_views_by_video</code> which is similar to one above, but includes an additional column, <code>Video_Title</code>, which specifies the video for which we are counting views:</p><figure id="3f57"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*oBYv9Jr48WORjtP2674ZeA.png"><figcaption>Image by author</figcaption></figure><p id="ecf4">If we want to calculate the cumulative views for each video, we will need to specify two separate groups of rows in our table. The first group needs to contain rows relating to the video “5 fun things to do in Amsterdam” (red), and the second group will consist of rows relating to the video “Top activities in Oxford” (blue).</p><p id="8a0d">The <code>OVER()</code> clause is what allows us to specify these groups.</p><p

Options

id="b40b">To “partition” our table into these separate windows, we would write <code>OVER(PARTITION BY Video_Title)</code>, which creates one partition for each unique value in the <code>Video_Title</code> column. Then, we could use the phrase<code>ORDER BY Date</code> to make sure that rows in each window are sorted according to the <code>Date</code> column.</p><p id="e389">Putting it all together, we’d write <code>SUM(Views) OVER(PARTITION BY Video_Title ORDER BY Date)</code>, which tells the database to return a new column in which we calculate the running total of video views for each different video, ordered by date. In a full query, this would be written as:</p><div id="5e13"><pre><span class="hljs-keyword">SELECT</span> <span class="hljs-type">Date</span>, Video_Title, Views, <span class="hljs-built_in">SUM</span>(Views) <span class="hljs-keyword">OVER</span>(<span class="hljs-keyword">PARTITION</span> <span class="hljs-keyword">BY</span> Video_Title <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> <span class="hljs-type">Date</span>) <span class="hljs-keyword">AS</span> Cumulative_Views <span class="hljs-keyword">FROM</span> video_views_by_video <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> Video_Title, <span class="hljs-type">Date</span></pre></div><p id="5be7">… which would return:</p><figure id="4792"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*vTd8Fzds4kbsQPkIsGh1NA.png"><figcaption>Image by author</figcaption></figure><h1 id="1e7b">One final thing</h1><p id="9dca">It’s not compulsory to include both the <code>PARTITION BY</code> and <code>ORDER BY</code> clauses inside the <code>OVER()</code> command if you don’t need to. For example, in our original task of calculating the running total in the table <code>video_views</code>, we didn’t include the <code>PARTITION BY</code> clause. This was because we didn’t need to partition our table into different groups (there was only one video!) and we only needed to use the <code>ORDER BY</code> clause to make sure that the rows would be ordered according to date.</p><p id="5340">Thanks for reading. I hope this has been helpful!</p><p id="21cb"><b>Go from SELECT * to interview-worthy project. <a href="https://learning-sql.ck.page/aa03293372">Get our free 5-page guide</a>.</b></p></article></body>

A simple guide to calculating running totals in SQL

One line of code is all you need. Pinky promise.

The ability to quickly calculate a running total is a must-have for SQL users. In this article, I’ll give you a simple guide on how to do this using just one line of code.

Yep, just one.

I’ll start by introducing an example and showing you the query you need to calculate a running total. Then, I’ll explain what’s going on under the hood.

The problem

Imagine that we have a table called video_views which records the number of daily views received by a YouTube video. Each row represents a date, and shows the number of views that the video received on that date.

Image by author

If we want to see a running total of the number of views, we will need to create something like this, where each “cell” in the Cumulative_Views column shows the total number of views received up to that point:

Image by author

The solution

The recommended way to create a column like this in SQL is to use a window function, also known as an analytic function. Specifically, we can use this window function: SUM(Views) OVER(ORDER BY Date) AS Cumulative_Views. In a query, we’d simply add this in as a new column:

SELECT
  Date, 
  Views,
  SUM(Views) OVER(ORDER BY Date) AS Cumulative_Views
FROM video_views
ORDER BY Date

… and this would generate our new Cumulative_Views column, as pictured above. See, I told you it was only one line!

What’s going on here?

If you’re interested in understanding why this works, keep reading.

All window functions, including the one above, follow the same basic pattern:

WINDOW_FUNCTION(column) OVER( [PARTITION BY columns_to_group_by] [ORDER BY columns_to_order_by] )

Let’s unpack this.

The first component of our window function is the WINDOW_FUNCTION(column) segment. TheWINDOW_FUNCTION phrase is the analytic/aggregation function that we want to apply — e.g., SUM, COUNT, or AVG — and column refers to the column that we want to aggregate. In our example, we wrote SUM(Views), which tells our database that we want to calculate the sum of values in the Views column.

The next component of our window function is the OVER() command, which is how we define the “window” for which we want to calculate the SUM of Views. For now, don’t worry too much about what goes inside the OVER() command. Just focus on the big picture: this command enables us to specify a subset of rows (i.e. a “window”) over which we want to make our calculation.

To illustrate this, consider the following example. Let’s say we had another table video_views_by_video which is similar to one above, but includes an additional column, Video_Title, which specifies the video for which we are counting views:

Image by author

If we want to calculate the cumulative views for each video, we will need to specify two separate groups of rows in our table. The first group needs to contain rows relating to the video “5 fun things to do in Amsterdam” (red), and the second group will consist of rows relating to the video “Top activities in Oxford” (blue).

The OVER() clause is what allows us to specify these groups.

To “partition” our table into these separate windows, we would write OVER(PARTITION BY Video_Title), which creates one partition for each unique value in the Video_Title column. Then, we could use the phraseORDER BY Date to make sure that rows in each window are sorted according to the Date column.

Putting it all together, we’d write SUM(Views) OVER(PARTITION BY Video_Title ORDER BY Date), which tells the database to return a new column in which we calculate the running total of video views for each different video, ordered by date. In a full query, this would be written as:

SELECT
  Date,
  Video_Title,
  Views,
  SUM(Views) OVER(PARTITION BY Video_Title ORDER BY Date) AS Cumulative_Views
FROM video_views_by_video
ORDER BY Video_Title, Date

… which would return:

Image by author

One final thing

It’s not compulsory to include both the PARTITION BY and ORDER BY clauses inside the OVER() command if you don’t need to. For example, in our original task of calculating the running total in the table video_views, we didn’t include the PARTITION BY clause. This was because we didn’t need to partition our table into different groups (there was only one video!) and we only needed to use the ORDER BY clause to make sure that the rows would be ordered according to date.

Thanks for reading. I hope this has been helpful!

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

Sql
Data Analysis
Data Analytics
Bigquery
Data Science
Recommended from ReadMedium