The website content provides a comprehensive guide on summarizing data with SQL, focusing on pivot table creation in MySQL 8, and includes both theoretical explanations and practical examples.
Abstract
The article titled "How to Summarize Data With SQL" delves into the concept of pivot tables as a means to organize and summarize data by rows and columns, with intersecting cells displaying aggregate statistics such as sums or averages. It offers a step-by-step approach to creating pivot tables in MySQL 8, starting with basic SQL queries for average sales and profits, progressing to more complex queries involving segments, and addressing the challenges of pivoting data when MySQL lacks built-in pivot functions. The author presents a naive approach using CASE statements for each segment value, acknowledging its drawbacks in terms of maintainability and scalability. To improve upon this, a pseudo-dynamic approach is introduced, utilizing MySQL's GROUP_CONCAT() and CONCAT() functions to generate the necessary SQL code dynamically. This approach is detailed through the preparation of a macro, the construction of a complete query, and the final execution using prepared statements. The article also touches on the broader utility of prepared queries and invites readers to engage with the content and follow the author for future updates.
Opinions
The author suggests that creating pivot tables in MySQL 8 can be tricky due to the absence of dedicated pivot functions, unlike other databases like Snowflake or PostgreSQL.
The naive approach to pivoting data is criticized for being lengthy, error-prone, and difficult to maintain, especially when dealing with an unknown number of segment entries.
The pseudo-dynamic approach is presented as a superior alternative, leveraging MySQL functions to automate the generation of CASE statements, thus simplifying the process.
The article emphasizes the importance of prepared queries, noting that they are underutilized by analysts despite their potential benefits and common use by web developers.
The author encourages reader interaction and offers a free 5-page guide to help readers advance their SQL skills from basic queries to more advanced projects.
How to Summarize Data With SQL
Learn How to pivot Data. Theory and Practice Fully Explained!
A pivot table is a way to summarize data by organizing the values of one property in rows and the values of another property in columns.
At the intersection of a row and a column, there will be an aggregation statistic: the sum or the average.
If you want to run all the queries shown, download the sample database. All examples have been tested with MySQL 8.
credits: Mickaël Andrieu
If I asked you to list the average sales and profits for any orders table that has the properties “order_date”, “sales” and “profit” as fields, I imagine you would write the following query:
The query is still easy to write if we introduce a “segment” field and we want to obtain the statistics for each segment.
Indeed, we only need to group again on the segment field:
Pivot to MySQL 8: Naive approach
Where it gets a bit tricky is when you want to “pivot” this table so that you have one column per segment with the value of the different statistics, i.e., “avg_sales_consumer”, “avg_profit_home_office” etc.. directly accessible for each order date.
If other software like Snowflake or PostgreSQL has elegant functions to make a pivot, MySQL 8 has yet to have this kind of function.
To “rotate” rows and columns, we group on the order date and use the chosen aggregation function (here AVG()) and a CASE for each value of the column considered by the aggregation:
So, we already note the main drawback of this “strategy”: the more possible values there are for the segment, the more CASE(s) we will have to write 🤔
This is lengthy, error-prone, and quite painful to maintain: on the other hand, what happens if we have no idea how many entries are in the “segment” property?
Let’s try to do a little better 🤓
Pivot in MySQL 8: Pseudo-dynamic approach
MySQL has two functions to generate code: GROUP_CONCAT() and CONCAT(). The strategy consists in developing a macro (a bit like in dbt, in a video game, or even in iMacros!) which will repeat for us the CASE(s) from the list of possible values of the “segment” property.
The reasoning is as follows:
We prepare the macro, which expects a list of values: here, “Home Office”, “Consumer”, etc.
We store the query (and not the result of the query ⛔) in a SQL variable;
We prepare and execute the query;
Preparation of the macro
Since we already know which SQL query we need to generate, the first step is to use the GROUP_CONCAT(), and CONCAT() functions with the DISTINCT keyword so that we don’t iterate several times on the same segment value.
Be careful! MySQL limits the number of characters that can be concatenated using the GROUP_CONCAT() function, so you will have to adapt the value by updating the SESSION configuration:
As we can see, it is not the query that was executed (and so much the better, it is invalid since it is incomplete!) but the portion that corresponds to the necessary CASE(s).
It remains to build the complete query using our macro by reusing the CONCAT() function, which is decidedly handy 😜 :
Preparation and final execution of the request
At this point, all the complexity of the exercise is behind you! 🥳
To execute a prepared query, you need to use the PREPARE and EXECUTE statements by loading the contents of the “pivot_query” variable as you would for any other prepared query:
Prepared queries deserve their article as they can be helpful and seem unknown to most analysts, while web developers very much use them.
If you have any questions, feel free to comment, and if you enjoyed the article, follow me: you will receive a notification when I publish next.