Google BigQuery now supports Cubes
How Google adds new Grouping Functions like Grouping, Cubes and Rollup
Google’s BigQuery like other SaaS-Data Warehouse has previously offered scalable features with column and NoSQL-based functionalities, but cut typical relational Data Warehouse features like Star and Snowflake schemas and cubes to do so. This is now increasingly changing.
Like the also lately added primary and foreign keys for BigQuery, Google now supports additional ways to work with grouping sets, which include[1]:
GROUP BY GROUPING S
ETS clause (new): Produce aggregated data for one or more grouping sets.GROUP BY C
UBE clause (new): Produce aggregated data for all grouping set permutations.GROUP BY ROL
LUP clause (update): Include groupable item sets in this clause.GROUP
ING function (new): Check if a groupable value in theGROUP BY
clause is aggregated.
While the ROLLUP
clause was updated, Google added the GROUP BY GROUPING S
ETS clause, GROUP
ING function and GROUP BY C
UBE clause completely new. All have their use cases but especially the ability to produce analytic cubes in BigQuery is the most interesting for me and I think for many Data Analysts and Scientists, too.
While column-based Data Warehouses actually tried to fit data into nested data structures like arrays, JSON or Structs due to performance and query costs, it seems that now Google but also other solutions like Amazon Redshift and Snowflake try to offer more and more relational Data Warehouse features to make migrations even more easy.
By using the GROUP BY CUBE
clause for example, you can aggregate data for all grouping set permutations. A grouping set is a collection of columns by which rows can be grouped together. This clause is helpful if you need to create a contingency table to find interrelationships between items in a set of data[2][3].
Here is a little example for creating a Cube within BigQuery[2]:
-- GROUP BY with CUBE
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY CUBE (product_type, product_name)
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| NULL | jeans | 6 |
| NULL | polo | 25 |
| NULL | t-shirt | 11 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
So pretty useful functions that Google has added here for BigQuery. Please use the links below for a deeper dive and more code examples. These new updates could definitely help to ease migrations and to offer Data Analyst and Business Intelligence users functions that they may miss when working with relational and classical Data Warehouses and BI solutions before.
Sources and Further Readings
[1] Google, BigQuery release notes (2023)
[2] Google, Group rows by C
UBE (2023)
[3] Wikipedia, Contingency table (2023)