avatarChristianlauer

Summary

Google BigQuery has introduced support for cubes, including new grouping functions like GROUP BY GROUPING SETS, GROUP BY CUBE, and GROUP BY ROLLUP, enhancing its relational data warehouse capabilities.

Abstract

Google's BigQuery has expanded its data warehousing features by adding support for cubes and new grouping functions. This move represents a shift towards incorporating more traditional relational data warehouse functionalities, such as star and snowflake schemas, into BigQuery's previously column and NoSQL-focused system. The new features include the GROUP BY GROUPING SETS clause for aggregating data across multiple grouping sets, the GROUP BY CUBE clause for generating aggregated data for all possible combinations of grouping sets, and the updated GROUP BY ROLLUP clause for including hierarchical summaries. These additions, along with the recently added primary and foreign keys, are designed to facilitate data analysis and ease the transition for users accustomed to relational data warehouses. The article provides a code example demonstrating the use of the GROUP BY CUBE clause and suggests that these enhancements will be beneficial for data analysts and business intelligence users.

Opinions

  • The author believes that the addition of cubes and new grouping functions in BigQuery is particularly interesting for data analysts and scientists.
  • The author suggests that the new features, especially the GROUP BY CUBE clause, are useful for creating contingency tables and finding interrelationships between data items.
  • The author implies that Google, along with other data warehouse solutions like Amazon Redshift and Snowflake, is increasingly offering relational data warehouse features to simplify migrations.
  • The author endorses the new BigQuery updates as potentially easing migrations and providing functions that users might miss from traditional relational data warehouses and business intelligence solutions.
  • The author recommends trying out an AI service that offers similar performance to ChatGPT Plus (GPT-4) at a more cost-effective price, indicating a belief in the value of such tools for users interested in these updates.

Google BigQuery now supports Cubes

How Google adds new Grouping Functions like Grouping, Cubes and Rollup

Photo by Jonathan Petersson on Unsplash

Google’s BigQuery like other SaaS-Data Warehouse has previously offered scalable features with column and NoSQL-based functionalities, but cut typical relational Data Warehouse features like Star and Snowflake schemas and cubes to do so. This is now increasingly changing.

Like the also lately added primary and foreign keys for BigQuery, Google now supports additional ways to work with grouping sets, which include[1]:

While the ROLLUP clause was updated, Google added the GROUP BY GROUPING SETS clause, GROUPING function and GROUP BY CUBE clause completely new. All have their use cases but especially the ability to produce analytic cubes in BigQuery is the most interesting for me and I think for many Data Analysts and Scientists, too.

While column-based Data Warehouses actually tried to fit data into nested data structures like arrays, JSON or Structs due to performance and query costs, it seems that now Google but also other solutions like Amazon Redshift and Snowflake try to offer more and more relational Data Warehouse features to make migrations even more easy.

By using the GROUP BY CUBE clause for example, you can aggregate data for all grouping set permutations. A grouping set is a collection of columns by which rows can be grouped together. This clause is helpful if you need to create a contingency table to find interrelationships between items in a set of data[2][3].

Here is a little example for creating a Cube within BigQuery[2]:

-- GROUP BY with CUBE
WITH
  Products AS (
    SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
    SELECT 'shirt', 't-shirt', 8 UNION ALL
    SELECT 'shirt', 'polo', 25 UNION ALL
    SELECT 'pants', 'jeans', 6
  )
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY CUBE (product_type, product_name)
ORDER BY product_type, product_name;

/*--------------+--------------+-------------+
 | product_type | product_name | product_sum |
 +--------------+--------------+-------------+
 | NULL         | NULL         | 42          |
 | NULL         | jeans        | 6           |
 | NULL         | polo         | 25          |
 | NULL         | t-shirt      | 11          |
 | pants        | NULL         | 6           |
 | pants        | jeans        | 6           |
 | shirt        | NULL         | 36          |
 | shirt        | polo         | 25          |
 | shirt        | t-shirt      | 11          |
 +--------------+--------------+-------------*/

So pretty useful functions that Google has added here for BigQuery. Please use the links below for a deeper dive and more code examples. These new updates could definitely help to ease migrations and to offer Data Analyst and Business Intelligence users functions that they may miss when working with relational and classical Data Warehouses and BI solutions before.

Sources and Further Readings

[1] Google, BigQuery release notes (2023)

[2] Google, Group rows by CUBE (2023)

[3] Wikipedia, Contingency table (2023)

Data Science
Technology
Programming
Business Intelligence
Google
Recommended from ReadMedium