How new Updates make Google BigQuery a real Superior of Traditional Data Warehouses

New Feature makes BigQuery a real hybrid Data Lakehouse

A few years ago, SaaS-based Data Warehouses, especially of the big three Google BigQuery, AWS Redshift and Snowflake came out and changed a lot for Data Engineers and Scientists. These changes include:

no more fixed schemas
no more normalization, hence no more star and snowflake architectures
no more primary and foreign keys
other datatypes like structs and JSON, etc.

These changes of course come with some challenges: For example, you have to denormalize your data, put it into JSON or array data types which was a barrier if you had either star or snowflakes schema in your legacy system, and your users were used to them. But of course it also comes with a lot of benefits such as:

flexible schemas — you could work also with unstructured data
easy scalable
easy integration with AI services
easy data integration

The New Buzzword in Data Engineering: Zero ETL

What is Zero ETL — Definition, Benefits & Challenges

medium.com

So in the end, if you want a modern Data Warehouse which comes with a lot of benefits, you also have to overcome some barriers. From my own experience, I know it can be a hassle to get users to this new way of working with data structures and types. But since the last few months, Google BigQuery, in specific, has gradually overcome these barriers by adding the functionalities of classical and relational Data Warehouses or Databases.

Here are three features that Google has lately added that were previously missed by a lot of people, me included:

Feature 1: Primary and Foreign Keys

Google made Primary and Foreign Keys for BigQuery generally available

Using Primary and Foreign Key Table Constraints in BigQuery SQL

medium.com

Primary and Foreign keys are key elements for building star and snowflake schemas. This was not possible for quite a long time in BigQuery. Here, you had to use denormalization[1]. Now you can not enforce them but transfer this star or snowflake schema into your BigQuery Data Warehouses so they are identical to your source system or legacy Data Warehouse which could ease Business Intelligence tasks.

Feature 2: Cubes

Google BigQuery now supports Cubes

How Google adds new Grouping Functions like Grouping, Cubes and Rollup

medium.com

Another feature that was just added this week, was the ability to build cubes. This is a feature which is connected to the ability to use primary and foreign keys and also to aggregate data for all grouping set permutations. A grouping set is a collection of columns by which rows can be grouped together. This clause is helpful if you need to create a contingency table to find interrelationships between items in a set of data[2][3] — a very common use case for Business Intelligence.

Feature 3: Query Queues

Google launches Query Queues for BigQuery

How Google wants to improve Concurrency of their Data Warehouse

medium.com

Concurrency is an important criteria for a Data Warehouse, since many users are able to work simultaneously. While managing a few users is easy to manage, scaling to thousands is very hard to manage. Therefore, Google has rolled out query queues[4].

Summary

While newly founded companies would rather often go for the more easy-to-build-up SaaS Data Warehouse (or Data Lakehouse), companies with a legacy Data Warehouse have to overcome some barriers and have to implement data pipelines which transfer the data into new structures and to get users into this new way of working. Here, Google has made some new features available which could ease the integration process significantly. Now the trade off between new features and old traditions is significantly lower.

Sources and Further Readings

[1] Google, BigQuery release notes (2023)

[2] Google, Group rows by CUBE (2023)

[3] Wikipedia, Contingency table (2023)

[4] Google, Use query queues (2022)

Summarize

How new Updates make Google BigQuery a real Superior of Traditional Data Warehouses

New Feature makes BigQuery a real hybrid Data Lakehouse

The New Buzzword in Data Engineering: Zero ETL

What is Zero ETL — Definition, Benefits & Challenges

Feature 1: Primary and Foreign Keys

Google made Primary and Foreign Keys for BigQuery generally available

Using Primary and Foreign Key Table Constraints in BigQuery SQL

Feature 2: Cubes

Google BigQuery now supports Cubes

How Google adds new Grouping Functions like Grouping, Cubes and Rollup

Feature 3: Query Queues

Google launches Query Queues for BigQuery

How Google wants to improve Concurrency of their Data Warehouse

Summary

Sources and Further Readings