avatarChristianlauer

Summarize

How new Updates make Google BigQuery a real Superior of Traditional Data Warehouses

New Feature makes BigQuery a real hybrid Data Lakehouse

Photo by Adam Kool on Unsplash

A few years ago, SaaS-based Data Warehouses, especially of the big three Google BigQuery, AWS Redshift and Snowflake came out and changed a lot for Data Engineers and Scientists. These changes include:

  • no more fixed schemas
  • no more normalization, hence no more star and snowflake architectures
  • no more primary and foreign keys
  • other datatypes like structs and JSON, etc.

These changes of course come with some challenges: For example, you have to denormalize your data, put it into JSON or array data types which was a barrier if you had either star or snowflakes schema in your legacy system, and your users were used to them. But of course it also comes with a lot of benefits such as:

  • flexible schemas — you could work also with unstructured data
  • easy scalable
  • easy integration with AI services
  • easy data integration

So in the end, if you want a modern Data Warehouse which comes with a lot of benefits, you also have to overcome some barriers. From my own experience, I know it can be a hassle to get users to this new way of working with data structures and types. But since the last few months, Google BigQuery, in specific, has gradually overcome these barriers by adding the functionalities of classical and relational Data Warehouses or Databases.

Here are three features that Google has lately added that were previously missed by a lot of people, me included:

Feature 1: Primary and Foreign Keys

Primary and Foreign keys are key elements for building star and snowflake schemas. This was not possible for quite a long time in BigQuery. Here, you had to use denormalization[1]. Now you can not enforce them but transfer this star or snowflake schema into your BigQuery Data Warehouses so they are identical to your source system or legacy Data Warehouse which could ease Business Intelligence tasks.

Feature 2: Cubes

Another feature that was just added this week, was the ability to build cubes. This is a feature which is connected to the ability to use primary and foreign keys and also to aggregate data for all grouping set permutations. A grouping set is a collection of columns by which rows can be grouped together. This clause is helpful if you need to create a contingency table to find interrelationships between items in a set of data[2][3] — a very common use case for Business Intelligence.

Feature 3: Query Queues

Concurrency is an important criteria for a Data Warehouse, since many users are able to work simultaneously. While managing a few users is easy to manage, scaling to thousands is very hard to manage. Therefore, Google has rolled out query queues[4].

Summary

While newly founded companies would rather often go for the more easy-to-build-up SaaS Data Warehouse (or Data Lakehouse), companies with a legacy Data Warehouse have to overcome some barriers and have to implement data pipelines which transfer the data into new structures and to get users into this new way of working. Here, Google has made some new features available which could ease the integration process significantly. Now the trade off between new features and old traditions is significantly lower.

Sources and Further Readings

[1] Google, BigQuery release notes (2023)

[2] Google, Group rows by CUBE (2023)

[3] Wikipedia, Contingency table (2023)

[4] Google, Use query queues (2022)

Data Science
Google
Bigquery
Technology
Business
Recommended from ReadMedium