Christianlauer

Summary

In 2023, Google BigQuery evolved with significant updates, including relational database features, cross-cloud analytics capabilities, and enhanced AI and data science tools, setting the stage for continued innovation in 2024.

Abstract

The year 2023 marked a transformative period for Google BigQuery, with the platform adopting more traditional relational database features such as primary and foreign keys, and the ability to build cubes. This shift allows for easier migration of companies' existing data warehouses into BigQuery. Cross-cloud analytics was another major update, with Google BigLake facilitating data analysis across different cloud platforms, including AWS and Azure. The introduction of materialized views and the integration of cloud storage services into the BigQuery Data Transfer Service have streamlined data transformation and integration processes. Furthermore, 2023 saw a significant enhancement in AI and data science capabilities within BigQuery, with the launch of Duet AI, a collaborative tool designed to assist with SQL queries, and the addition of new functions like Text Analyze and advanced mathematical operations. These updates are expected to continue shaping BigQuery's development into 2024, with a focus on cross-cloud analytics and AI integration, as well as the introduction of tools like Data Clean Rooms and BigQuery Studio to facilitate secure data sharing and streamline data analysis tasks.

Opinions

  • The author suggests that the addition of relational database features to BigQuery is a positive step for companies looking to migrate their traditional data warehouses.
  • There is an opinion that the cross-cloud analytics capabilities introduced in 2023, particularly Google BigLake, represent a significant advancement in how data can be analyzed across different cloud environments.
  • The author conveys enthusiasm about the integration of AI into BigQuery, highlighting Duet AI as a tool that will greatly benefit data scientists, analysts, and business users by simplifying SQL query generation and interpretation.
  • The expectation is that Google will continue to focus on cross-cloud analytics and AI integration in 2024, indicating a belief in the strategic importance of these areas for BigQuery's growth.
  • The author expresses that Data Clean Rooms and BigQuery Studio are valuable additions that enhance the platform's ability to handle secure data sharing and provide a more seamless experience for data analysis and science tasks.

How Google BigQuery becomes an even more powerful Data Lakehouse

Recap 2023: What were the major Updates and what can we expect in 2024?

Photo by Tobias Keller on Unsplash

2023 brought many updates to Google BigQuery, along with new approaches such as cross-cloud analytics and Zero ETL. At the end, I also take a look at what 2024 might bring.

Let’s look at the most important updates and approaches of 2023 and how Data Engineers and Data Scientists can benefit from them.

Feature 1: BigQuery became more relational

Sounds like a step backwards? Maybe. Cloud data warehouses like BigQuery, Snowflake and Redshift started out with a column-based, hybrid design (somewhere between SQL and NoSQL). Instead of building snowflake and star schemas as in a relational on-premises data warehouse, you would use nested data types to store data. In 2023, however, Google added primary and foreign keys, together with the ability to build cubes.

This makes BigQuery a more attractive target for companies that built their data warehouse in a traditional star or snowflake schema, because their migration becomes easier.
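One detail worth knowing: BigQuery’s primary and foreign keys must be declared `NOT ENFORCED`. They mainly serve as hints for query optimization, and BigQuery does not reject duplicate keys or orphaned rows. The Python sketch below uses small, hypothetical in-memory tables (not the BigQuery API) to show the kind of integrity check you may still want to run after migrating a star schema. It also shows what a cube computes, assuming "cubes" here refers to SQL’s `GROUP BY CUBE`: one aggregate for every combination of the grouping columns, with rolled-up columns reported as NULL.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical star schema: one dimension table and one fact table that references it.
dim_product = [
    {"product_id": 1, "category": "books"},
    {"product_id": 2, "category": "games"},
]
fact_sales = [
    {"sale_id": 10, "product_id": 1, "region": "EU", "amount": 20.0},
    {"sale_id": 11, "product_id": 2, "region": "EU", "amount": 35.0},
    {"sale_id": 12, "product_id": 1, "region": "US", "amount": 15.0},
]


def primary_key_violations(rows, key):
    """Return key values that occur more than once (an unenforced key allows this)."""
    seen, dupes = set(), set()
    for row in rows:
        if row[key] in seen:
            dupes.add(row[key])
        seen.add(row[key])
    return dupes


def foreign_key_violations(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, like GROUP BY CUBE.

    Rolled-up dimensions appear as None (SQL NULL). This sketch assumes the data
    itself contains no None values; SQL uses GROUPING() to tell the two apart.
    """
    totals = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for grouped in combinations(dimensions, size):
            for row in rows:
                key = tuple(row[d] if d in grouped else None for d in dimensions)
                totals[key] += row[measure]
    return dict(totals)


# Join the dimension attribute onto the facts, then build the cube.
category_by_id = {p["product_id"]: p["category"] for p in dim_product}
enriched = [dict(s, category=category_by_id[s["product_id"]]) for s in fact_sales]
result = cube(enriched, ["category", "region"], "amount")
```

For two dimensions, the cube yields four grouping levels: the grand total `(None, None)`, per category, per region, and per category and region.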

Feature 2: Introduction of Cross Cloud Analytics

Back in 2022, Google introduced Google BigLake, which makes it easier to analyze data stored in Amazon S3 and Azure Blob Storage. In 2023, Google continued this approach by adding more and more supporting features.

Google BigLake — Image Source: THENEXTPLATFORM[1]

One such addition is support for materialized views over these data sources, which enables data transformation and cleansing. Previously you could query the data directly, but data engineering and later analysis processes often need the data to be structured first, and materialized views offer one way to do that. Besides this direct approach, Google also added popular sources, namely Azure and AWS cloud storage, to the BigQuery Data Transfer Service. This is an easy way to move data into BigQuery and integrate it there, rather than only querying it ad hoc as with BigLake.

Feature 3: More AI and Data Science in BigQuery

2023 was the year of AI, with topics like OpenAI, chatbots and the AI war between Microsoft and Google. No wonder Google added many AI features to its data analytics flagship, BigQuery.

Google launched bigger features such as Duet AI for BigQuery, an AI-powered collaborator in Google Cloud. It is meant to help Data Scientists, Analysts and especially business users complete, generate and explain SQL queries[2]. There were also smaller updates, such as new functions like TEXT_ANALYZE.

Also popular in data science are new mathematical functions such as Euclidean distance and cosine distance, which let Data Scientists express more of their notebook and model logic in plain BigQuery SQL.
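To make the two distance measures concrete, here is a minimal Python sketch of the standard definitions. To our understanding, BigQuery exposes them as the SQL functions `EUCLIDEAN_DISTANCE` and `COSINE_DISTANCE` over `ARRAY<FLOAT64>` inputs; the Python function names below are our own, and the zero-vector handling is a choice of this sketch, not a statement about BigQuery’s behavior.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # This sketch rejects zero vectors because the angle is undefined.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # prints 5.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # prints 1.0
```

Euclidean distance is sensitive to vector length, while cosine distance only compares direction. That is why cosine distance is the usual choice for comparing text embeddings.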

What is 2024 bringing?

Google has no official public roadmap for its data warehouse BigQuery and the related services, so any forecast is a bit like looking into a crystal ball. Still, I strongly believe that the 2023 highlights presented here will remain the main focus in 2024. BigLake and cross-cloud analytics in particular will probably be an approach Google keeps following to make its data analysis products attractive to companies that have many source systems in other clouds or use hybrid clouds.

The other area Google will probably strengthen is AI. This could mean integrating AI chatbots and models such as Gemini into Google services like BigQuery to ease data integration and analysis, or adding new AI functions within BigQuery itself.

Other very useful new features and tools for me were definitely Data Clean Rooms, which can ease data sharing with internal and external customers, and BigQuery Studio. With BigQuery Studio you can start in a programming notebook to validate and prepare data, then open that notebook in other services, including Vertex AI, Google’s managed machine learning platform, to continue working with more specialized AI infrastructure and tooling[3].

Sources and Further Readings

[1] The Next Platform, Google BigLake Stretches BigQuery Across All Data (2024)

[2] Google, Write queries with Duet AI assistance (2023)

[3] Google, BigQuery Studio (2023)
