Christianlauer


Google launched CDC for BigQuery

How you can now apply streamed changes to existing data in real time using the BigQuery Storage Write API

Photo by Joao Branco on Unsplash

Big update for Google’s data warehouse service BigQuery: CDC is now generally available. This is great news for data engineers who have to integrate streaming data[1].

Change Data Capture (CDC) is a process that records the changes made to a database (inserts, updates, and deletes) so that they can be tracked and replicated to other systems in real time. CDC is commonly used in data integration, data warehousing, and data analytics to keep data in sync between different systems. One of its main benefits is that it allows organizations to make data-driven decisions based on the most up-to-date information[2].
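To make the idea concrete, here is a minimal, purely local sketch of how a stream of change events keeps a replica in sync. It does not touch BigQuery at all: the `apply_changes` function, the event layout (`op`, `id`, `row`), and the sample data are assumptions made up for this illustration.

```python
from typing import Any


def apply_changes(
    replica: dict[int, dict[str, Any]],
    changes: list[dict[str, Any]],
) -> dict[int, dict[str, Any]]:
    """Apply an ordered list of change events to a replica keyed by primary key."""
    for change in changes:
        key = change["id"]
        if change["op"] == "delete":
            # Remove the row if it exists; ignore deletes for unknown keys.
            replica.pop(key, None)
        else:
            # Insert or update: the latest version of the row wins.
            replica[key] = change["row"]
    return replica


# Hypothetical starting state of the replica.
replica = {1: {"id": 1, "name": "Ada"}}

# Hypothetical change events captured from the source database, in order.
changes = [
    {"op": "upsert", "id": 2, "row": {"id": 2, "name": "Grace"}},
    {"op": "upsert", "id": 1, "row": {"id": 1, "name": "Ada L."}},
    {"op": "delete", "id": 2},
]

synced = apply_changes(replica, changes)
print(synced)
```

After all three events are applied, only row 1 remains, in its updated form. The primary key is what lets each change find the row it belongs to.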

The feature was previously in preview, but Google has now announced[1]:

BigQuery support for change data capture (CDC) by processing and applying streamed changes in real-time to existing data using the BigQuery Storage Write API is now generally available (GA).

With CDC, companies can have access to real-time data that helps them make well-informed decisions, respond to changes quickly, and improve their overall business performance.

CDC within BigQuery — Image Source: Google[3]

The BigQuery Storage Write API is a unified data ingest API for BigQuery that combines streaming ingest and batch loading into a single high-performance API.

You can use the Storage Write API to stream records into BigQuery in real time, or to process an arbitrarily large number of records and commit them in a single atomic operation. To use BigQuery CDC, your data workflow and data schema must meet the following conditions[4]:

  • You have to use the Storage Write API in the default stream.
  • You have to declare primary keys for the destination table in BigQuery. Composite primary keys that feature up to 16 columns are supported.
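The two conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `build_cdc_table_ddl`, the table name `mydataset.orders`, and the columns are hypothetical, and nothing is sent to BigQuery. As far as I understand the documentation[4], the primary key is declared as `NOT ENFORCED`, and each streamed row carries a `_CHANGE_TYPE` pseudocolumn set to `UPSERT` or `DELETE`; check the linked guide before relying on either detail.

```python
MAX_PRIMARY_KEY_COLUMNS = 16  # limit for composite primary keys stated above


def build_cdc_table_ddl(
    table: str,
    columns: dict[str, str],
    primary_key: list[str],
) -> str:
    """Build a CREATE TABLE statement with a primary key (hypothetical helper)."""
    if not 1 <= len(primary_key) <= MAX_PRIMARY_KEY_COLUMNS:
        raise ValueError(
            f"primary key needs 1 to {MAX_PRIMARY_KEY_COLUMNS} columns, "
            f"got {len(primary_key)}"
        )
    missing = [name for name in primary_key if name not in columns]
    if missing:
        raise ValueError(f"primary key columns not in table: {missing}")

    column_defs = ",\n  ".join(
        f"{name} {sql_type}" for name, sql_type in columns.items()
    )
    key_list = ", ".join(primary_key)
    return (
        f"CREATE TABLE {table} (\n"
        f"  {column_defs},\n"
        f"  PRIMARY KEY ({key_list}) NOT ENFORCED\n"
        f")"
    )


# Hypothetical destination table with a composite primary key.
ddl = build_cdc_table_ddl(
    "mydataset.orders",
    {"customer_id": "INT64", "order_id": "INT64", "status": "STRING"},
    ["customer_id", "order_id"],
)
print(ddl)

# Hypothetical change rows as they might be written to the default stream.
# _CHANGE_TYPE marks whether the row is upserted or deleted by primary key.
rows = [
    {"customer_id": 1, "order_id": 10, "status": "shipped", "_CHANGE_TYPE": "UPSERT"},
    {"customer_id": 1, "order_id": 11, "status": None, "_CHANGE_TYPE": "DELETE"},
]
```

Running the generated statement and writing the rows through the Storage Write API is left out on purpose, since that part depends on your client library and schema setup.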

So this is a very useful new feature from Google. Sure, it was already available in preview, but now you can use it in a production environment and be confident that the feature will stay in its current form. It arrives alongside many other interesting BigQuery updates from the last few weeks. See the linked release notes and documentation below for a deeper dive.

Sources and Further Readings

[1] Google, BigQuery release notes (2023)

[2] Wikipedia, Change data capture (2023)

[3] Google, Database replication using change data capture (2023)

[4] Google, Stream table updates with change data capture (2023)
