Christianlauer

Summary

Google Cloud has introduced a new feature that allows direct analysis of Google BigTable data using BigQuery SQL, eliminating the need for ETL processes.

Abstract

The recent update in Google Cloud services enables users to directly query and analyze data stored in Google BigTable using BigQuery SQL, which is particularly beneficial for handling large datasets for use cases like real-time fraud detection and personalization. This "Zero-ETL" approach not only provides more up-to-date insights with increased data freshness but also reduces costs by avoiding duplicate data storage and minimizes the maintenance of ETL pipelines. The integration of BigQuery with BigTable aligns with Google's strategy to enhance BigQuery as a leading data warehouse solution, offering seamless analysis across various platforms and leveraging BigQuery ML for machine learning tasks.

Opinions

  • The author views the new feature as a significant advancement, emphasizing its advantages over traditional ETL methods.
  • The direct access to BigTable data with BigQuery SQL is seen as a cost-effective solution that provides real-time data analysis capabilities.
  • The article suggests that this update is part of Google's commitment to improving BigQuery's market position as a top data warehouse by integrating it with external cloud storage and enhancing data security.
  • The author believes that customers will benefit from this development through lower costs and better insights, which is unusual as new features often come with increased expenses.

Read Data Directly with BigQuery SQL Using the Zero-ETL Approach

No more ETL needed between BigQuery and BigTable


With BigQuery, you can analyze data relatively easily using SQL and even create machine learning models using BigQuery ML. With BigLake, you can now even analyze data across different platforms, for example on AWS and Azure. Now Google has introduced an interesting new feature for Google BigTable.

You can use BigTable for a wide range of use cases such as real-time fraud detection, recommendations, and personalization. Now you can access and analyze this data directly via BigQuery without ETL. This makes sense because companies often store huge amounts of data in BigTable, which they can now evaluate using SQL or use for machine learning in BigQuery ML [1].

Previously you needed ETL tools like Dataflow, Talend, or even self-developed Python tools to copy data from BigTable into BigQuery. Now you can query the data directly with BigQuery SQL.
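As a rough illustration of what "querying directly" can look like, the following minimal sketch builds the two SQL statements involved: a `CREATE EXTERNAL TABLE` statement that points BigQuery at a BigTable table, and a `SELECT` against it. The project, instance, table, dataset, column family, and qualifier names are hypothetical placeholders, and the helper functions are my own, not part of any Google library. The sketch only produces SQL text; you would still have to run it in BigQuery yourself. The field path used in the query (`family.qualifier.cell.value`) assumes the column family is configured to read only the latest cell, so check it against the schema BigQuery actually shows for your table.

```python
import json


def bigtable_external_table_ddl(project, instance, table, bq_table, family, qualifiers):
    """Build a CREATE EXTERNAL TABLE statement over a BigTable table.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    # URI pattern BigQuery expects for a BigTable source.
    uri = (
        f"https://googleapis.com/bigtable/projects/{project}"
        f"/instances/{instance}/tables/{table}"
    )
    # Describe which column family and qualifiers to expose, and how.
    options = {
        "columnFamilies": [
            {
                "familyId": family,
                "onlyReadLatest": True,  # expose only the newest cell per column
                "columns": [
                    {"qualifierString": q, "type": "STRING"} for q in qualifiers
                ],
            }
        ],
        "readRowkeyAsString": True,  # row key as STRING instead of BYTES
    }
    return (
        f"CREATE EXTERNAL TABLE `{bq_table}`\n"
        "OPTIONS (\n"
        "  format = 'CLOUD_BIGTABLE',\n"
        f"  uris = ['{uri}'],\n"
        f"  bigtable_options = '''{json.dumps(options)}'''\n"
        ");"
    )


def latest_value_query(bq_table, family, qualifier, limit=10):
    """Build a SELECT that reads the latest value of one BigTable column.

    Assumes the external table was defined with onlyReadLatest = true.
    """
    return (
        f"SELECT rowkey, {family}.{qualifier}.cell.value AS {qualifier}\n"
        f"FROM `{bq_table}`\n"
        f"LIMIT {limit};"
    )


if __name__ == "__main__":
    print(bigtable_external_table_ddl(
        "my-project", "my-instance", "transactions",
        "my_dataset.transactions", "stats", ["amount"],
    ))
    print(latest_value_query("my_dataset.transactions", "stats", "amount"))
```

Because the external table is only a definition, no data is copied: each query reads from BigTable at execution time, which is where the freshness and storage benefits described in this article come from.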

Architecture with BigTable and BigQuery within the Google Cloud — Source: Google [2]

Using the new approach, you can overcome some shortcomings of the traditional ETL approach:

  • Fresher data (up-to-date insights for your business instead of data that is hours or even days old).
  • No paying twice to store the same data (customers normally store terabytes or even more in BigTable).
  • Less monitoring and maintenance of the ETL pipeline.

This approach fits Google's agenda of making BigQuery one of the best data warehouses on the market. External cloud storage can already be integrated easily with BigQuery, and the same is now true for Google BigTable. Best of all, you as a customer get better insights at lower cost. New features usually come with higher costs, so here you benefit twice.


Sources and Further Reading

[1] Google, Zero-ETL approach to analytics on Bigtable data using BigQuery (2022)

[2] Google, BigTable (2022)
