3 Pieces of Great News for Google Cloud Data Engineers
How Google has recently improved data integration

Google has given Data Engineers plenty to be happy about in recent weeks, with a run of interesting updates.
Three of them stand out, and the first is particularly interesting.
Update 1: Datastream for BigQuery
Datastream uses BigQuery’s Change Data Capture (CDC) functionality and the Storage Write API to efficiently replicate updates directly from source systems in near real time [1].
This brings several advantages: near real-time insights in BigQuery and serverless ELT/ETL data pipelines that scale automatically, with no resources to provision or manage. Datastream also copes with changes to source schemas: it handles schema drift and automatically replicates new columns and tables added in the source to BigQuery. For now, it is available for the following sources [1]:
- MySQL
- PostgreSQL
- AlloyDB
- Oracle databases
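To make the idea of CDC replication with schema drift more concrete, here is a minimal sketch in plain Python. It is not Datastream's implementation and does not use any Google API: the `ChangeEvent` and `ReplicaTable` names and the event shape are assumptions made up for illustration. It only shows the principle of applying upserts and deletes by primary key and adding new columns as they appear in the source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """Hypothetical change event; NOT Datastream's real event format."""
    op: str                               # "UPSERT" or "DELETE"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # full row values for an UPSERT


@dataclass
class ReplicaTable:
    """Toy in-memory stand-in for the replicated target table."""
    columns: List[str] = field(default_factory=list)
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "DELETE":
            # Deleting a key that is not present is treated as a no-op.
            self.rows.pop(event.key, None)
            return
        if event.op != "UPSERT":
            raise ValueError(f"unknown operation: {event.op}")

        new_row = event.row or {}
        # Schema drift: add columns the replica has not seen yet and
        # backfill the rows that already exist with None.
        for col in new_row:
            if col not in self.columns:
                self.columns.append(col)
                for existing in self.rows.values():
                    existing[col] = None

        # An upsert replaces the whole row; columns missing from the
        # event are stored as None.
        self.rows[event.key] = {c: new_row.get(c) for c in self.columns}


events = [
    ChangeEvent("UPSERT", 1, {"name": "Ada"}),
    ChangeEvent("UPSERT", 2, {"name": "Grace"}),
    ChangeEvent("UPSERT", 1, {"name": "Ada L.", "country": "UK"}),  # new column
    ChangeEvent("DELETE", 2),
]

table = ReplicaTable()
for event in events:
    table.apply(event)

print(table.columns)  # ['name', 'country']
print(table.rows)     # {1: {'name': 'Ada L.', 'country': 'UK'}}
```

The point of the sketch is that the target only ever sees a stream of small changes instead of full reloads, which is what makes near real-time replication feasible.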
Update 2: Easier Data Movement from S3-Compatible Storage to Cloud Storage
This is another handy feature for when you need to move data from AWS to Google Cloud. It targets storage systems that speak the S3 API: with this new feature, customers can seamlessly copy data from self-managed object storage to Google Cloud Storage [2]. The new BigLake feature may also be of interest here, as it lets you analyze data with SQL independently of the platform on which it is stored.
Update 3: Better Dataflow Benchmarks and Pub/Sub Monitoring
Google announced expanded support in PerfKit Benchmarker for benchmarking Dataflow pipelines. You can now test your Dataflow pipelines for performance optimization, capacity planning, regression testing and TCO estimation. In addition, monitoring dashboards are now part of the Pub/Sub UI in the Google Cloud Console, so Pub/Sub users can easily monitor the health of their real-time streaming applications by reading charts of insightful metrics [2].
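As a rough illustration of what regression testing on benchmark results can look like, here is a small Python sketch. It does not call PerfKit Benchmarker or Dataflow: the `is_regression` helper, the 10% tolerance and the throughput numbers are all made-up assumptions. It simply compares the mean throughput of a candidate run against a baseline.

```python
from statistics import mean
from typing import Sequence


def is_regression(
    baseline: Sequence[float],
    candidate: Sequence[float],
    tolerance: float = 0.10,
) -> bool:
    """Return True if the candidate's mean throughput falls more than
    `tolerance` (as a fraction) below the baseline's mean throughput."""
    if not baseline or not candidate:
        raise ValueError("both runs need at least one measurement")
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in the range [0, 1)")
    return mean(candidate) < mean(baseline) * (1 - tolerance)


# Made-up throughput samples (e.g. elements per second), not real results.
baseline_runs = [1000.0, 1040.0, 980.0]
slow_runs = [870.0, 890.0, 860.0]
healthy_runs = [990.0, 1010.0, 1000.0]

print(is_regression(baseline_runs, slow_runs))     # True
print(is_regression(baseline_runs, healthy_runs))  # False
```

In practice you would feed in the metrics your benchmark tool reports and pick a tolerance that matches the normal run-to-run noise of your pipeline.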
Summary
So there is quite a lot of news for Data Engineers working in Google Cloud, and it should make your work easier. I was especially happy about being able to implement ELT and CDC into BigQuery more easily with Datastream. This has always been a challenge and required other tools. Now you can build on simple services integrated with BigQuery, which can save you time and money.
Sources and Further Reading
[1] Google, Datastream for BigQuery (2022)
[2] Google, What’s new with Google Cloud (2022)
