Cross Cloud Analytics as the End of the Data Warehouse?
Will new Query Engines and Data Lakehouse Approaches be the Future?
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*8TPWGYSGUogN05Wrzxo2ug.jpeg)
With the Zero ETL approach and new services like Google BigLake or Snowflake's Open Lakehouse motto, a new era in Data Warehousing seems to be dawning. Now, a new buzzword called Cross Cloud Analytics is gaining momentum.
Zero ETL was a hot topic in Data Engineering and Data Science last year, will probably remain one this year, and is likely to be supported by new cross-cloud services and tools.
This year, Google is pushing a new buzzword: Cross Cloud Analytics, i.e. using one query engine to query data directly where it resides, even with other cloud providers and storage services [1]. Back in 2022, Google had already introduced BigLake, which lets you query, for example, Amazon S3 and Azure Blob Storage with BigQuery SQL.
Zero ETL comes in two distinct flavors:
- Querying the data directly in other data sources through a SQL Query Engine.
- Using a data integration tool built into your Data Warehouse or Lakehouse to integrate the data without writing any pipeline code.
Whichever suits your use cases, computing power, and budget better, both ease data integration and transformation, especially for Data Engineers. Cross Cloud Analytics will give the first approach in particular a boost: you leave your data in different cloud instances, regions, and even providers, and use one query engine to query it. Until now, this was rather complicated, since you had to implement business logic at query time whenever you queried data directly. In a classical Data Warehouse, you would first integrate, clean, and transform the data and also take care of security rules.
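To make the first approach more concrete, here is a minimal Python sketch of the idea behind a federated query engine. It is a toy, not BigQuery or BigLake: the two "stores" are hypothetical in-memory stand-ins for an S3 bucket and an Azure Blob container, and `federated_sum` is an illustrative helper, not a real API. The point is only that the engine reads each source where it lives and combines the results, without first copying everything into a central warehouse.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical stand-ins for data living in two different clouds.
# In a real setup these would be files in S3 or Azure Blob Storage.
S3_ORDERS: List[Row] = [
    {"region": "eu", "amount": 120.0},
    {"region": "us", "amount": 80.0},
]
AZURE_ORDERS: List[Row] = [
    {"region": "eu", "amount": 30.0},
    {"region": "apac", "amount": 55.0},
]

# Toy "catalog": maps table names to functions that read each source in place.
CATALOG: Dict[str, Callable[[], Iterable[Row]]] = {
    "s3.orders": lambda: iter(S3_ORDERS),
    "azure.orders": lambda: iter(AZURE_ORDERS),
}


def federated_sum(tables: List[str], group_by: str, value: str) -> Dict[str, float]:
    """Roughly: SELECT group_by, SUM(value) FROM (t1 UNION ALL t2 ...) GROUP BY group_by."""
    totals: Dict[str, float] = defaultdict(float)
    for name in tables:
        # Rows are streamed from each source; nothing is copied into a central store.
        for row in CATALOG[name]():
            totals[str(row[group_by])] += float(row[value])
    return dict(totals)


result = federated_sum(["s3.orders", "azure.orders"], group_by="region", value="amount")
print(result)  # {'eu': 150.0, 'us': 80.0, 'apac': 55.0}
```

In a product like BigQuery, the same idea is expressed as a single SQL query over tables whose data stays in the other cloud; the engine, not a hand-written pipeline, does the combining. Real engines additionally push filters down to the sources and enforce access rules, which this sketch leaves out.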
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*a3AZC8ZGwE22B9qr.jpg)
But, as so often, data analytics software has improved over the last few months. To stay with the example of Google, which is also the most prominent proponent of this approach: Google has added many new features, such as materialized views, in which you can implement business logic, and Dataplex, which lets you keep control and governance over your data sources even when they are distributed across many different cloud providers, storage types, and locations.
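The benefit of a materialized view is that the business logic (cleaning, filtering, aggregating) is defined once and its result is stored, so analysts read the precomputed result instead of re-implementing the logic in every query. The Python sketch below only illustrates that concept; `MaterializedView` and the `clean_orders` rule are hypothetical and are not how BigQuery implements materialized views (a real engine also manages refreshes and storage for you).

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class MaterializedView:
    """Toy materialized view: stores the result of business logic over a source table."""

    def __init__(self, source: List[Row], logic: Callable[[List[Row]], List[Row]]):
        self.source = source
        self.logic = logic
        self._result: Optional[List[Row]] = None

    def refresh(self) -> None:
        # Re-run the business logic over the current source data and store the result.
        self._result = self.logic(self.source)

    def read(self) -> List[Row]:
        # Readers get the stored result; the logic is not re-run on every read.
        if self._result is None:
            self.refresh()
        return self._result


def clean_orders(rows: List[Row]) -> List[Row]:
    """Hypothetical business rule: drop rows without an amount, normalize region codes."""
    return [
        {"region": str(r["region"]).strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r.get("amount") is not None
    ]


raw_orders: List[Row] = [
    {"region": " EU ", "amount": 120},
    {"region": "us", "amount": None},
    {"region": "US", "amount": 80},
]

view = MaterializedView(raw_orders, clean_orders)
print(view.read())  # [{'region': 'eu', 'amount': 120.0}, {'region': 'us', 'amount': 80.0}]

raw_orders.append({"region": "apac", "amount": 55})
print(len(view.read()))  # 2: the stored result is stale until refresh() is called
view.refresh()
print(len(view.read()))  # 3
```

The design choice shown here is the trade-off materialized views make in general: reads are cheap because the result is precomputed, at the cost of having to refresh it when the underlying data changes.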
To me, this so-called Cross Cloud Analytics approach makes total sense. Bigger companies in particular often have their data stored across many cloud providers and regions. If, for example, you use Amazon S3 as storage for many apps but Google BigQuery for Data Analytics, you no longer have to copy that data into a second location first. This can lower storage costs and reduce the Data Mess that comes from keeping copies of the same data in multiple locations.
Sources and Further Readings
[1] Google, BigQuery Omni: solving cross-cloud challenges by bringing analytics to your data (2023)