Build a Data Lakehouse on Google Cloud
How to use Cloud Storage and BigQuery to build a modern data platform

The Data Lakehouse is a modern approach to designing data platforms that combines the Data Lake and the Data Warehouse. In this article, you will learn how to build one on Google Cloud.
With Google’s cloud services, you have the right tools at your disposal. Before we look at which services you can use, here is a short recap of the Data Lakehouse concept.
Recap: The Data Lakehouse Concept
A Data Lakehouse is not just about integrating a Data Lake with a Data Warehouse. It integrates a Data Lake, a Data Warehouse, and purpose-built storage to enable unified governance and easy data movement [1]. In my experience, a Data Lake can usually be set up much faster than a Data Warehouse. Once all the data is available, Data Warehouses can still be built on top of it as a hybrid solution.

How to build a Data Lakehouse on GCP
Let’s take a look at which Google Cloud services you can use to build such a Data Lakehouse. Cloud Storage and BigQuery serve as the storage layers. Because the services in Google Cloud are well integrated, they can easily exchange data with each other, which makes the data available for analysis, machine learning, and other use cases.
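To make the link between the two storage layers a bit more concrete, here is a minimal sketch that builds a BigQuery `CREATE EXTERNAL TABLE` statement over files in Cloud Storage. The project, dataset, table, and bucket names are made up for illustration, and the helper `external_table_ddl` is my own, not part of any Google library. You would run the resulting statement in the BigQuery console or with a client of your choice.

```python
from __future__ import annotations


def external_table_ddl(table: str, uris: list[str], file_format: str = "PARQUET") -> str:
    """Build a CREATE EXTERNAL TABLE statement over files in Cloud Storage.

    `table` is a fully qualified name such as "project.dataset.table".
    All names used here are hypothetical placeholders.
    """
    for uri in uris:
        # Cloud Storage objects are addressed with gs:// URIs.
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")

    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{file_format}',\n"
        f"  uris = [{uri_list}]\n"
        f");"
    )


if __name__ == "__main__":
    # Hypothetical lake bucket with Parquet files.
    print(
        external_table_ddl(
            "my_project.lake.sales_raw",
            ["gs://my-lake-bucket/sales/*.parquet"],
        )
    )
```

With such a table in place, the files stay in the bucket while BigQuery SQL can read them directly, which is the "lake" half of the lakehouse showing up next to native tables.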

Unstructured and semi-structured files are a good fit for Cloud Storage. Tables can be stored directly in BigQuery, although it must be said that BigQuery is by now something of a hybrid between a SQL and a NoSQL database: it also lets you store semi-structured data, for example in columns of the JSON data type.
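As a small illustration of the JSON data type, the following sketch defines a table with a JSON column, a parameterized insert, and a query that reads fields out of the JSON. The table name `my_project.lake.events` and the payload layout are assumptions for this example only. The Python part just prepares the statements and parameters. Passing them to BigQuery is left to whichever client you use.

```python
import json

# Hypothetical table with one regular column and one JSON column.
CREATE_EVENTS_TABLE = """
CREATE TABLE IF NOT EXISTS `my_project.lake.events` (
  event_id INT64,
  payload  JSON
);
""".strip()

# Parameterized insert: the payload arrives as a string and is
# converted into a JSON value with PARSE_JSON.
INSERT_EVENT = """
INSERT INTO `my_project.lake.events` (event_id, payload)
VALUES (@event_id, PARSE_JSON(@payload));
""".strip()

# JSON_VALUE extracts a scalar from the JSON column as a string.
QUERY_PURCHASES = """
SELECT
  event_id,
  JSON_VALUE(payload, '$.user.country') AS country
FROM `my_project.lake.events`
WHERE JSON_VALUE(payload, '$.type') = 'purchase';
""".strip()


def insert_params(event_id: int, payload: dict) -> dict:
    """Prepare the named query parameters for INSERT_EVENT."""
    return {"event_id": event_id, "payload": json.dumps(payload)}


if __name__ == "__main__":
    params = insert_params(1, {"type": "purchase", "user": {"country": "DE"}})
    print(CREATE_EVENTS_TABLE)
    print(INSERT_EVENT)
    print(params)
    print(QUERY_PURCHASES)
```

Using query parameters instead of pasting the JSON text into the statement avoids quoting problems when the payload itself contains quotes or backslashes.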
Google takes it to the extreme with BigLake
Google even goes one step further and offers cross-platform data analysis with BigLake. Building on BigQuery Omni, you can query data that lives in Cloud Storage as well as in the object stores of other clouds, such as Amazon S3, using BigQuery SQL. The advantages are that you don’t have to move the data or pay for duplicate storage, and that even as an AWS or Azure user you can use Google’s data analytics tools.
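A rough sketch of what this looks like in practice: a table over files in Amazon S3 is declared with `CREATE EXTERNAL TABLE ... WITH CONNECTION`, where the connection holds the credentials for the other cloud. The connection ID `aws-us-east-1.my_s3_connection`, the dataset, and the bucket below are placeholders I made up, and I assume the connection and a dataset in the matching AWS region already exist. The helper function is my own and only builds the statement.

```python
from __future__ import annotations


def s3_table_ddl(
    table: str,
    connection: str,
    uris: list[str],
    file_format: str = "PARQUET",
) -> str:
    """Build a CREATE EXTERNAL TABLE statement for data stored in Amazon S3.

    `connection` is the ID of an existing BigQuery connection to AWS,
    written as "location.connection_name". All names are hypothetical.
    """
    for uri in uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (\n"
        f"  format = '{file_format}',\n"
        f"  uris = [{uri_list}]\n"
        f");"
    )


if __name__ == "__main__":
    print(
        s3_table_ddl(
            "my_project.aws_lake.orders",
            "aws-us-east-1.my_s3_connection",
            ["s3://my-aws-bucket/orders/*"],
        )
    )
```

Once the table exists, it can be queried with regular BigQuery SQL while the files stay in S3, which is where the savings on data transfer and duplicate storage come from.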
Summary
Google offers all the tools necessary to build a modern data platform, and with BigLake it even offers a bit more than others. The architecture itself is of course also possible with other providers such as AWS or Microsoft Azure. Microsoft, for example, offers a comparable analytics platform with Azure Synapse Analytics.
Sources and Further Readings
[1] AWS, What is a Lake House approach? (2021)
[2] Google Cloud, Open data lakehouse on Google Cloud (2021)
