Data Lakehouse vs. Data Mesh
How to use the new Approaches to become more Data Driven

Data Warehouse, Data Lake and now Data Lakehouse and Data Mesh, what is what and where are the differences, especially the question of how do they relate to each other? Here, you will find out how they differ and how they can actually be build on each other.
The Data Lakehouse
The Data Lakehouse stores raw data in a Data Lake or Data Lakes divided into certain business contexts, while loading transfered and aggregated parts of it into the Data Warehouse for purposes like Self-Service BI, Data Marts or Machine Learning Services. The Data Lakehouse should combine the advantages of Data Lakes and Data Warehouses into a hybrid concept.

The two systems are not operated side by side, but as a novel single system. Benefits of a Data Lakehouse are [1]:
- Decoupling data storage from data processing to achieve better scalability.
- Open standardized storage formats and interfaces.
- Support for different data types, from unstructured to structured data.
- Support for various workloads, such as Data Science, Machine Learning, SQL, and Analytics.
- End-to-end streaming: streaming support eliminates the need for separate systems to serve real-time data applications.
- Shorter time-to-value compared to a Data Warehouse.
The Data Mesh Approach
It is important to understand that the Data Mesh concept primarily establishes a new organizational perspective and is less based on technical problem solving. Therefore, you should consider this four principles when building up a Data Mesh organization [2]:
- Domain-oriented decentralized data ownership and architecture: A Data Mesh should serve the individuals business units. Therefore, one or different Data Lakehouses could be build.
- Data as a product: The Data Lakehouse architecture helps to manage data as a product by providing different data team members in domain-specific teams complete control over the data lifecycle.
- Self-serve data infrastructure as a platform: Users can supply themselves with data in a self-service BI tool, while Data Scientists, for example, access the same data and develop models.
- Federated computational governance: The data should be backed up and distributed with a role concept. Data catalogs are also helpful here, for example.

Bring it all Together
So as you can see, it’s less about Data Lakehouse vs. Data Mesh then combine a Data Lake and a Data Warehouse as a Data Lakehouse and using the organizational approach of a Data Mesh to gover, manage and distribute the data in the company. In the architectural diagram above, you see the technical components like data, code and infrastructure which can be seen as the Data Lakehouse while the upper part the catalog, security, logging and so on are used to bring the right data to the right person — the Data Mesh.
Summary
I hope this has helped you to get at least a first understanding of what Data Lakehouse and Data Mesh are and how they can be combined. If you want to learn more click here:
Sources and Further Readings
[1] Stefan Koch, Können Lakehouses einen Paradigmenwechsel anstossen? (2021)
[2] Michael Armbrust, Ali Ghodsi, Bharath Gowda, Arsalan Tavakoli-Shiraji, Reynold Xin and Matei Zaharia, Frequently Asked Questions About the Data Lakehouse (2021)
[3] Google, Build a data mesh on Google Cloud with Dataplex, now generally available (2022)






