Data Warehouse vs. Data Lakehouse
What are the Differences and when to use what?

Data Warehouses are now established in almost all companies, but recently you hear more and more about Data Lakehouses — what is what, where are the differences and how can both be combined?
So let’s look into both of the technologies and define what is what first to then think about how the two can be even combined.
The Data Warehouse
The Data Warehouse has been established in companies for a long time, it is intended as a source for all kinds of data analysis and can be extended by (Self-Service) Business Intelligence, often based on OLAP for Data Analysts but also normal employees. Often classical ETL processes with mostly structured data are performed with schema on write based on SQL technologies. Famous classical Data Warehouses often were developed on premise and technologies like MSSQL, MySQL or Oracle databases were used. Modern and frequently used cloud based systems are Google Big Query, Hadoop or Amazon Redshift.

The Data Lakehouse
The Data Lakehouses combines the Data Lake with a Data Warehouse to enable unified governance and ease of data movement [2]. My own experience has often shown that a Data Lake can be realized much faster. Once all data is available, Data Warehouses can still be built on top of it as a hybrid solution.

This makes rigid and classically planned Data Warehouses a thing of the past and also accelerates the provision of dashboards and analyses. Hence, it is a good step towards a data-driven culture. An implementation with new SaaS-services from the cloud and approaches such as ELT instead of ETL also accelerates the development.
An example is a Google Cloud based Data Lakehouse where you use cloud storage for your Data Lake and BigQuery for you Data Warehouse — here it’s important to mention that also BigQuery fulfills some characteristics of an Data Lake, so it’s somehow a hybrid solution.
Data Lakehouse as a Replacement for the Data Warehouse?
The answer: Not necessarily. There are certainly cases in companies that only process a small amount of data, often structured data, and thus continue to operate their classic Data Warehouse, but here too it may make sense to switch to a modern cloud Data Warehouse for reasons of performance, less maintenance effort and possibly also costs. However, it must also be said here that these modern cloud-based Data Warehouses such as Google BigQuery, AWS Redshift or Snowflake are often already hybrid systems and are already moving in the direction of a Data Lakehouse. Otherwise, it should be said that it is often the appropriate step for companies to combine a Data Lake for raw data and faster data development with a classical Data Warehouse in a Data Lakehouse.
Here are articles on this topic that might also be interesting:
Sources and Further Readings
[1] IBM, Data Warehouse (2020)
[2] AWS, What is a Lake House approach? (2021)





