Does the modern Data Lakehouse kill the Data Warehouse?
How the SaaS Data Lakehouse will end the Era of classical Data Warehouses

In this article, I would like to talk about the SaaS Data Lakehouse, its advantages, and why it may be displacing the classic Data Warehouse.
First of all, let’s define what is meant by a classic Data Warehouse. It is a centralized repository in which large amounts of data from various sources are processed via ETL and stored in a structured form. It is typically used for (self-service) Business Intelligence and reporting, and it is optimized for reading data rather than writing it. It is also often built on on-premises infrastructure.
The SaaS Data Lakehouse, on the other hand, combines the best features of a Data Lake and a Data Warehouse. It offers a flexible and scalable architecture that allows organizations to store both structured and unstructured data and to load it via ELT processes. It combines SQL and NoSQL features and also offers services for data governance, security and compliance.
One of the main advantages of a SaaS-based Data Lakehouse over a traditional Data Warehouse is its flexibility. The Data Lakehouse allows companies to store all types of data, regardless of format, structure or source. This is in contrast to a classic Data Warehouse, which is designed to store structured data only [1]. With a Data Lakehouse, companies can store data in its raw form (the Data Lake component), so that use cases such as Machine Learning, Business Intelligence or Data Science can all work on the same data without it having to be moved and transformed for every new analysis. For the Data Warehouse component, the data can then be processed and transformed in a later step, in line with the ELT approach.
Another benefit of a SaaS Data Lakehouse is its scalability. As the amount of data generated by companies continues to grow, a cloud-based Data Lakehouse can easily be scaled to meet their needs. This is in contrast to a traditional Data Warehouse, which can become expensive to maintain and scale as the amount of data grows [2].
While the advantages mentioned above are probably already well known, new advantages have recently been added and former disadvantages removed. One example is the Zero ETL approach that large providers such as Google and AWS are currently expanding and promoting.
With this approach, additional ETL tools are no longer necessary. Instead, managed services automatically load the data from the source systems, and it can then be adapted in the subsequent data process. Necessary transformations can thus simply be carried out later via SQL inside the scalable Data Warehouse solution. Zero ETL approaches that allow queries to run directly against the source system are now also on the market. One example is Google BigLake, which makes it possible to analyze data held in, for example, AWS or Azure storage simply via SQL in Google BigQuery.
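To make the "load first, transform later via SQL" idea more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: SQLite from the standard library stands in for the cloud Data Warehouse, and the sample rows, table names and function names are all hypothetical, not part of any provider's Zero ETL service.

```python
import sqlite3

# Hypothetical raw export from a source system; every value arrives as text.
RAW_ORDERS = [
    ("1001", "2023-01-05", "19.99", "DE"),
    ("1002", "2023-01-05", "5.00", "de"),
    ("1003", "2023-01-06", "42.50", "FR"),
]


def load_raw(conn, rows):
    """Extract + Load: copy the source rows unchanged into a landing table."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, order_date TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform: done afterwards, inside the database, with plain SQL."""
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               UPPER(country)            AS country,
               SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY order_date, UPPER(country)
        """
    )


def run_elt():
    # SQLite in memory is an assumption standing in for the warehouse.
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ORDERS)
    transform(conn)
    return conn.execute(
        "SELECT order_date, country, revenue "
        "FROM daily_revenue ORDER BY order_date, country"
    ).fetchall()


if __name__ == "__main__":
    for row in run_elt():
        print(row)
```

The point of the sketch is the order of the steps: the raw table is kept as it arrived, so a different transformation can be added later without loading the source data again.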
Besides all these advantages, the disadvantages are also becoming fewer and fewer. With the new SaaS Data Warehouses and Lakehouses, which are often column-based and combine NoSQL and SQL, some features from the classic SQL Data Warehouse era were often not available. However, the large providers such as Google, AWS and Microsoft, as well as platform-independent solutions such as Snowflake, are constantly offering new features.
For example, Google now also offers primary and foreign keys for BigQuery; previously, Data Engineers had to rely on nested data structures. Another example is that AWS now offers cubes for its Data Warehouse Redshift. These are features that were standard in traditional Data Warehouses and are now increasingly offered in modern solutions, which should make migrations easier for companies.
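The difference between nested data structures and tables linked by keys can be sketched in a few lines of Python. This is a hedged illustration only: SQLite again stands in for the warehouse, the order data and all names are hypothetical, and whether a particular cloud warehouse actually enforces such constraints or treats them as metadata varies by product, so the provider's documentation should be checked.

```python
import sqlite3

# Nested modelling: the line items live inside their order record.
NESTED_ORDERS = [
    {"order_id": 1, "customer": "Acme", "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ]},
    {"order_id": 2, "customer": "Globex", "items": [
        {"sku": "A-1", "qty": 5},
    ]},
]


def build_relational(orders):
    """Flatten the nested records into two tables linked by a key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)"
    )
    conn.execute(
        """
        CREATE TABLE order_items (
            order_id INTEGER NOT NULL REFERENCES orders(order_id),
            sku      TEXT    NOT NULL,
            qty      INTEGER NOT NULL
        )
        """
    )
    for order in orders:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)",
            (order["order_id"], order["customer"]),
        )
        conn.executemany(
            "INSERT INTO order_items VALUES (?, ?, ?)",
            [(order["order_id"], item["sku"], item["qty"])
             for item in order["items"]],
        )
    return conn


def qty_per_customer(conn):
    """Join the two tables over the key, as in a classic warehouse model."""
    return conn.execute(
        """
        SELECT o.customer, SUM(i.qty)
        FROM orders AS o
        JOIN order_items AS i ON i.order_id = o.order_id
        GROUP BY o.customer
        ORDER BY o.customer
        """
    ).fetchall()


if __name__ == "__main__":
    print(qty_per_customer(build_relational(NESTED_ORDERS)))
```

A model that already relies on keys and joins like this one can be carried over more directly once the target platform offers the same constructs, which is the migration argument made above.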
In summary, the SaaS Data Lakehouse will probably increasingly outpace the traditional Data Warehouse by providing businesses with a more flexible, scalable and cost-effective solution for storing and managing their data. In addition, new features in SaaS-based Data Lakehouses are steadily removing the stumbling blocks that make migrations difficult, so that these solutions offer the same features as classic on-premises solutions plus new ones.
Sources and Further Readings
[1] AWS, What is a Lake House approach? (2021)
[2] rillion, On Premise or SaaS — Which is Better? (2021)





