Christianlauer

Summary

The article discusses the rise of the SaaS Data Lakehouse as a flexible, scalable, and cost-effective alternative to traditional Data Warehouses, potentially marking the end of the latter's dominance in data management.

Abstract

The article "Does the modern Data Lakehouse kill the Data Warehouse?" delves into the concept of the SaaS Data Lakehouse, highlighting its ability to seamlessly integrate the capabilities of both Data Lakes and Data Warehouses. It emphasizes the Lakehouse's advantage in handling diverse data types and its scalability to accommodate the ever-growing data needs of organizations. The Lakehouse is presented as a solution that allows for a wide array of data use cases without the need for constant data movement and transformation. The article also touches on recent advancements such as the Zero ETL approach, which simplifies data processing, and the introduction of features that address previous limitations of SaaS solutions, making them more attractive for data management and analysis.

Opinions

  • The author suggests that the SaaS Data Lakehouse is superior to traditional Data Warehouses due to its flexibility in storing various data formats and structures.
  • The scalability of cloud-based Data Lakehouses is seen as a significant advantage over the often costly and difficult to scale traditional Data Warehouses.
  • The Zero ETL approach is highlighted as an emerging trend that reduces the complexity of data integration, promoted by major providers like Google and AWS.
  • The article posits that the disadvantages of SaaS Data Lakehouses are diminishing as providers introduce new features that were once exclusive to traditional SQL Data Warehouses.
  • The author believes that the addition of new features in SaaS Data Lakehouses, such as primary and foreign keys in Google BigQuery and cubes in AWS Redshift, is making the transition from traditional Data Warehouses more feasible for companies.
  • It is the opinion of the author that the SaaS Data Lakehouse represents a more efficient and comprehensive solution for businesses in managing and analyzing data compared to the classical Data Warehouse model.

Does the modern Data Lakehouse kill the Data Warehouse?

How the SaaS Data Lakehouse will end the Era of classical Data Warehouses

Photo by Luca Bravo on Unsplash

In this article, I would like to talk about the SaaS Data Lakehouse, its advantages and why it may be displacing the classic Data Warehouse.

First of all, let’s define what is meant by a classic Data Warehouse. It’s a centralized repository in which large amounts of data from various sources are processed via ETL and stored in a structured form. It is typically used for (self-service) Business Intelligence and reporting purposes and is optimized for reading data rather than writing data. Often, it is also built on an on-premises architecture.

On the other hand, the SaaS Data Lakehouse combines the best features of a Data Lake and a Data Warehouse. It offers a flexible and scalable architecture that allows organizations to store both structured and unstructured data via ELT processes. It combines SQL and NoSQL features and also offers services for data governance, security and compliance.

One of the main advantages of a SaaS-based Data Lakehouse over a traditional Data Warehouse is its flexibility. The Data Lakehouse allows companies to store all types of data, regardless of format, structure or source. This is in contrast to a classic Data Warehouse, which is designed to store structured data only [1]. With a Data Lakehouse, companies can store data in its raw form (the Data Lake component) and serve different use cases such as Machine Learning, Business Intelligence or Data Science from it, without having to move and transform the data every time they want to perform a new analysis. For the Data Warehouse component, the data can then be processed and transformed via an ETL process.

Another benefit of a SaaS Data Lakehouse is its scalability. As the amount of data generated by companies continues to grow, a cloud-based Data Lakehouse can easily be scaled to meet their needs. This is in contrast to a traditional Data Warehouse, which can become expensive to maintain and scale as the amount of data grows [2].

While the above-mentioned advantages should already be well known, further advantages have recently been added and some disadvantages have been removed. One example is the Zero ETL approach that many large providers such as Google or AWS are currently expanding and promoting.

Here, additional ETL tools are no longer necessary. Instead, managed services automatically load the data from the source systems, and the data can then be adapted in the subsequent data process. Necessary transformations can simply be carried out later via SQL in the scalable Data Warehouse solution. Zero ETL approaches that allow queries directly against the source system are now also on the market. One example is Google BigLake, which can be used to run analyses on data in e.g. AWS or Azure storage simply via SQL using Google BigQuery.
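The "load first, transform later via SQL" idea can be sketched in a few lines. This is only a minimal illustration, not how any provider implements Zero ETL: Python's built-in sqlite3 module stands in for the cloud Data Warehouse, and the table raw_orders, the view revenue_per_customer and the sample rows are made-up assumptions.

```python
import sqlite3

# Hypothetical raw export from a source system: every field arrives as text.
RAW_ORDERS = [
    ("1001", "alice", "20.00", "2023-01-05"),
    ("1002", "bob", "5.00", "2023-01-05"),
    ("1003", "alice", "10.00", "2023-01-06"),
]


def load_raw(conn, rows):
    """Load step: land the data unchanged in a raw table (no transformation)."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform step: typing and aggregation are defined later, purely in SQL."""
    conn.execute(
        """
        CREATE VIEW revenue_per_customer AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )


def revenue(conn):
    """Query the transformed view; the raw table stays available for other uses."""
    return conn.execute(
        "SELECT customer, revenue FROM revenue_per_customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load_raw(conn, RAW_ORDERS)
transform(conn)
rows = revenue(conn)
print(rows)
```

Because the raw table is kept as it arrived, a second use case (for example a Machine Learning feature table) could define its own view on the same data without a new load job.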

Besides all these advantages, the disadvantages are also becoming fewer and fewer. With the new SaaS Data Warehouses or Lakehouses, which are often column-based and combine NoSQL and SQL, some features from the classic SQL Data Warehouse era were often not available. However, the large providers such as Google, AWS and Microsoft, as well as platform-independent solutions such as Snowflake, are constantly offering new features.

For example, Google now also offers primary and foreign keys for BigQuery. Previously, Data Engineers had to rely on nested data structures. Another example is that AWS now offers cubes for its Data Warehouse Redshift. These are features that were standard in traditional Data Warehouses and are now increasingly offered in modern solutions, which should make migrations easier for companies.
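For readers who have not worked with cubes: conceptually, a cube computes an aggregate for every combination of the grouping dimensions, including all subtotals and the grand total. The following sketch shows that idea in plain Python. It is not Redshift syntax or Redshift's implementation; the function name cube, the sample data and the use of None as the "all values" marker are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts with two dimensions and one measure.
SALES = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 20},
    {"region": "US", "product": "A", "amount": 30},
]


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`.

    A dimension that is not part of the current grouping is replaced by None,
    which here means "all values" (this assumes the data itself has no None).
    """
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


result = cube(SALES, ("region", "product"), "amount")
for key, total in sorted(result.items(), key=lambda kv: str(kv[0])):
    print(key, total)
```

With two dimensions this yields four groupings: the grand total, totals per region, totals per product, and totals per region and product. In a Data Warehouse the same result would come from a single SQL aggregation rather than from application code.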

In summary, the SaaS Data Lakehouse will probably increasingly outpace the traditional Data Warehouse by providing businesses with a more flexible, scalable and cost-effective solution for storing and managing their data. In addition, the stumbling blocks that make migrations difficult are becoming fewer thanks to new features in SaaS-based Data Lakehouses, so that these solutions offer the same features as classic on-premises solutions, plus new ones.

Sources and Further Readings

[1] AWS, What is a Lake House approach? (2021)

[2] rillion, On Premise or SaaS — Which is Better? (2021)
