Christianlauer

Summary

The provided content discusses the differences between Data Mesh and Data Fabric, emphasizing their distinct organizational and technical approaches to data management.

Abstract

The article "Data Mesh vs Data Fabric" examines the definitions of, and differences between, two emerging data management paradigms. Data Mesh is presented as an organizational strategy that decentralizes data ownership, treats data as a product owned by domain-specific teams, and calls for a self-serve infrastructure with federated governance. It can be built on a Data Lakehouse architecture to manage the data lifecycle. Data Fabric, on the other hand, is a technical solution for integrating various data sources, including Data Warehouses, Data Lakes, and Data Hubs, so that data can be accessed and shared seamlessly across distributed environments. It also provides services for control, monitoring, and distribution of data across systems. The article concludes that Data Mesh restructures data management at the organizational level, whereas Data Fabric concerns the technical integration and governance of data across platforms.

Opinions

  • The author suggests that Data Mesh is less about technical solutions and more about a new organizational perspective, emphasizing domain-oriented ownership and architecture.
  • Data as a product is a key principle in the Data Mesh approach, which is facilitated by the Data Lakehouse architecture.
  • The article implies that a self-serve data infrastructure is crucial for empowering users and data scientists with accessible data.
  • Federated computational governance is highlighted as an essential component of a Data Mesh, ensuring secure and role-based data management.
  • Data Fabric is seen as going one step beyond the Data Lakehouse by also integrating data from various applications and platforms, and as offering advanced services for data control and monitoring.
  • The author posits that Data Fabric is a technical approach, contrasting with the organizational focus of Data Mesh.
  • The article encourages readers to explore further readings and resources to understand these concepts better, including specific references to articles and blog posts on Data Lakehouse, Data Fabric, and related technologies.

Data Mesh vs Data Fabric

What is what, and what are the differences?

Photo by Willian Justen de Vasconcellos on Unsplash

While Data Lake and Data Warehouse are already relatively well-known terms and are established in many companies, Data Meshes and Data Fabrics are still somewhat lesser-known terms.

Let’s first define both concepts and then look at how they differ.

The Data Mesh

It is important to understand that the Data Mesh concept primarily establishes a new organizational perspective and is less about solving technical problems. You should therefore consider these four principles when building a Data Mesh organization [1]:

  • Domain-oriented decentralized data ownership and architecture: A Data Mesh should serve the individual business units. Accordingly, one or more Data Lakehouses can be built.
  • Data as a product: The Data Lakehouse architecture helps to manage data as a product by giving the members of domain-specific data teams complete control over the data lifecycle.
  • Self-serve data infrastructure as a platform: Users can serve themselves with data in a self-service BI tool, while data scientists, for example, access the same data to develop models.
  • Federated computational governance: Data should be secured and distributed according to a role-based access concept. Data catalogs, for example, are also helpful here.

Architecture of a Data Mesh (Source: upsolver.com [2])
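To make the four principles more concrete, here is a minimal sketch (assuming Python 3.9+) of how domain ownership, data products, self-serve discovery, and role-based governance might fit together. `DataProduct`, `FederatedCatalog`, the domain name, and the role names are hypothetical illustrations, not part of any Data Mesh framework or product; in practice these responsibilities are spread across a lakehouse, a data catalog, and an access-control system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model of a domain-owned data product; not a real framework API.
@dataclass
class DataProduct:
    name: str
    owning_domain: str                # domain team that owns the product
    schema: dict[str, str]            # published contract for consumers
    allowed_roles: set[str] = field(default_factory=set)
    rows: list[dict] = field(default_factory=list)


class FederatedCatalog:
    """Shared catalog: domains publish products, access rules are enforced centrally."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct, by_domain: str) -> None:
        # Decentralized ownership: only the owning domain may publish its product.
        if product.owning_domain != by_domain:
            raise PermissionError(f"{by_domain!r} cannot publish {product.name!r}")
        self._products[product.name] = product

    def discover(self) -> list[str]:
        # Self-serve discovery: every consumer can list the available products.
        return sorted(self._products)

    def read(self, name: str, role: str) -> list[dict]:
        # Federated governance: reads are checked against the product's role list.
        product = self._products[name]
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return [dict(row) for row in product.rows]


# Example: the (hypothetical) sales domain publishes an orders product.
catalog = FederatedCatalog()
catalog.register(
    DataProduct(
        name="sales.orders",
        owning_domain="sales",
        schema={"order_id": "int", "amount": "float"},
        allowed_roles={"analyst", "data_scientist"},
        rows=[{"order_id": 1, "amount": 19.9}],
    ),
    by_domain="sales",
)
print(catalog.discover())                            # ['sales.orders']
print(catalog.read("sales.orders", role="analyst"))  # [{'order_id': 1, 'amount': 19.9}]
```

A call such as `catalog.read("sales.orders", role="marketing")` raises `PermissionError`: the domain decides what it publishes, while the access rules are applied uniformly by the shared catalog rather than by each consumer.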

A Data Mesh can, for example, be built on top of a Data Lakehouse, which combines the advantages of a Data Lake and a Data Warehouse.

The Data Fabric

A Data Fabric is designed to help organizations solve complex data problems and use cases by managing their data regardless of the application, the platform, or where the data is stored. It enables seamless data access and sharing in a distributed data environment. It is similar to the Data Lakehouse, which combines the Data Warehouse and the Data Lake, but goes one step further by also integrating data across applications.

Data Warehouse vs. Lake vs. Fabric (Source: infopulse.com [3])

The idea, then, is to better integrate data stores such as Data Warehouses and Data Lakes, possibly with the help of Data Hubs, so that data can be shared more easily. Data Fabrics go one step further and offer services that facilitate control, monitoring, and similar tasks for the organization.
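The integration idea can be illustrated with a small sketch (assuming Python 3.9+): one access layer exposes a single `read` call over heterogeneous sources and keeps a per-source access count as a very simple form of monitoring. `DataFabric`, `sqlite_reader`, and `csv_reader` are hypothetical names, not the API of any Data Fabric product; an in-memory SQLite database stands in for the Data Warehouse and CSV text stands in for a file in the Data Lake. Real Data Fabric platforms add metadata management, virtualization, and security on top of this basic pattern.

```python
from __future__ import annotations

import csv
import io
import sqlite3
from collections import Counter
from typing import Callable, Iterable

Record = dict[str, object]


class DataFabric:
    """Hypothetical integration layer: one read API over heterogeneous sources."""

    def __init__(self) -> None:
        self._sources: dict[str, Callable[[], Iterable[Record]]] = {}
        self.access_log: Counter[str] = Counter()  # simple monitoring hook

    def register(self, name: str, reader: Callable[[], Iterable[Record]]) -> None:
        self._sources[name] = reader

    def read(self, name: str) -> list[Record]:
        # Monitoring: count every access per source before delegating to it.
        self.access_log[name] += 1
        return list(self._sources[name]())


def sqlite_reader(conn: sqlite3.Connection, table: str) -> Callable[[], Iterable[Record]]:
    # Warehouse-style source; the table name is trusted in this sketch.
    def read() -> Iterable[Record]:
        cur = conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]
    return read


def csv_reader(text: str) -> Callable[[], Iterable[Record]]:
    # Lake-style source: raw CSV, so every value arrives as a string.
    def read() -> Iterable[Record]:
        return list(csv.DictReader(io.StringIO(text)))
    return read


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 25.0), (2, 11, 40.0)])
lake_csv = "customer_id,country\n10,DE\n11,FR\n"

fabric = DataFabric()
fabric.register("warehouse.orders", sqlite_reader(warehouse, "orders"))
fabric.register("lake.customers", csv_reader(lake_csv))

# A consumer combines both sources without knowing where each dataset lives.
countries = {int(c["customer_id"]): c["country"] for c in fabric.read("lake.customers")}
revenue_by_country: dict[str, float] = {}
for order in fabric.read("warehouse.orders"):
    country = countries[order["customer_id"]]
    revenue_by_country[country] = revenue_by_country.get(country, 0.0) + order["amount"]

print(revenue_by_country)       # {'DE': 25.0, 'FR': 40.0}
print(dict(fabric.access_log))  # {'lake.customers': 1, 'warehouse.orders': 1}
```

The consumer only deals with source names, while the fabric decides how each source is read and records who accessed what; swapping the SQLite table for another system would only change the registered reader.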

Summary

While Data Mesh focuses on the organizational aspects of a data analytics platform, Data Fabric focuses not only on data integration and analysis but also on distributing data across systems. In addition, unlike Data Mesh, Data Fabric is a technical approach.

Sources and Further Readings

[1] Michael Armbrust, Ali Ghodsi, Bharath Gowda, Arsalan Tavakoli-Shiraji, Reynold Xin and Matei Zaharia, Frequently Asked Questions About the Data Lakehouse (2021)

[2] upsolver.com, Demystifying the Data Mesh: a Quick “What is” and “How to” (2022)

[3] infopulse.com, The Many Faces of Cloud Data Platforms: Data Warehouse, Data Lake, and Data Fabric (2022)
