Building a Data Mesh on Microsoft Azure
How to implement a solid Data Platform on Azure

Data Lakehouses and Data Meshes are on the rise. It is important to note that they do not replace the Data Warehouse and the Data Lake; rather, they extend them. Microsoft also offers solutions in its Azure cloud, and here is an overview of what such a Data Mesh can look like in the Microsoft world.
Recap: What is a Data Mesh?
A Data Mesh approach can improve on the Data Lake or Lakehouse, today's dominant architectural paradigm. It is important to understand that the Data Mesh concept primarily establishes a new organizational perspective and is less about solving technical problems. Therefore, you should consider these four principles when building up a Data Mesh organization [1]:
- Principle 1: Domain-oriented decentralized data ownership and architecture: A Data Mesh should serve the individual business units. Therefore, one or several Data Lakehouses could be built.
- Principle 2: Data as a product: The Data Lakehouse architecture helps to manage data as a product by giving the members of domain-specific data teams complete control over the data lifecycle.
- Principle 3: Self-serve data infrastructure as a platform: Business users can supply themselves with data in a self-service BI tool, while Data Scientists, for example, access the same data to develop models.
- Principle 4: Federated computational governance: Data should be secured and distributed according to a role concept. Data catalogs are also helpful here.
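To make principles 1, 2 and 4 a little more concrete, here is a minimal sketch of how a domain-owned data product with a role concept might be described. The `DataProduct` class, the `can_read` helper, the role names and the e-mail address are all hypothetical illustrations, not part of any Azure service or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    domain: str    # owning business unit (principle 1)
    owner: str     # accountable product owner (principle 2)
    location: str  # where consumers find the data
    # Roles allowed to read the product (role concept, principle 4).
    reader_roles: frozenset = field(default_factory=frozenset)


def can_read(product: DataProduct, user_roles: set) -> bool:
    """Grant read access if the user holds at least one of the product's reader roles."""
    return bool(product.reader_roles & user_roles)


# Hypothetical example product owned by a sales domain team.
sales_orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team@example.com",
    location="sales/curated/orders",
    reader_roles=frozenset({"sales-analyst", "data-scientist"}),
)

print(can_read(sales_orders, {"data-scientist"}))  # True
print(can_read(sales_orders, {"hr-analyst"}))      # False
```

The point of the design is that ownership and access rules travel with the data product itself, so each domain team can manage them while the platform applies the same check everywhere.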
To dive deeper into this topic you might find this article useful: What is a Data Mesh? New Technology or just an Approach for efficient Data Platforms?
Building up a Data Mesh Architecture in Azure
The first step is to design the technical foundation, which can be a Data Warehouse or its further development, a Data Lakehouse. In my view, the right choice depends on the use case and the size of the company. Hybrid systems such as Google BigLake and BigQuery, or their Azure equivalents (e.g., Data Lake with Azure Synapse or Delta Lake), as well as classical Data Warehouses, are increasingly becoming Data Lakehouses by providing integration with Machine Learning, other systems, and BI tools.
One possible solution is shown below; in it, Microsoft itself describes which services are best suited for a Data Mesh.

Here, Microsoft suggests, for example, pipeline services, IoT Hub, and Event Hubs as data integration tools: the pipelines for classical data such as relational databases, and IoT Hub and Event Hubs for real-time data.
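The division of labor described above can be written down as a simple rule of thumb. The sketch below is a hypothetical illustration of that decision only: `Ingestion`, `choose_ingestion` and the source names are invented for this example and do not call any Azure SDK.

```python
from enum import Enum


class Ingestion(Enum):
    """Ingestion paths from Microsoft's suggested architecture (labels only)."""
    PIPELINE = "pipeline service (batch)"
    EVENT_HUBS = "Event Hubs (streaming)"
    IOT_HUB = "IoT Hub (device telemetry)"


def choose_ingestion(source_kind: str, real_time: bool) -> Ingestion:
    """Hypothetical rule of thumb: batch sources go through pipelines,
    real-time sources through IoT Hub (devices) or Event Hubs (other events)."""
    if not real_time:
        return Ingestion.PIPELINE
    if source_kind == "iot_device":
        return Ingestion.IOT_HUB
    return Ingestion.EVENT_HUBS


# Hypothetical sources: (kind of source, needs real-time ingestion?)
sources = {
    "erp_database": ("relational_db", False),
    "factory_sensors": ("iot_device", True),
    "web_clickstream": ("application_events", True),
}
for name, (kind, real_time) in sources.items():
    print(f"{name}: {choose_ingestion(kind, real_time).value}")
```

In practice, the decision also depends on latency requirements, volumes and existing tooling, so a rule like this is a starting point for discussion rather than a fixed policy.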
But, as I said, this is only one solution. If you are running cross-platform systems, you could also use a platform-independent tool such as Talend or Alteryx, which makes it easier to build interfaces to non-Microsoft products and other clouds. For storing the data, it then makes sense to use Azure Data Lake Storage Gen2, so that downstream processes can easily move the data into Azure Stream Analytics, Azure Synapse, or even a Delta Lake, enabling Data Analysts to work with SQL, Machine Learning, and so on. Here it makes sense to look at what users ultimately need: maybe data analysis with SQL and a BI layer is enough for you, in which case you can do without other solutions, especially from a cost perspective. Not everything that is considered cool today also serves the business purpose.
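Because several domains share the same Data Lake Storage Gen2 account, a consistent path convention helps each domain find and own its data. The sketch below builds ABFS URIs of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`, which Spark-based engines such as those in Azure Synapse commonly use to address ADLS Gen2. The layout itself (one container per domain, `raw` and `curated` layers) and the account name `contosolake` are assumptions for illustration.

```python
def data_product_uri(account: str, domain: str, layer: str, product: str) -> str:
    """Build an ABFS URI for a data product in Azure Data Lake Storage Gen2.

    Hypothetical convention: one container per domain, one folder per
    refinement layer, one subfolder per data product.
    """
    allowed_layers = {"raw", "curated"}
    if layer not in allowed_layers:
        raise ValueError(f"unknown layer {layer!r}, expected one of {sorted(allowed_layers)}")
    return f"abfss://{domain}@{account}.dfs.core.windows.net/{layer}/{product}"


print(data_product_uri("contosolake", "sales", "curated", "orders"))
# abfss://sales@contosolake.dfs.core.windows.net/curated/orders
```

Putting each domain in its own container also makes it easier to assign storage permissions per domain, which supports the role concept from principle 4.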
To make business users happy in the end, you can use Power BI (now also available in the mobile version via Teams) or Excel as the BI layer. What ultimately turns the Data Lakehouse into a Data Mesh is monitoring the entire technical construct and setting up Data Governance.

Here, for example, data catalogs can be used to ensure that the right data goes to the right users. Monitoring functions are best used to keep an eye on the technical setup and, above all, on the costs, which keeps the CIO happy.
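For the cost side, one simple approach is to tag resources by domain and compare each domain's spend with a budget. The sketch below assumes cost records have already been exported and tagged (in Azure, for example, from a Cost Management export grouped by a domain tag); the records, budgets and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cost records, already tagged with the owning domain.
cost_records = [
    {"domain": "sales", "service": "Synapse", "cost_eur": 820.0},
    {"domain": "sales", "service": "Data Lake Storage", "cost_eur": 95.5},
    {"domain": "hr", "service": "Synapse", "cost_eur": 140.0},
    {"domain": "marketing", "service": "Event Hubs", "cost_eur": 310.0},
]

# Hypothetical monthly budgets per domain.
budgets_eur = {"sales": 1000.0, "hr": 200.0, "marketing": 250.0}


def cost_by_domain(records):
    """Sum costs per domain tag."""
    totals = defaultdict(float)
    for record in records:
        totals[record["domain"]] += record["cost_eur"]
    return dict(totals)


def over_budget(totals, budgets):
    """Return domains whose spend exceeds their budget, with the overrun amount.
    Domains without a budget entry are ignored."""
    return {
        domain: round(spent - budgets[domain], 2)
        for domain, spent in totals.items()
        if domain in budgets and spent > budgets[domain]
    }


totals = cost_by_domain(cost_records)
print(totals)                            # {'sales': 915.5, 'hr': 140.0, 'marketing': 310.0}
print(over_budget(totals, budgets_eur))  # {'marketing': 60.0}
```

Reporting costs per domain fits the Data Mesh idea: each domain team sees and answers for the cost of its own data products.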
Summary
So, to realize a Data Mesh in Microsoft Azure, but also in other clouds, everything you need is actually available. The important thing is to look closely at data integration: check whether cross-platform software makes more sense, or whether the cloud's built-in solutions also support external systems. After that, you can usually build up your Data Mesh, or, on a technical level, your Data Lakehouse as the first step, with a wealth of tools. Here, as I said, it depends on your needs: not every feature is necessary, and the size of the company will probably play a significant role. Another important point is to use the right tools when setting up a Data Mesh in order to establish appropriate Data Governance, so that data only reaches the right people, with the right data quality.
Sources and Further Readings
[1] Michael Armbrust, Ali Ghodsi, Bharath Gowda, Arsalan Tavakoli-Shiraji, Reynold Xin and Matei Zaharia, Frequently Asked Questions About the Data Lakehouse (2021)
[2] Microsoft, What is a data mesh? (2022)
[3] Microsoft, What’s available in the Microsoft Purview governance portal? (2022)
