Abraham Pabbathi

Summary

The article compares Databricks and Microsoft Fabric based on a 3S model (Simplicity, Security, Shareability) to help companies choose the right data platform technology.

Abstract

The author, who discloses employment at Databricks, presents a comparison between Databricks and Microsoft Fabric for building data platforms, emphasizing the importance of simplicity, security, and shareability. Databricks is touted for its single storage pattern and engine, which simplifies the platform and enhances reliability. In contrast, Microsoft Fabric's use of multiple technologies is seen as creating data silos and duplication, potentially complicating security and increasing costs. On security, Databricks' Unity Catalog is highlighted for centralized access control, while Microsoft Fabric's siloed approach is critiqued for its complexity in managing data security across different storage models, despite the presence of Microsoft Purview. For shareability, Databricks' Delta Sharing and Clean Rooms are praised for enabling secure data exchange within and across organizations, whereas Microsoft Fabric is noted for lacking such inter-organizational data sharing capabilities.

Opinions

  • The author believes that a simple yet powerful tool is preferable to a complex one, even if they have the same capabilities.
  • Databricks is favored for its unified approach to storage and processing, which is seen as more reliable and easier to manage.
  • Microsoft Fabric's use of multiple technologies is viewed as a drawback due to the potential for data silos and increased costs.
  • Centralized security management through Databricks' Unity Catalog is considered superior to Microsoft Fabric's siloed security approach.
  • The author values the ability to share data easily and securely, highlighting Databricks' Delta Sharing and Clean Rooms as key features for data collaboration.
  • Microsoft Fabric is criticized for not supporting data sharing outside the organization, which is seen as a limitation in today's data economy.
  • The author encourages readers to try both Databricks and Microsoft Fabric to determine which platform aligns best with their organization's needs.

Databricks or Microsoft Fabric?

One of the questions I’ve been getting lately is this: given the huge overlap between the features of Databricks and Microsoft Fabric, which one should customers standardize on when building out their data platforms? Every company has its own unique requirements that will shape that decision. In this article I propose three requirements that should be universal to all companies building a data platform, irrespective of industry and size. I call it the 3S model for choosing a data platform technology. Full disclosure: I work for Databricks, but the opinions and views expressed in this article are solely my own and do not represent the views or opinions of my employer.

Fig 1: 3S Model for choosing Data Platform Technology

Requirement 1: Simplicity

A simple yet powerful tool will always win over a more complex tool with the same capabilities. Having fewer moving parts not only makes a tool simpler, it also makes it more reliable. Let’s compare Databricks and Fabric against this backdrop.

Databricks uses a single storage pattern (Delta tables) and a single engine (Spark + Photon) across all its workloads. While these are available in different form factors and at different price points to suit different customers, they all rely on the same underlying technology. Store your data in Delta tables, access it with the Spark + Photon engine, and, in my view, you get the best performance available.

Microsoft Fabric is a conglomeration of four different technologies: SQL Data Warehouse, Spark Lakehouse, ADX/KQL Database, and Power BI Datamarts. This link helps you decide when to use each of these four flavors of Fabric storage. The result is data silos and data duplication, both of which slow down your analysis and increase your overall costs.

Requirement 2: Security

While all tools have encryption and access control features, what differentiates tools is the ease and flexibility with which you can encrypt and control your data assets. Most customers I have spoken to want to get out of “IAM hell” and want to leverage a single pane of glass to show their complete data security posture.

Microsoft Fabric, by virtue of its siloed data storage model, requires you to control access to the data in each of the four silos separately. This adds to the complexity of access control, and a slip-up in any one of these silos can expose your data to entities that shouldn’t have access to it. While Microsoft Purview attempts to be that single pane of glass, it is yet to integrate completely with Fabric.

Databricks stores all data in cloud storage as Delta tables and files. You control access to both tables and files through Unity Catalog. As long as you encrypt and lock down the cloud storage assets (so that only admins can access raw data on cloud storage) and manage access through Unity Catalog, you can rest assured that your data is visible only to entities that have been granted access via Unity Catalog.
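To make the Unity Catalog point a little more concrete, here is a minimal sketch of what granting read-only access from one place can look like. It only builds the SQL text. The catalog, schema, table, volume, and group names (`main`, `sales`, `orders`, `raw_files`, `analysts`) are hypothetical, and the helper `grant_statements` is my own illustration rather than part of any Databricks library. In a Databricks notebook you would typically run each statement with `spark.sql(...)`.

```python
def grant_statements(principal, catalog, schema, table=None, volume=None):
    """Build Unity Catalog GRANT statements for read-only access.

    All object and principal names are supplied by the caller; the
    examples below are hypothetical.
    """
    statements = [
        # A principal needs USE CATALOG and USE SCHEMA before it can
        # reach any table or volume inside them.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
    ]
    if table:
        # Read access to a Delta table.
        statements.append(
            f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`"
        )
    if volume:
        # Read access to files, which Unity Catalog governs through volumes.
        statements.append(
            f"GRANT READ VOLUME ON VOLUME {catalog}.{schema}.{volume} TO `{principal}`"
        )
    return statements


statements = grant_statements(
    "analysts", "main", "sales", table="orders", volume="raw_files"
)
for statement in statements:
    # In a Databricks notebook: spark.sql(statement)
    print(statement)
```

The point of the sketch is that tables and files are governed with the same kind of statement against the same catalog, instead of one permission model per storage silo.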

It remains to be seen how Microsoft Purview and Databricks Unity Catalog mature over the coming years to become that central governance platform that they aspire to be.

Requirement 3: Shareability

In this day and age, sharing data within an organization and across organizations has become increasingly important. Hoarding data locks up the value of this important asset and costs you the opportunity to monetize it.

Microsoft Fabric, while great at ingesting data from different sources, doesn’t itself allow its data to be shared outside your organization. I believe this is an important feature that should be available in every data platform to ensure interoperability both between and within organizations.

Databricks provides Delta Sharing, which allows you to share data safely within and across organizations. Databricks Clean Rooms is an additional feature that further enhances your ability to share data securely with external parties.
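For readers who haven’t seen Delta Sharing from the recipient’s side, here is a small sketch using the open source `delta-sharing` Python connector. It assumes the data provider has sent you a profile file (called `config.share` here), and the share, schema, and table names (`sales_share`, `sales`, `orders`) are made up for illustration. The two helper functions are my own wrappers, not part of the connector.

```python
def shared_table_url(profile_path, share, schema, table):
    """Build the table URL the delta-sharing connector expects:
    <profile-file-path>#<share>.<schema>.<table>
    """
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Load a shared table into a pandas DataFrame.

    Requires `pip install delta-sharing` and a valid profile file
    issued by the data provider.
    """
    # Imported here so the URL helper works without the connector installed.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )


# Hypothetical names, for illustration only.
url = shared_table_url("config.share", "sales_share", "sales", "orders")
print(url)

# With a real profile file in place, the recipient would then call:
# df = load_shared_table("config.share", "sales_share", "sales", "orders")
```

The recipient doesn’t need a Databricks workspace for this; the profile file carries the endpoint and credentials, which is what makes sharing across organizations practical.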

Conclusion

While there may be other critical requirements to consider before making a decision, the three points discussed above are a great starting point. Do try both tools and let me know in the comments which one you like better for your organization.

Microsoft Fabric
Databricks
Azure Databricks
Data Platforms
Data Warehouse