Christianlauer

Summary

Google Dataplex is a data fabric solution offered by Google Cloud Platform (GCP) that helps enterprises manage, monitor, and control their data across Data Lakes, Data Warehouses, and Data Marts with consistent controls, enabling access to trusted data and performing analytics at scale.

Abstract

Google Dataplex is a data fabric solution that aims to prevent data silos and the formation of "Data Swamps" by providing centralized security, governance, and unification of distributed data without data movement. It allows enterprises to organize their data based on business needs and manage, monitor, and govern this data across various data sources. With Dataplex, users can create lakes where they can control the location, access, and other governance issues, perform analyses similar to BigQuery, and assign them to a lake. The interface is similar to BigQuery, but with access to other data sources. Google offers a tool to turn Data Lakehouses into Data Meshes.

Bullet points

  • Google Dataplex is a data fabric solution offered by GCP.
  • It helps enterprises manage, monitor, and control their data across Data Lakes, Data Warehouses, and Data Marts.
  • Dataplex enables centralized security and governance and unifies distributed data without data movement.
  • It allows enterprises to organize their data based on business needs.
  • Dataplex helps prevent the formation of "Data Swamps" by providing centralized security, governance, and unification of distributed data.
  • Users can create lakes where they can control the location, access, and other governance issues.
  • Dataplex allows users to perform analyses similar to BigQuery and assign them to a lake.
  • The interface is similar to BigQuery, but with access to other data sources.
  • Google offers a tool to turn Data Lakehouses into Data Meshes.

What is Google Dataplex? — The Data Fabric

Why you need it for your Data Lakehouse and Data Mesh

Photo by Andrew Ly on Unsplash

What is Google Dataplex, and how can you use it within GCP to better manage your Data Lakes and Lakehouses? Here is a brief overview.

What does Google offer?

Companies fear, or rather should fear, data silos and ungoverned data: without oversight, a Data Lake or Data Lakehouse quickly degrades into a Data Swamp. With Dataplex’s intelligent data fabric, Google promises to let enterprises centrally manage, monitor, and control their data across Data Lakes, Data Warehouses, and Data Marts with consistent controls, giving access to trusted data and enabling analytics at scale.

What problems can occur without governance?

If a Data Lake holds too much poorly organized data without suitable metadata management and reliable data governance, relevant data becomes increasingly difficult to find. The information value of the Data Lake decreases even though new data is constantly being added. A lack of data life cycle management also causes a Data Lake to silt up: after a certain time, data loses its relevance, and if it stays in the lake anyway, more and more irrelevant data accumulates over time. Incorrect timestamps on a data set likewise make information impossible to find or evaluate.
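The symptoms described above (missing metadata, stale data, wrong timestamps) can be checked mechanically once each data set has a metadata record. The following is only a minimal sketch under stated assumptions: `CatalogEntry`, `audit`, the one-year staleness threshold, and the sample entries are all hypothetical and are not part of Dataplex or any Google API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data set in a lake."""
    name: str
    owner: Optional[str]
    description: Optional[str]
    last_modified: datetime


def audit(entries, now, max_age=timedelta(days=365)):
    """Return {entry name: [issues]} for entries showing swamp symptoms."""
    findings = {}
    for entry in entries:
        issues = []
        if not entry.owner:
            issues.append("no owner")
        if not entry.description:
            issues.append("no description")
        if entry.last_modified > now:
            # A timestamp in the future cannot be correct.
            issues.append("timestamp in the future")
        elif now - entry.last_modified > max_age:
            # Assumed threshold: untouched for more than a year.
            issues.append("stale")
        if issues:
            findings[entry.name] = issues
    return findings


# Made-up sample catalog for illustration only.
now = datetime(2022, 6, 1)
entries = [
    CatalogEntry("sales_2022", "finance", "Monthly sales figures", datetime(2022, 5, 20)),
    CatalogEntry("clickstream_raw", None, None, datetime(2019, 1, 15)),
    CatalogEntry("sensor_dump", "iot", "Raw sensor export", datetime(2031, 1, 1)),
]
report = audit(entries, now)
for name, issues in report.items():
    print(f"{name}: {', '.join(issues)}")
```

A governance tool automates this kind of check across all sources, so that irrelevant or undocumented data is noticed before it piles up.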

How does Dataplex help?

With Dataplex, you can enable centralized security and governance and unify distributed data without data movement.

Enterprises have data distributed across data lakes, data warehouses, and data marts. Dataplex enables you to unify this data without any data movement, organize it based on your business needs, and centrally manage, monitor, and govern this data. Dataplex enables standardization and unification of metadata, security policies, governance, classification, and data lifecycle management across this distributed data. — Google [1]

Dataplex to unify your Data Warehouses, Lakes, and Marts (Source: Google [1])

With Dataplex, you can create lakes in which you control the location, access, and other governance settings. Inside a lake, you can run analyses much as you would in BigQuery and assign them to that lake. This is interesting if you want to govern, for example, individual departments or even countries within a company separately: one department cannot look into another department's data by default, but the data can still be shared internally with little effort.
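To make the idea of lakes as governance domains more concrete, here is a small conceptual model in plain Python. It is only a sketch under stated assumptions and does not use the Dataplex API: the `Lake` and `Asset` classes, the principal names, the region, and the resource paths are all hypothetical. It shows two points from the text: data is attached by reference and never moved, and access is controlled per lake.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Pointer to data that stays where it is (hypothetical model)."""
    name: str
    resource: str  # e.g. a storage bucket path or a warehouse table


@dataclass
class Lake:
    """Hypothetical lake: one governance domain, e.g. a department."""
    name: str
    location: str
    readers: set = field(default_factory=set)
    assets: dict = field(default_factory=dict)

    def attach(self, asset):
        # Only the reference is stored; no data is copied or moved.
        self.assets[asset.name] = asset

    def grant_read(self, principal):
        self.readers.add(principal)

    def read(self, principal, asset_name):
        # Access is decided at lake level, not per storage system.
        if principal not in self.readers:
            raise PermissionError(f"{principal} may not read lake {self.name}")
        return self.assets[asset_name].resource


# Made-up departments, region, and resource paths.
sales = Lake("sales", "europe-west3")
hr = Lake("hr", "europe-west3")
sales.attach(Asset("orders", "gs://example-sales-bucket/orders"))
hr.attach(Asset("salaries", "example-project.hr.salaries"))
sales.grant_read("group:sales-team")
hr.grant_read("group:hr-team")

print(sales.read("group:sales-team", "orders"))
try:
    hr.read("group:sales-team", "salaries")
except PermissionError as err:
    print(f"denied: {err}")

# Sharing internally is a single grant; the data itself does not move.
hr.grant_read("group:sales-team")
print(hr.read("group:sales-team", "salaries"))
```

In the real product, access is managed through Google Cloud IAM rather than an in-memory set, so treat the model purely as an illustration of the separation between departments.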

UI of Google Dataplex — Image by Author

Summary

With Dataplex, Google offers a convenient way to govern Data Warehouses and Data Lakes so that they do not become Data Swamps. The interface is similar to BigQuery, except that you can also access other data sources. In effect, Google offers a tool for turning Data Lakehouses into Data Meshes.

Sources and further Readings

[1] Google, What is Dataplex? (2022)
