Google launches Entity Resolution for BigQuery
An Introduction to Entity Resolution: How to Share Data More Easily

With Analytics Hub and Data Clean Rooms, Google opened the door to easier, better and more secure data sharing within Google Cloud and BigQuery. Now Google has added a powerful new feature: Entity Resolution.
This new feature lets users match records across datasets even when a common identifier is missing. The matching is performed by an identity provider, and BigQuery lets you choose which one to use: LiveRamp is supported, and there is a framework for other identity providers to offer similar services [1]. Entity resolution acts as a data preparation step before you share your data in a Data Clean Room. Data Clean Rooms were introduced last year; if you missed them and want to dive deeper, please also read this article:
The concept of a Data Clean Room is to keep user data isolated and private. A Data Clean Room provides aggregated and anonymized user information to protect user privacy, while giving advertisers non-personally identifiable information for targeting specific demographics and for audience measurement [2].

Entity resolution also prepares your data for better joins with third-party data in Analytics Hub. This feature is accessible across all compute models, and its use is not restricted by edition [1][4].
Benefits of using Entity Resolution
While many updates mainly benefit data engineers and data scientists, this one lets end users benefit directly, since they can now use entity resolution in the following ways [4]:
- You can resolve entities in place without incurring data transfer fees. Your identity provider matches your data to their identity table. The match results are written to a dataset in your project.
- You don’t need to manage ETL processes.
Identity providers, in turn:
- can now offer entity resolution as a managed software as a service (SaaS) offering on Google Cloud Marketplace.
- can use their proprietary identity graphs and match logic without revealing them to users.
If you are wondering what an identity or identity provider is: identity (ID) resolution is about recognizing the same individual user across digital touchpoints. For example, as a BigQuery end user you might use data from a marketing company; that data can then be shared via a Data Clean Room and linked to your own records, for example by a user's email address (entity resolution).
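To make the email example more concrete, here is a deliberately simplified sketch of deterministic matching in Python. It assumes both parties normalize email addresses (trim and lowercase) and hash them with SHA-256, then link records on that hash. The field names and sample records are hypothetical, and real identity providers such as LiveRamp use their own identity graphs and match logic, which this code does not attempt to reproduce.

```python
import hashlib


def match_key(email: str) -> str:
    """Normalize an email address and hash it with SHA-256 (hypothetical matching key)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical sample records: your CRM data and a partner's marketing data.
# The datasets share no common ID column, only a (differently formatted) email.
crm_records = [
    {"customer_id": "C-001", "email": "Alice@Example.com "},
    {"customer_id": "C-002", "email": "bob@example.com"},
]
partner_records = [
    {"partner_id": "P-17", "email": "alice@example.com", "segment": "sports"},
    {"partner_id": "P-42", "email": "carol@example.com", "segment": "travel"},
]


def resolve(left, right):
    """Link records from two datasets that appear to refer to the same person."""
    # Index the partner side by hashed key, so the join compares hashes, not raw emails.
    index = {match_key(r["email"]): r for r in right}
    matches = []
    for rec in left:
        other = index.get(match_key(rec["email"]))
        if other is not None:
            matches.append((rec["customer_id"], other["partner_id"], other["segment"]))
    return matches


print(resolve(crm_records, partner_records))
# → [('C-001', 'P-17', 'sports')]
```

Normalizing before hashing matters: without it, "Alice@Example.com " and "alice@example.com" would produce different hashes and the match would be missed.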
Architecture within BigQuery
Before using this kind of feature, you (and perhaps also your CIO and CISO) might want to know how it works and how data is exchanged. BigQuery implements entity resolution through remote function calls that trigger the entity resolution process in the identity provider's environment [4]. A major benefit for both data security and cost is that the data does not need to be copied or moved during this process.
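The remote-function pattern can be illustrated with a minimal Python sketch. It assumes the general request/response shape of BigQuery remote functions (a JSON body whose `calls` field holds one argument list per row, answered with a `replies` list of the same length, or an `errorMessage` on failure); it is not the specific interface that entity resolution uses. The identity table, the single email argument and the lookup logic are hypothetical. In practice such a handler would sit behind an HTTP endpoint (for example Cloud Functions or Cloud Run) in the provider's environment; see [4] for the actual setup.

```python
import json

# Hypothetical identity table, held privately in the identity provider's environment.
# Keys are normalized emails; values are the provider's internal identifiers.
IDENTITY_TABLE = {
    "alice@example.com": "ID-9f2c",
    "bob@example.com": "ID-41aa",
}


def handle_request(body: dict) -> dict:
    """Process one remote function request payload (sketch).

    Rows arrive batched in `calls`, each entry holding the function arguments
    for one row. The response must contain exactly one entry in `replies`
    per entry in `calls`, in the same order.
    """
    try:
        replies = []
        for args in body["calls"]:
            email = args[0]
            # Normalize and look up the email in the private table;
            # None (NULL in SQL) is returned when there is no match.
            key = email.strip().lower() if email is not None else None
            replies.append(IDENTITY_TABLE.get(key))
        return {"replies": replies}
    except (KeyError, IndexError, AttributeError) as exc:
        # On a malformed request, return an error message instead of replies.
        return {"errorMessage": f"bad request: {exc!r}"}


request = json.loads(
    '{"requestId": "r-1", "calls": [["Alice@example.com"], ["dave@example.com"], [null]]}'
)
print(json.dumps(handle_request(request)))
# → {"replies": ["ID-9f2c", null, null]}
```

Note that only the lookup results leave the provider's side; the identity table itself is never returned to the caller, which mirrors the idea that providers can keep their identity graphs private.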

This is definitely a useful new feature for all BigQuery users who work with external data (identity) providers, which is often the case in marketing and related fields. I have linked additional resources below, where you can find more technical details and instructions for setting everything up. Since I also mentioned Google's Analytics Hub, here is an article where you can read more about that service:
Sources and Further Readings
[1] Google, BigQuery release notes (2024)
[2] TechTarget, data clean room (2023)
[3] Google, Secure and privacy-centric sharing with data clean rooms in BigQuery (2023)
[4] Google, Introduction to entity resolution in BigQuery (2024)