Chapter 14 — Data Cloud Case Study: Identity Resolution

The most important step in Data Cloud is the unification of records, which is accomplished with Identity Resolution.
In Chapter 12 we discussed why there is a need to harmonize the data, and in Chapter 13 we did the full analysis on how to harmonize in the F1F4 project. Now that we have harmonized the data into the Customer 360 Data Model we will be able to process that data to find all the records related to each customer, with all the information from every connected source, and be able to get the Customer 360 view we so much desire.
You can follow the case study here:
Table Of Contents
- We Are Ready To Start With Data Cloud!
- What Is Identity Resolution?
- Customer Buckets
- How Does Identity Resolution Work?
  - Matching Rules
  - Reconciliation Rules
- Not A Golden Record!
- Creating A Ruleset
- New DMO Objects
- Matching Rules
  - Account
  - Individual
  - Custom Rules
- Understanding Results
- Fixing Matching Rules
  - Troubleshooting Identity Rules
- Reconciliation Rules
We Are Ready To Start With Data Cloud!
We have done a lot so far in this case study, but we are just getting started with the fun stuff!
So far in this case study, we have ingested data from Salesforce CRM (Chapter 5) and from Google Cloud Storage (Chapter 7), and then harmonized that data (Chapter 13). Most importantly, we did a complete analysis of the ingestion (Chapter 11) and of the harmonization (Chapter 13) for the F1F4 project.
We can visualize what we have accomplished by reviewing the Data Cloud Data Flow and putting some checks on the completed tasks.

But more importantly, we are now ready to start working with Data Cloud's main features: Identity Resolution, Segmentation, and Activations. This also means that from this point forward we can ignore the DSO and DLO objects, since we are going to be working only with the DMO objects.
What Is Identity Resolution?
As we discussed in Chapter 1, the Customer 360 View allows you to have a whole picture of each of your customers, regardless of where the data is coming from, or how the data is formatted.

In the image above, we can see that Samantha (Sam or just S.) has data about her in different systems, and that the data is not consistent. Service Cloud has her name as Samantha, but in Marketing Cloud she is called Sam, and in Commerce Cloud her name is just S. The same happens with her phone number and email.
How can we find all the records for our customers if the data is not standardized? That’s what Identity Resolution will help us with!
Customer Buckets
Identity Resolution can be easily understood if we think about what we are trying to do.
First and foremost, we must know the number of customers our company has or at least a very close approximation. Without this number, you will not be able to successfully perform a good Identity Resolution. Let me explain why this number is important.
Let’s assume that F1F4 has 500 fans. In a real scenario this number would be in the millions, but for the data we have used in this case study, it is 500.
Also, in this case study, we gathered data about the fans from Salesforce CRM and Google Cloud Storage. From Salesforce CRM, we ingested 500 fans and harmonized those records into the Individual DMO. We also ingested 500 fans from Google Cloud Storage and harmonized them into the same Individual DMO.
By the way, my data is very clean: the exact same 500 fans are in both places, with no duplicates within each system. In reality, your data probably will not be this clean.
We can run this SOQL query to view the data we have harmonized into the Individual DMO:
SELECT ssot__Id__c, ssot__FirstName__c, ssot__LastName__c,
ssot__DataSourceId__c, ssot__DataSourceObjectId__c
FROM ssot__Individual__dlm ORDER BY ssot__LastName__c
Using the Developer Console in Salesforce, we can execute that query and view some of the data to get a better understanding of what we have accomplished in the ingestion and harmonization steps.

We have 1000 records in the Individual DMO because, as we mentioned above, the data is clean and the exact same 500 fans in the Contact sObject from Salesforce CRM are also in the Fans table in Google Cloud Storage. This may not happen in real life.
Also, we are only ingesting data from 2 sources, but if we were ingesting the data from many other sources we would be getting a lot more than 1000 records.
Note that although we have 500 fans, we have 1000 Individual records which means we have created duplicate records, but that is expected!
We need to group the records for the same fans into 500 buckets, where each bucket will contain all the records for the same fan.
We may end up with 495 or 505 buckets if our data is not 100% clean, and that is probably OK because that would be a 1% difference. But if we end up with 250 or 1000 buckets (a 2x factor in our example), then we will have to reconsider how we are grouping records. This leads me to two more key concepts:
- Overgrouping: If we end up with 250 buckets, that means we are putting fans who are not related in the same bucket. This could happen if we match only by last name and multiple fans from the same family end up in the same group.
- Undergrouping: If we end up with 1000 buckets, then fans that should be grouped are being put in different buckets. This could happen if we match on exact first name, and some fans have slightly different spellings of their first names (Andy, Andrew, Andres, Andrés) in different systems.
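The bucket counts above can be turned into a quick sanity check. The following is only a minimal sketch: the function name `grouping_health` and the 5% tolerance are assumptions made for illustration, not something Data Cloud provides, and you would pick a tolerance that makes sense for your own data.

```python
def grouping_health(bucket_count: int, expected_customers: int,
                    tolerance: float = 0.05) -> str:
    """Compare the number of buckets with the expected number of customers.

    The 5% default tolerance is an illustrative assumption.
    """
    if expected_customers <= 0:
        raise ValueError("expected_customers must be positive")
    ratio = bucket_count / expected_customers
    if ratio < 1 - tolerance:
        # Far fewer buckets than customers: unrelated fans share a bucket.
        return "overgrouping"
    if ratio > 1 + tolerance:
        # Far more buckets than customers: the same fan is split across buckets.
        return "undergrouping"
    return "ok"


# Bucket counts taken from the discussion above, with 500 expected fans.
for buckets in (495, 505, 250, 1000):
    print(buckets, grouping_health(buckets, expected_customers=500))
```

This is why knowing the approximate number of customers matters: without it, there is nothing to compare the bucket count against.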
How Does Identity Resolution Work?
Identity Resolution is made of two different steps: Matching Rules and Reconciliation Rules. Together, these steps create the Unified Profile.

When the Identity Resolution runs, we create a Unified Profile, indicated by the red box on the previous image. The Unified Profile consists of two main elements:
- A new record on the Unified Individual DMO indicated with the purple token
- All the related data!
Please note that the Unified Individual is similar to the concept of a Golden Record, but a Golden Record is not the output of Identity Resolution. Data Cloud creates a Unified Profile made of the Unified Individual and its related DMO records. We’ll talk more about this later.
The Identity Resolution ruleset is made of two sets of rules:
Matching Rules
These are the rules that will help us find the related records and put them in the same bucket we described before.
Taking a close look at some sample data in the Developer Console, we can see that matching is not quite straightforward.

We can’t match fans by their ID (ssot__Id__c) because the same fan has different values. Similarly, the First Name (ssot__FirstName__c) is similar but has a slightly different spelling, so we can’t do an exact string match. Finally, the Last Name (ssot__LastName__c) is the same, but we can’t just match that because there are multiple people with the same last name.
We need some advanced matching logic, and that is precisely what the Matching Rules in the Identity Resolution will help us with.
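To make the idea concrete, here is a small sketch of what matching logic of this kind could look like: an exact match on last name combined with a similarity threshold on first name, followed by grouping the matched records into buckets. This is not how Data Cloud implements its Matching Rules. The sample records, the IDs, the last name, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; IDs and names are made up for this sketch.
records = [
    {"id": "crm-001", "first": "Andrew", "last": "Lopez"},
    {"id": "gcs-901", "first": "Andres", "last": "Lopez"},
    {"id": "crm-002", "first": "Maria", "last": "Lopez"},
    {"id": "gcs-902", "first": "Maria", "last": "Lopez"},
]


def is_match(a, b, threshold=0.8):
    """Exact (case-insensitive) last name plus similar-enough first name."""
    if a["last"].casefold() != b["last"].casefold():
        return False
    similarity = SequenceMatcher(
        None, a["first"].casefold(), b["first"].casefold()
    ).ratio()
    return similarity >= threshold


def group_records(records, threshold=0.8):
    """Put matching records in the same bucket using a simple union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j], threshold):
                parent[find(j)] = find(i)

    buckets = {}
    for i, rec in enumerate(records):
        buckets.setdefault(find(i), []).append(rec["id"])
    return list(buckets.values())


print(group_records(records))                 # similar first names allowed
print(group_records(records, threshold=1.0))  # exact first names only
```

With the similarity threshold, the two records with slightly different first names land in the same bucket while the other fan with the same last name stays separate. Requiring an exact first name (a threshold of 1.0) splits them apart again, which is the undergrouping problem described earlier.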
Reconciliation Rules
As discussed before, the Identity Resolution creates a Unified Individual (similar to the concept of a Golden Record) where we choose the values for First Name, Last Name, etc. The Reconciliation Rules define how those values are being chosen.
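As a rough illustration of what "choosing a value" means, the sketch below picks a first name from the records in one bucket using two generic strategies: the most recently updated value, and the value from the highest-priority source. The strategy names, the `updated` dates, and the source ordering are assumptions made for this example and may not correspond exactly to the options Data Cloud offers.

```python
from datetime import date

# Hypothetical bucket for one fan; the dates are invented for illustration.
bucket = [
    {"source": "Service Cloud", "first_name": "Samantha", "updated": date(2024, 1, 10)},
    {"source": "Marketing Cloud", "first_name": "Sam", "updated": date(2024, 3, 5)},
    {"source": "Commerce Cloud", "first_name": "S.", "updated": date(2023, 11, 20)},
]


def last_updated(records, field):
    """Pick the value from the most recently updated record that has one."""
    candidates = [r for r in records if r.get(field)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["updated"])[field]


def source_priority(records, field, priority):
    """Pick the value from the first source in `priority` that has one."""
    for source in priority:
        for r in records:
            if r["source"] == source and r.get(field):
                return r[field]
    return None


print(last_updated(bucket, "first_name"))
print(source_priority(
    bucket, "first_name",
    priority=["Service Cloud", "Marketing Cloud", "Commerce Cloud"],
))
```

Note that the two strategies pick different first names from the same bucket, which is exactly why the choice of rule matters.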
Not A Golden Record!
Wait, what do you mean, not a golden record? I thought you said the Unified Individual was the golden record. I’m lost!
Well, true… the Unified Individual is similar in concept to the golden record because we start with multiple phones, emails, and first names, and we choose the best one for that record.
But… remember the whole idea of the Unified Profile is that it preserves ALL the values, and with the help of Data Cloud, we can also know which value came from which system, enabling traceability!
For example, we may have the business and personal emails for a fan, but on the Unified Individual record we choose to keep only one; let’s say it’s the business email. When we export the data into Marketing Cloud, where the fan prefers to be contacted at their personal email, we do not want to email the fan at their business email! We want to communicate with the customer in that channel using their preferred email, which is the personal one, because that is where the customer asked us to send marketing information. This would not be possible if we used a golden record, because we would lose the values that were not chosen.
We can compare this whole concept to a keychain.

Here, we can see the Unified Profile is a collection of the Unified Individual DMO and all the data that we have about the fans, allowing us to export data with the email or phone that the fan would like to be contacted at in that specific channel, regardless of which information we have chosen for the Unified Individual record.
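If it helps to think of the keychain as a data structure, here is a minimal sketch. The class names, fields, and email addresses are hypothetical and do not mirror the actual Data Cloud data model; the point is only that the profile keeps every value, together with its source, next to the Unified Individual.

```python
from dataclasses import dataclass, field


@dataclass
class ContactPointEmail:
    address: str
    kind: str     # "personal" or "business" (assumed labels)
    source: str   # system the value came from, for traceability


@dataclass
class UnifiedProfile:
    unified_individual: dict                      # the reconciled values
    emails: list = field(default_factory=list)    # ALL known emails

    def email_for(self, kind):
        """Return the email of the requested kind, falling back to the
        value chosen on the Unified Individual."""
        for email in self.emails:
            if email.kind == kind:
                return email.address
        return self.unified_individual.get("email")


profile = UnifiedProfile(
    unified_individual={"first_name": "Samantha", "email": "sam@work.example"},
    emails=[
        ContactPointEmail("sam@work.example", "business", "Salesforce CRM"),
        ContactPointEmail("sam@home.example", "personal", "Marketing Cloud"),
    ],
)

# The Unified Individual holds the business email, but the personal one
# is still available when a channel needs it.
print(profile.email_for("personal"))
```

A golden record would be just the `unified_individual` dictionary on its own; the list of emails is what the keychain adds.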
Creating A Ruleset
Let’s take a look at how to create a new ruleset.