(ELTORO . IT) Andres Perez

Summary

The text discusses the process of Identity Resolution in Data Cloud, focusing on the unification of records using matching and reconciliation rules to create a unified profile for each customer.

Abstract

The text begins by explaining the importance of Identity Resolution in Data Cloud, which allows for the unification of records from various sources to create a comprehensive view of each customer. The process involves creating customer buckets, where records for the same customer are grouped together, and then applying matching and reconciliation rules to create a unified profile. Matching rules help find related records and put them in the same bucket, while reconciliation rules define how values are chosen for the unified profile. The text also discusses the concept of a Unified Profile, which is similar to a golden record but preserves all values and allows for traceability. The creation of a ruleset, new DMO objects, and understanding results are also covered.

Bullet points

  • Identity Resolution is the process of unifying records in Data Cloud to create a comprehensive view of each customer.
  • Customer buckets are created by grouping records for the same customer together.
  • Matching rules help find related records and put them in the same bucket.
  • Reconciliation rules define how values are chosen for the unified profile.
  • The Unified Profile is similar to a golden record but preserves all values and allows for traceability.
  • The creation of a ruleset involves specifying the DMO to be unified and optionally providing a Ruleset ID.
  • New DMO objects are created when building an Identity Resolution ruleset.
  • Understanding results involves reviewing the unification results and comparing them to expected numbers.
  • Troubleshooting Identity Rules involves fixing matching rules and understanding the results.
  • Reconciliation rules involve selecting values for fields based on source priority, most frequent, or last updated.

Chapter 14 — Data Cloud Case Study: Identity Resolution

The most important step in Data Cloud is the unification of records, which can be accomplished with Identity Resolution.

In Chapter 12 we discussed why there is a need to harmonize the data, and in Chapter 13 we did the full analysis on how to harmonize in the F1F4 project. Now that we have harmonized the data into the Customer 360 Data Model we will be able to process that data to find all the records related to each customer, with all the information from every connected source, and be able to get the Customer 360 view we so much desire.

You can follow the case study here:

Table Of Contents

  • We Are Ready To Start With Data Cloud!
  • What Is Identity Resolution?
  • Customer Buckets
  • How Does Identity Resolution Work?
      • Matching Rules
      • Reconciliation Rules
  • Not A Golden Record!
  • Creating A Ruleset
  • New DMO Objects
  • Matching Rules
      • Account
      • Individual
      • Custom Rules
  • Understanding Results
  • Fixing Matching Rules
      • Troubleshooting Identity Rules
  • Reconciliation Rules

We Are Ready To Start With Data Cloud!

We have done a lot so far in this case study, but we are just getting started with the fun stuff!

In this case study, we have ingested data from Salesforce CRM in Chapter 5 and from Google Cloud Storage in Chapter 7, and then harmonized the data in Chapter 13. Most importantly, we did a complete analysis of the ingestion in Chapter 11 and of the harmonization in Chapter 13 for the F1F4 project.

We can visualize what we have accomplished by reviewing the Data Cloud Data Flow and putting some checks on the completed tasks.

But more importantly, we are now ready to start working with Data Cloud's main features: Identity Resolution, Segmentation, and Activations. This also means that from this point forward we can ignore the DSO and DLO objects, since we are going to be working with the DMO objects.

What Is Identity Resolution?

As we discussed in Chapter 1, the Customer 360 View allows you to have a whole picture of each of your customers, regardless of where the data is coming from, or how the data is formatted.

In the image above, we can see that Samantha (Sam, or just S.) has data about her in different systems, and that the data is not consistent. Service Cloud has her name as Samantha, but in Marketing Cloud she is called Sam, and in Commerce Cloud her name is just S. The same happens with her phone number and email.

How can we find all the records for our customers if the data is not standardized? That’s what Identity Resolution will help us with!

Customer Buckets

Identity Resolution can be easily understood if we think about what we are trying to do.

First and foremost, we must know the number of customers our company has or at least a very close approximation. Without this number, you will not be able to successfully perform a good Identity Resolution. Let me explain why this number is important.

Let’s assume that F1F4 has 500 fans. In a real-case scenario this number would be in the millions, but for the data we have used in this case study, this number is 500.

Also, in this case study, we gathered data about the fans from Salesforce CRM and Google Cloud Storage. From Salesforce CRM, we ingested 500 fans and harmonized those records into the Individual DMO. We also ingested 500 fans from Google Cloud Storage and harmonized them into the same Individual DMO.

By the way, my data is very clean: the exact same 500 fans are in both places, without duplicates within each system. In reality, your data will probably not be this clean.

We can run this SOQL query to view the data we have harmonized into the Individual DMO:

SELECT ssot__Id__c, ssot__FirstName__c, ssot__LastName__c,
ssot__DataSourceId__c, ssot__DataSourceObjectId__c
FROM ssot__Individual__dlm ORDER BY ssot__LastName__c

Using the Developer Console in Salesforce, we can execute that query and view some of the data to get a better understanding of what we have accomplished in the ingestion and harmonization steps.

We have 1000 records in the Individual DMO because, as we mentioned above, the data is clean and the exact same 500 fans in the Contact sObject from Salesforce CRM are in the Fans table in Google Cloud Storage. This may not be the case in real life.

Also, we are only ingesting data from 2 sources, but if we were ingesting the data from many other sources we would be getting a lot more than 1000 records.

Note that although we have 500 fans, we have 1000 Individual records which means we have created duplicate records, but that is expected!

We need to group the records for the same fans into 500 buckets, where each bucket will contain all the records for the same fan.

We may end up with 495 or 505 buckets if our data is not 100% clean, and that is probably OK because that would be a 1% difference. But if we end up with 250 or 1000 buckets (a 2x factor in our example), then we will have to reconsider how we are grouping records. This leads me to two more key concepts:

  • Overgrouping: If we end up with 50 buckets, that means we are putting fans who are not related in the same bucket. This could be the case where we match by last name and multiple fans from the same family are being put in the same group.
  • Undergrouping: If we end up with 1000 buckets, then fans that should be grouped are being put in different buckets. This could happen if we match on exact first name, and some fans have slightly different spellings of their first names (Andy, Andrew, Andres, Andrés) in different systems.
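The comparison between expected and actual bucket counts can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not something Data Cloud provides: the function name `classify_grouping` and the 5% tolerance are assumptions made for this example.

```python
def classify_grouping(expected_buckets: int, actual_buckets: int,
                      tolerance: float = 0.05) -> str:
    """Compare the number of buckets we got with the number we expected.

    The tolerance (5% by default) is an arbitrary choice for this sketch.
    """
    if expected_buckets <= 0:
        raise ValueError("expected_buckets must be positive")

    # Relative difference: positive means too many buckets, negative too few.
    deviation = (actual_buckets - expected_buckets) / expected_buckets

    if abs(deviation) <= tolerance:
        return "ok"
    # Too many buckets: records that belong together were kept apart.
    # Too few buckets: unrelated records were merged together.
    return "undergrouping" if deviation > 0 else "overgrouping"


# F1F4 expects 500 buckets.
for actual in (495, 505, 50, 1000):
    print(actual, classify_grouping(500, actual))
```

With 500 expected buckets, 495 and 505 fall inside the tolerance, 50 is reported as overgrouping, and 1000 as undergrouping.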

How Does Identity Resolution Work?

Identity Resolution is made of two different steps: Matching Rules and Reconciliation Rules. These steps help create the Unified Profile.

When the Identity Resolution runs, we create a Unified Profile, indicated by the red box on the previous image. The Unified Profile consists of two main elements:

  • A new record on the Unified Individual DMO indicated with the purple token
  • All the related data!

Please note that the Unified Individual is similar to the concept of a Golden Record, but a Golden Record is not the output of Identity Resolution. Data Cloud creates a Unified Profile made of the Unified Individual and the related DMO records. We’ll talk more about this later.

The Identity Resolution ruleset is made of two sets of rules:

Matching Rules

These are the rules that will help us find the related records and put them in the same bucket we described before.

Taking a close look at some sample data in the Developer Console, we can see that matching is not quite straightforward.

We can’t match fans by their ID (ssot__Id__c) because the same fan has different values. Similarly, the First Name (ssot__FirstName__c) is similar but has a slightly different spelling, so we can’t do an exact string match. Finally, the Last Name (ssot__LastName__c) is the same, but we can’t just match that because there are multiple people with the same last name.

We need some advanced matching logic, and that is precisely what the Matching Rules in the Identity Resolution will help us with.
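To make the idea concrete, here is a minimal Python sketch of what "more than an exact string match" could look like. It is not how Data Cloud implements fuzzy matching; it simply combines an exact match on the normalized last name with a similarity score on the first name, using `difflib` from the standard library. The record IDs, the sample names, and the 0.8 threshold are made up for this example.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Trim whitespace and ignore case before comparing."""
    return value.strip().casefold()


def is_match(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Same last name (exact, normalized) and similar first name (fuzzy)."""
    if normalize(a["last_name"]) != normalize(b["last_name"]):
        return False
    similarity = SequenceMatcher(
        None, normalize(a["first_name"]), normalize(b["first_name"])
    ).ratio()
    return similarity >= threshold


def build_buckets(records: list) -> list:
    """Greedy grouping: compare each record with the first record of each bucket."""
    buckets = []
    for record in records:
        for bucket in buckets:
            if is_match(bucket[0], record):
                bucket.append(record)
                break
        else:
            # No existing bucket matched, so this record starts a new one.
            buckets.append([record])
    return buckets


# Hypothetical sample data: two fans, each present in both sources.
records = [
    {"id": "crm-1", "first_name": "Andres", "last_name": "Perez"},
    {"id": "gcs-1", "first_name": "Andrés", "last_name": "Perez"},
    {"id": "crm-2", "first_name": "Maria", "last_name": "Perez"},
    {"id": "gcs-2", "first_name": "Maria", "last_name": "Perez"},
]

buckets = build_buckets(records)
for bucket in buckets:
    print([record["id"] for record in bucket])
```

The four records end up in two buckets. The shared last name alone does not merge the two fans, and the slightly different spelling of the first name does not split the first one.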

Reconciliation Rules

As discussed before, the Identity Resolution creates a Unified Individual (similar to the concept of a Golden Record) where we choose the values for First Name, Last Name, etc. The Reconciliation Rules define how those values are being chosen.

Not A Golden Record!

Wait, what do you mean, not a golden record? I thought you said the Unified Individual was the golden record. I’m lost!

Well, true… the Unified Individual is similar in concept to the golden record because we start with multiple phones, emails, and first names, and we choose the best one for that record.

But… remember the whole idea of the Unified Profile is that it preserves ALL the values, and with the help of Data Cloud, we can also know which value came from which system, enabling traceability!

For example, we may have the business and personal emails for a fan, but on the Unified Individual record we choose to keep only one; let’s say it’s the business email. When we export the data into Marketing Cloud (where the fan prefers to use their personal email), we do not want to email the fan on their business email! We want to communicate with the customer in that channel using their preferred email, so we need to send emails to their personal email, which is the one the customer asked us to use for marketing information. This would not be possible if we used the golden record, because we would lose the values that were not chosen.

We can compare this whole concept to a keychain.

Here, we can see the Unified Profile is a collection of the Unified Individual DMO and all the data that we have about the fans, allowing us to export data with the email or phone that the fan would like to be contacted at in that specific channel, regardless of which information we have chosen for the Unified Individual record.
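The keychain idea can be sketched as a small data structure. This is a conceptual illustration only, not Data Cloud's data model: the class names, the `purpose` label, and the email addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContactPointEmail:
    address: str
    source: str   # the system the value came from (traceability)
    purpose: str  # hypothetical label, e.g. "business" or "personal"


@dataclass
class UnifiedProfile:
    # The "golden record"-like part: one chosen value per field.
    unified_individual: dict
    # The rest of the keychain: every email we know about is preserved.
    emails: list = field(default_factory=list)

    def email_for(self, purpose: str) -> str:
        """Return the email for a given purpose, else the unified value."""
        for email in self.emails:
            if email.purpose == purpose:
                return email.address
        return self.unified_individual["email"]


profile = UnifiedProfile(
    unified_individual={"first_name": "Samantha", "email": "samantha@work.example"},
    emails=[
        ContactPointEmail("samantha@work.example", "Service Cloud", "business"),
        ContactPointEmail("sam@home.example", "Marketing Cloud", "personal"),
    ],
)

# The unified record kept the business email, but the personal one is not lost.
print(profile.email_for("personal"))
```

Because the profile keeps every value along with its source, the export for a given channel can pick the address the fan prefers there, regardless of which one was chosen for the Unified Individual.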

Creating A Ruleset

Let’s take a look at how to create a new ruleset.

New DMO Objects

When building an Identity Resolution ruleset, you need to specify the DMO that you want to unify and optionally provide the Ruleset ID.

In the video, we saw that we can only select Individual on this screen. The other common option is Account, but since we are using Person Accounts and we did not map the Account DMO, that option is not available.

If you only have one ruleset, then you can leave the Ruleset ID blank. So why would you want to fill it in? The answer is in the next screen.

Notice that unified versions of each DMO will be created, and that their labels and API names include the Ruleset ID we indicated in the previous screen.

For example, the Individual DMO will generate a Unified Individual Demo DMO because we typed Demo in the Ruleset ID.

A few more DMOs are created as well.

This is how they are related:

In those DMO names, the source DMO placeholder would be replaced with either Contact Point Email, Contact Point Phone, Contact Point Address, or Party Identification.

Matching Rules

The matching rules, as we discussed before, will help put matching fan records in the same bucket.

Some pre-configured match rules are available, depending on the object that is being unified.

Account

As mentioned above, since we are using Person Accounts and we did not map the Account DMO, we don’t have this option. If you had Accounts, it would be available to you.

Individual

We did have these options

But what do they mean?

The page “Match Rules and Criteria: Fuzzy and Normalized” on Salesforce’s Help and Training discusses the various matching methods and criteria used in the matching rules. It covers topics such as fuzzy matching, exact normalized matching for specific fields like email, phone, and address, and the use of custom matching rules. Please read that page to understand how match rules and criteria work.

Custom Rules

You can also create custom rules based on Contact Point or the Party Identification DMOs. We’ll see how the matching rules can be built on the Party Identification a little bit later.

Understanding Results

I was a bit sneaky in the video and made a change that I knew was going to give me incorrect results.

When I created the test data for F1F4, I generated first names using different spellings because I wanted to see how Fuzzy First Name matching would work. But in the video, I selected the First Name to be an exact match, expecting very low matches.

Before we review the results in the previous image, let’s review the expectations. This screen only makes sense if we know what we are expecting. This is what we were expecting:

  • We ingested 500 fans from Salesforce CRM and 500 fans from Google Cloud Storage and harmonized that data into the Individual DMO
  • We should have 1000 individual DMO records by combining the numbers from the previous bullet
  • The 500 fans in Salesforce CRM are the exact same 500 fans from Google Cloud Storage
  • We are expecting to have 500 buckets (Unified Individual DMOs) created.

The numbers below are horrible because we knew what numbers we were expecting! Maybe with a different set of expectations, these numbers would be awesome.

Let’s review the unification results.

  • [1] The unification using the exact First Name created 999 unified buckets. This is an example of undergrouping. We created too many buckets and we did not group records enough.
  • [2] I started with 1000 (1k) source profiles.
  • [3] The consolidation rate is 0%. This means that close to 0% of the source records were found to be duplicates and consolidated into a shared bucket.
  • [4] We only found 2 duplicate records.
  • [5] This found 999 known profiles
  • [6] This found 0 Anonymous profiles

We’ll talk about Anonymous profiles in a different chapter.
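The numbers above can be checked with some quick arithmetic. This sketch assumes the consolidation rate is defined as 1 minus the ratio of unified profiles to source profiles, and that the screen rounds it to a whole percentage; the function name is made up for this example.

```python
def consolidation_rate(source_profiles: int, unified_profiles: int) -> float:
    """Assumed definition: 1 - (unified profiles / source profiles)."""
    if source_profiles <= 0:
        raise ValueError("source_profiles must be positive")
    return 1 - unified_profiles / source_profiles


# What we got: 1000 source profiles unified into 999 buckets.
print(f"{consolidation_rate(1000, 999):.1%}")

# What we expected: 1000 source profiles unified into 500 buckets.
print(f"{consolidation_rate(1000, 500):.1%}")
```

Under that assumption, 999 unified profiles out of 1000 source profiles gives 0.1%, which would be shown as 0%, while the 500 buckets we were expecting would give a consolidation rate of 50%.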

Just to make sure we are on the same page, these numbers could have been awesome if we had different expectations. For example, if we had only ingested 1000 unique profile records from AWS S3, then we would expect to see numbers like these.

The numbers only make sense if you know what you are expecting!

Fixing Matching Rules

Let’s change the matching rules to match profiles based on fuzzy first names. While we are doing that, let’s also add matching based on the Fan Id, which is a key identifier for F1F4.

Troubleshooting Identity Rules

Running this configuration gave me some unexpected results.

It got a bit better: it matched one additional profile 😳. But that is not really what I was expecting this time.

In the first video I made a mistake on purpose, because I wanted to prove a point, but this time I was actually trying to get good results, and it did not work! I was going to troubleshoot this problem off camera, but I think this is something that we can all learn from, so I will create Chapter 15 to show you how to troubleshoot Identity Resolution issues. I think it’s important to learn how things work, but also how things break and how to fix them… this is how we really learn :-)

After I fixed the problem with the Party Identification as explained in Chapter 15, I got the results I was expecting.

Reconciliation Rules

As described earlier, multiple DMO objects get created. In particular, I am talking about the Unified <Source DMO> RULE_ID DMOs. These records are similar in concept to the golden record, and we can select the values for their fields based on a few available options:

  • Source Priority
  • Most Frequent
  • Last Updated (only for Individual DMO)

Here are a few good examples of when to use these options:

Your website requires the users to confirm a code via email or SMS whenever they log in to the system. In this case, we want the email or phone coming from the website because we know those values are correct; otherwise, the user could not log in to the system.

If, on the other hand, we have a shipping system, we should get the value of the address from that system instead of the website, because if a product is not delivered, or is delivered to the wrong address, the customer will create a case to let us know.

But for the first name, we may want to use the most recent or most frequent value.
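The three options can be sketched in Python to show how each one would pick a value. This is a conceptual illustration, not Data Cloud's implementation: the function names, the record layout, and the dates are made up for this example, and the source names are borrowed from the Samantha example earlier in the chapter.

```python
from collections import Counter
from datetime import date


def source_priority(values: list, priority: list) -> str:
    """Pick the value from the highest-priority source (first in the list)."""
    rank = {source: position for position, source in enumerate(priority)}
    # Sources that are not in the priority list sort after all ranked ones.
    best = min(values, key=lambda v: rank.get(v["source"], len(rank)))
    return best["value"]


def most_frequent(values: list) -> str:
    """Pick the value that appears most often across sources."""
    return Counter(v["value"] for v in values).most_common(1)[0][0]


def last_updated(values: list) -> str:
    """Pick the value from the most recently updated record."""
    return max(values, key=lambda v: v["updated"])["value"]


# Hypothetical first-name values for one fan, with invented dates.
first_names = [
    {"value": "Samantha", "source": "Service Cloud", "updated": date(2023, 1, 10)},
    {"value": "Sam", "source": "Marketing Cloud", "updated": date(2023, 6, 1)},
    {"value": "Samantha", "source": "Website", "updated": date(2022, 12, 1)},
]

print(source_priority(first_names, ["Marketing Cloud", "Service Cloud"]))
print(most_frequent(first_names))
print(last_updated(first_names))
```

With this sample data, source priority and last updated both pick Sam, while most frequent picks Samantha. The same set of source records can produce different unified values depending on the rule chosen for each field.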
