Chapter 9 — Data Cloud Case Study: Building A Data Transform To View Records

We saw in Chapter 8, that you can build a Data Transform to view the records from different related objects allowing you to join, aggreage, transform, and filter those objects, but we left the actual building to this chapter.
You can follow the case study here:
Table Of Contents
What Is A Data Transform? — Data Transform Canvas — Data Transform Preview Creating a Batch Data Transform — Adding Input Data Joining Objects — Joining Fans And Sales Transactions — Join Types — Right Table Prefix — Joining Products And Sales Transactions Transforming the Data — Formulas Aggregating Data — Aggregates — Group By Columns — Group By Columns And Rows Save It! — Download / Upload Metadata Create Output — Create CountPurchasesByFan DLO — Create output Node Save it (Again)! Run It! View Data Transform Metadata View Data Transform Results Adding DLO To Data Space View Data Transform Results (Again)
What Is A Data Transform?
A Data Transform is a repeatable series of operations that can be run when the data updates, these operations include: joins, transforms, filters, aggregates, and writing the results to a new DLO or DMO. You can run these Data Transformations ad-hoc or have them scheduled.

There are two types of data transforms: batch data transforms and streaming data transforms.
- Batch Data Transforms: Using a graphical tool, you can build a Data Transform that runs ad-hoc or scheduled. This is what we are going to create in this chapter.
- Streaming Data Transforms: Using SQL, you can create a Data Transform that runs continuously as a streaming process, picking up new or changed data as it arrives, and writing the output to a new DLO.
Data Transform Canvas
Before we get started working on the canvas, let’s take a look at the main interface

Data Transform Preview
One interesting feature of the Batch Data Transformation editor is that it shows you at every step the state of some sample data. For example, if I clicked on the node named “Very Active Fans” in the picture above, I would see this at the bottom of the editor:

This is precisely what we are after in this chapter, we want to see this data after joining the fans, products, and sales transactions data. So let’s review how this is done.
Creating a Batch Data Transform
The video shows the steps to create a Data Transform

Notice that you can select the type of objects early in the process of creating the Data Transform. You can choose either a DLO or a DMO. This not only affects which objects you can read the data from, but also which objects you can create the data out to. You can’t create a Data Transform on the DSO, but as we have talked about before, the DLO is the DSO plus some additional fields.
Adding Input Data
The next step is to define the input data

Notice that you can only select those objects that are DLO or DMO based on the selection that you did before.
Also, note that at this step you can define the fields that you care about in your Data Transform, but you can add/remove fields later inside the tool.

Notice, that this will only read some records, not the entire data set. In the image above, I am sampling the first 2000 records, but you can configure this as well with a maximum of 10000 records. If you need to process the entire data set, you will need to create a DLO for that, which is something we will be doing later in this chapter.
Joining Objects
Now that we have defined the input data and that we are bringing three DLOs (fans, products, and sales transactions), it’s time to start joining those tables.
Joining Fans And Sales Transactions
We’ll start with the Fans and Sales Transactions

The options to configure the join are complex, so let’s review them

The DLOs that are joining are listed at the top of this panel. The first object (on the top) is the left table in the relationship. The second object (on the bottom) is the right table in the relationship.
When dragging the node to create a join, start from the object that will be the right table, and drag it into the object that will be the left table in the relationship. This is especially true if you are not doing an inner join.
Join Types
Below we have the type of joins you want to create, in our case, we are doing an inner join. Let’s review the different types of joins we can have:
- A Lookup Join includes all rows from the left table data and only matching rows from the right table. If multiple matches are found on the right table, it returns only one matching row. We’ll look into this a bit deeper soon.
- A Left Join includes all rows from the left table and only the matching rows from the right table. If there is no match, the right table will have null values.
- A Right Join includes all rows from the right table and only the matching rows from the left table. If there is no match, the left table will have null values
- An Inner Join returns only the rows that have matching keys in both the left and right tables.
- An Outer Join returns all the rows from both tables, joining records from the left and right tables. It includes the results of both the left and right joins. If there is no match, the non-matching side will have null values
- A Cross Join produces the Cartesian product of the two tables, resulting in all possible combinations of rows. It is used when there is no common column between the tables, but it can generate a large result set and is rarely used and it should be avoided.
I find the Lookup Join a bit confusing, so let’s try an example. Let’s select the Sales Transactions as the left table, and Fans as the right table. We do have a FanId on the Sales Transactions table, but we do not have the personal information (first name, last name, email), we could look for that information on the Fans table.

Right Table Prefix
The last part of this screen that I want to point out, is the prefixes for the fields on the right table. This can become very important with complex data transformations.
In the previous example, I changed the default value Fans to FansLookup to make the next screen more obvious.

On the same screen note that you can indicate which fields you want to use in the next steps, try to limit the columns coming out of a node.
Joining Products And Sales Transactions
Using what we just learned, joining the Products and Sales Transactions objects should be a walk in the park (Assuming is not -40C in the middle of winter).

Transforming the Data
Use transformations to perform calculations on and manipulate your data. With transformations, among other things, you could:
- Drop, rename, or edit the properties of a field
- Create formula fields with values from a row, or a group of rows (first value, last value).
- Create buckets to add a new field with the name of the bucket determined from the value of a field.
In this video, we see how we are dropping columns using a transformation.

In the previous video, I wanted to show how you could work with transformations, although you are better off cleaning the fields on imported tables or the join nodes.
Formulas
This node is extremely useful for creating fields based on formulas, for example, we could concatenate the first and last name into a single field:

Aggregating Data
The next step is aggregating the data because we want to count how many times a fan has purchased products from different categories based on the sales transactions. We want to do something like this, sometimes code is easier than English 😀:
SELECT Count() FROM SalesTransactions GROUP BY FanId, Category
Let’s see how that is done

Aggregates
First, you select the aggregates which Data Transformations offer a great range of options

Group By Columns
Then you select the fields you want to use for grouping your data, in our case, we choose FanId and Category.

Group By Columns And Rows
You could also group by both rows and columns. In this case, we are grouping by FanId (rows) and Categories (Columns).

Save It!
I have done enough work, let’s save it before we lose any work. that would be horrible!

Wait, we are getting an error. This means that we can’t save it yet! This is the error message

This means that we need an output object before the Data Transform can be saved. It kind of makes sense, you create a Data Transform to transform data, and you need to store the transformed data. Without an output element, why are you creating a Data Transform?
Download / Upload Metadata
On the other hand, we can’t save our work even as a draft version. That is kind of annoying, but we have a way out in this case. You can download the metadata for the Data Transform as a JSON file, and you can also upload a version of the Data Transform metadata.

Create Output
Remember at the beginning of this article, yes I know… a long time back, that we had to choose if we were going to do a transformation on DLOs or DMOs? Well, we need to create an output object of the same type.
Create CountPurchasesByFan DLO
MAKE SURE YOU OPEN A NEW TAB, or you have downloaded the JSON file because otherwise, you could lose your hard work (it happened to a friend, right?😳).
Since we are building this data in the Fan, Products, Sales Transactions DLOs then we are going to create a new DLO.

Create output Node

You have to map the fields that you want to output

Save it (Again)!
The last time we tried to save, it complained because it was missing an output node. I have just provided one, so we should be OK now. Let’s save it, and close the editor.

Run It!
Now we are ready to run the data transformation, check out the video for the steps:

The video, also shows how you can check out if the process is done by clicking on the Update Status option.
View Data Transform Metadata
Clicking on the link in the list view above, we can see the Data Transform metadata, and perform some quick actions like scheduling, running ad-hock, or updating the status

View Data Transform Results
All right, the Data Transform has completed its execution, and the results have been written to the CountPurchasesByFan DLO. Let’s view the data in such DLO, we can use the Data Explorer tool, discussed in the previous chapter, to see the records in the DLO.

Adding DLO To Data Space
Did you notice the DLO is not listed? The reason for that is that we have to add the DLO to the Data Space. Any DSO/DLO that gets created when ingesting data is added to the Data Space, but if you add a new DLO (like we did before) then you need to add the DLO to the Data Space.

View Data Transform Results (Again)
Now that the CountPurchasesByFan DLO is in the Data Space, we should be able to view the data.
