How to Build a Dataset to Predict Customer Churn
The brass tacks.
AI is wildly hyped in 2020, and every startup claims to use it. However, getting relevant and clean data is a basic pre-requisite to AI that many organizations haven’t ticked off.
Churn analysis is a powerful AI use-case, but you can’t build an accurate churn model if you don’t have sufficient, high-quality data to plug-in.
To clarify some points, churn is when a customer quits a service, and the goal of churn analysis is to effectively fight churn and increase customer retention, which can be done with product upgrades, one-on-one customer interactions, better pricing, more targeted user acquisition, and so on.
Here’s how to get the data you need to build an accurate churn model.
Building the Dataset
We want to predict churn. So, we need historical data where one column is churn. This is a binary classification problem, so the labels for the churn column should look like “Yes” or “No” (or “1” or “0”, or any other class labels).
If you have a monthly subscription service, each row could be a certain client in a certain month, and the other columns (besides churn) are attributes about that client, such as their tenure, selected add-ons, contract type, and so on.
Here’s a simple example from a fictional telecom company.

You might have this data in an Excel sheet, a CSV file, stored in a Redshift database, or somewhere else. It could also be in different places, and you’ll have to bring them together. For example, you might have the customerID field and contract type in one database, and the customerID field with the churn information in another database, which means you can merge these on the customerID field to create one dataset.
Building a Model
Creating a great dataset is the hard part. With no-code tools like Apteo, building a churn model is easy.
First, connect your dataset. Below, I simply drag-and-drop a CSV file of my churn data into the platform. Then, I head to the “Predictive Insights” tab and select “Churn” as my KPI. I leave the default settings as they are, and an automated machine learning model gets created in the background.
Now, I can see how different attributes impact churn, and I can predict whether a customer will churn by putting in data like their monthly charge and tenure.






