Sarit Maitra

Summary

The web content presents an integrated approach using RFM (Recency, Frequency, Monetary) analysis, clustering, Customer Lifetime Value (CLTV), and Machine Learning (ML) algorithms to forecast consumer purchasing behavior and segment customers for targeted marketing strategies.

Abstract

The article discusses a comprehensive method for predicting customer behavior by combining traditional RFM marketing analysis with advanced machine learning techniques. It emphasizes the limitations of statistical methods and advocates for the use of ML algorithms to enhance customer segmentation and CLTV prediction. The approach involves calculating RFM scores, applying K-means clustering to segment customers into value-based groups, and using these segments to train an ML model, such as XGB, to predict future customer value. The article also covers data mining, feature engineering, and model evaluation through metrics like accuracy, precision, recall, and ROCAUC. The goal is to provide businesses with actionable insights to improve customer acquisition, retention, loyalty, and profitability.

Opinions

  • The author suggests that traditional statistical methods are less effective for customer behavior prediction due to their stringent assumptions and inability to handle a large number of variables.
  • Machine learning methods are preferred as they do not rely on fixed-form equations and can handle the complex nature of real-world data.
  • The use of nonparametric methods is justified due to the non-normal distribution of the dataset, which is common in business data.
  • The article implies that deep learning algorithms could potentially improve the predictive model's performance, although they were not exercised in the case study.
  • The author expresses that understanding and addressing prediction errors is crucial for building effective predictive systems.
  • The author provides a link to their LinkedIn profile, indicating a willingness to engage with readers and potentially offer further insights or assistance.
  • A cautionary note is included, stating that the methods described are experimental and should be applied with caution, emphasizing the user's responsibility when implementing these techniques.

Integrated Approach of RFM, Clustering, CLTV & Machine Learning Algorithms for Forecasting

A case study with Python code

Image by author

CLTV is a customer relationship management (CRM) concern: an enterprise approach to understanding and influencing customer behavior through meaningful communication in order to improve customer acquisition, retention, loyalty, and profitability. The whole idea is that the business wants to predict the average amount of money customers will spend on the business over the entire life of the relationship.

Although statistical methods can be very powerful, they make several stringent assumptions about the types of data and their distribution, and they can typically handle only a limited number of variables. Regression-based methods are usually based on a fixed-form equation and assume a single best solution, which means we can compare only a few alternative solutions manually. Further, when these models are applied to real data, their key assumptions are often violated. Here, I will show Machine Learning (ML) methods that integrate CLTV and customer transaction variables with the RFM variables to forecast consumer purchases.

I will use two approaches here:

1st approach: the RFM (Recency, Frequency, and Monetary) marketing analysis method is used to segment customers, and

2nd approach: Customer Lifetime Value (CLTV) is used to train an ML algorithm for prediction. I will use 3 months of data to calculate RFM and use it for predicting the next 6 months.

RFM is a scoring model that attempts to predict customers’ behavior in the future and is implicitly linked to CLTV. One key limitation of RFM models is that they are scoring models and do not explicitly provide a dollar figure for customer value. A simple equation to derive CLTV for a customer is

CLTV = \sum_{t=0}^{T} \frac{(p_t - c_t)\, r_t}{(1 + i)^t} - AC

where (a short Python translation follows the definitions below):

  • p_t = price paid by a consumer at time t,
  • c_t = direct cost of servicing the customer at time t,
  • i = discount rate or cost of capital for the firm,
  • r_t = probability of customer repeat buying or being “alive” at time t,
  • AC = acquisition cost, and
  • T = time horizon for estimating CLTV.
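As a quick worked example, here is a direct Python translation of the formula. This helper is my illustration, not code from the article:

def cltv(prices, costs, retention, discount_rate, acquisition_cost):
    """Discounted CLTV: sum over t of (p_t - c_t) * r_t / (1 + i)^t, minus AC."""
    return sum(
        (p - c) * r / (1 + discount_rate) ** t
        for t, (p, c, r) in enumerate(zip(prices, costs, retention))
    ) - acquisition_cost

# three periods with flat price/cost and a decaying probability of staying "alive"
print(cltv(prices=[50, 50, 50], costs=[20, 20, 20],
           retention=[1.0, 0.8, 0.6], discount_rate=0.1,
           acquisition_cost=30))  # ~36.69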

Data Mining

Let’s load and see the data.
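A minimal loading sketch; the file name and read call are my assumptions, since the article does not show them:

import pandas as pd

#load the transactional retail data (hypothetical file name)
df = pd.read_csv('transactions.csv', encoding='utf-8')
print(df.shape)
print(df.head())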

We have all the necessary information that we need:

  • Customer ID
  • Unit Price
  • Quantity
  • Invoice Date

With all these features, we can build the equation: Monetary value = Active Customer Count × Order Count × Average Revenue per Order

df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])  # convert InvoiceDate from string to datetime
df['InvoiceYearMonth'] = df['InvoiceDate'].map(lambda date: 100*date.year + date.month)  # create a YearMonth field
df['Monetary'] = df['UnitPrice'] * df['Quantity']  # calculate Monetary for each row
monetary = df.groupby(['InvoiceYearMonth'])['Monetary'].sum().reset_index()  # YearMonth vs. Monetary data frame

Before we dive into the RFM score, we can do some analysis to learn more about customer behavior, such as monthly active customers, monthly order count, average revenue per order, new customer ratio, and monthly customer retention rate. Interested readers may visit here to learn more about such analyses. So, I will start with segmentation.

Customer Segmentation

Let’s assume some common segments-

  • Low Value: customers who are less active than others, not very frequent buyers/visitors, and who generate very low, zero, or maybe even negative revenue.
  • Mid Value: customers who buy fairly frequently and generate moderate revenue.
  • High Value: customers with high revenue, high frequency, and low inactivity; the business always wants to retain these customers.

We shall calculate the RFM values and apply unsupervised ML, namely K-means clustering, to identify clusters for each of them. The number of clusters is generally defined by the business before it is fed to the K-means algorithm; however, the elbow method of K-means helps us find the optimal cluster number.
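A minimal sketch of the elbow method, assuming a per-customer frame DF_user with a numeric feature such as Recency (computed in the next section); the names are illustrative:

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

#within-cluster sum of squares (inertia) for k = 1..9
sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, random_state=42).fit(DF_user[['Recency']])
    sse[k] = kmeans.inertia_

plt.plot(list(sse.keys()), list(sse.values()), marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.show()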

Recency

To calculate recency, we need to find the most recent purchase date of each customer and see how many days they have been inactive. Once we have the number of inactive days for each customer, we will apply K-means clustering to assign each customer a recency score.

Here it looks like we have 3 clusters. Based on business requirements, we can go with fewer or more clusters. Let us select 4 for this example:
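A hedged sketch of this step; df is assumed to hold the transactions, and the frame and column names are mine:

#most recent purchase date per customer and the days of inactivity since then
max_purchase = df.groupby('CustomerID').InvoiceDate.max().reset_index()
max_purchase.columns = ['CustomerID', 'MaxPurchaseDate']
max_purchase['Recency'] = (max_purchase['MaxPurchaseDate'].max() - max_purchase['MaxPurchaseDate']).dt.days
DF_user = max_purchase[['CustomerID', 'Recency']].copy()

#assign a recency cluster with K-means (4 clusters, as chosen above)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4, random_state=42)
DF_user['RecencyCluster'] = kmeans.fit_predict(DF_user[['Recency']])

#relabel clusters so that a higher cluster number means a better (lower) recency
order = DF_user.groupby('RecencyCluster')['Recency'].mean().sort_values(ascending=False)
DF_user['RecencyCluster'] = DF_user['RecencyCluster'].map({old: new for new, old in enumerate(order.index)})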

Likewise, we can compute the Frequency and Monetary scores, and finally the overall score.

We divide these clusters into High/Mid/Low: an overall score of 0 to 2 is Low Value, 3 to 4 is Mid Value, and 5+ is High Value.
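A sketch of the overall score and the segment labels, assuming FrequencyCluster and RevenueCluster were built the same way as RecencyCluster (the column names are mine):

#overall RFM score: simple sum of the three cluster scores
DF_user['OverallScore'] = DF_user['RecencyCluster'] + DF_user['FrequencyCluster'] + DF_user['RevenueCluster']

#map the overall score to named segments
DF_user['Segment'] = 'Low-Value'
DF_user.loc[DF_user['OverallScore'] > 2, 'Segment'] = 'Mid-Value'
DF_user.loc[DF_user['OverallScore'] > 4, 'Segment'] = 'High-Value'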

The descriptive statistics of the respective RFM features are shown below:

We see that even though the average recency is 90 days, the median is 49. The negative minimum Monetary value indicates returned items. The test statistic values and the distribution and QQ plots below confirm that the data set does not follow a normal distribution. Therefore, the use of a nonparametric framework for making predictions is justified.
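The normality check can be reproduced with something like the following sketch; the exact test used in the article is not shown, so this assumes the D'Agostino-Pearson test and the column names used above:

from scipy import stats
import matplotlib.pyplot as plt

#normality test per RFM feature: a small p-value means we reject normality
for col in ['Recency', 'Frequency', 'Revenue']:
    stat, p = stats.normaltest(DF_user[col])
    print(f'{col}: statistic={stat:.2f}, p-value={p:.4f}')

#QQ plot of Recency against a theoretical normal distribution
stats.probplot(DF_user['Recency'], dist='norm', plot=plt)
plt.show()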

Evidence from the statistical tests implies that the data are nonparametric in nature. This justifies deploying advanced ML and deep learning algorithms for the predictive modeling exercise. However, I have not used deep learning algorithms here.

We can start taking actions with this segmentation. The strategies are simple for all three classes:

  • Improve retention of High Value customers
  • Improve retention and increase frequency of Mid Value customers
  • Increase frequency of Low Value customers

Customer Lifetime Value (CLTV)

CLTV is quite simple here. First, we select a time window: anything from 3, 6, 12, or 24 months. We can then compute the CLTV for each customer in that specific time window with a simple equation: Total Gross Revenue - Total Cost. This equation is based on historical data and gives us the historical value. If we see some customers with very high negative lifetime value historically, then we are probably too late to take action. So let's use an ML algorithm to predict instead.

CLTV Prediction

So, let’s follow the steps-

  • Define an appropriate time frame for CLTV calculation
  • Identify the features we are going to use to predict future and create them
  • Calculate CLTV for training the ML model
  • Build and run the ML model
  • Check if the model is useful

We have already obtained the RFM scores for each customer ID. To implement the prediction correctly, let's split our dataset. I will take 3 months of data, calculate RFM, and use it for predicting the next 6 months.

#create 3m and 6m dataframes
m3 = DF_uk[(DF_uk.InvoiceDate < date(2011,6,1)) & (DF_uk.InvoiceDate >= date(2011,3,1))].reset_index(drop=True)
m6 = DF_uk[(DF_uk.InvoiceDate >= date(2011,6,1)) & (DF_uk.InvoiceDate < date(2011,12,1))].reset_index(drop=True)

Now we follow the same process of clustering, computing RFM, and overall scoring for each data frame, and finally merge the 3-month and 6-month data frames to see the correlations between CLTV and the feature set we have.
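A hedged sketch of the merge; DF_user is assumed to hold the 3-month RFM features and m6 the later transactions, with column names of my choosing except m6_Monetary, which matches the code further below:

#6-month gross revenue per customer, used as the CLTV label
m6['Revenue'] = m6['UnitPrice'] * m6['Quantity']
m6_rev = m6.groupby('CustomerID')['Revenue'].sum().reset_index()
m6_rev.columns = ['CustomerID', 'm6_Monetary']

#merge the 3-month features with the 6-month value
DF_merge = pd.merge(DF_user, m6_rev, on='CustomerID', how='left')
DF_merge['m6_Monetary'] = DF_merge['m6_Monetary'].fillna(0)

#correlations between the numeric features and the 6-month value
print(DF_merge.select_dtypes('number').corr()['m6_Monetary'].sort_values(ascending=False))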

Here, by applying K-means clustering, we can identify the existing CLTV groups and build segments on top of them. Considering the business side of this analysis, we need to treat customers differently based on their predicted CLTV. For this example, we will apply clustering and create 3 segments (the number of segments really depends on your business dynamics and goals): Low CLTV, Mid CLTV, and High CLTV.

We are going to apply K-means clustering to decide segments and observe their characteristics:
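A sketch of that clustering step, following the same K-means pattern as before (DF_merge as assumed above; the outlier cut is my addition):

#trim extreme outliers before clustering the 6-month value
DF_cluster = DF_merge[DF_merge['m6_Monetary'] < DF_merge['m6_Monetary'].quantile(0.99)].copy()

#3 CLTV segments: Low / Mid / High
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42)
DF_cluster['LTVCluster'] = kmeans.fit_predict(DF_cluster[['m6_Monetary']])

#relabel so that a higher cluster number means a higher average CLTV
order = DF_cluster.groupby('LTVCluster')['m6_Monetary'].mean().sort_values()
DF_cluster['LTVCluster'] = DF_cluster['LTVCluster'].map({old: new for new, old in enumerate(order.index)})

#inspect each cluster's 6-month value
print(DF_cluster.groupby('LTVCluster')['m6_Monetary'].describe())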

Cluster 2 is the best, with an average CLTV of 8.2k, whereas cluster 0 is the worst, with 396. There are a few more steps before training the ML model:

  • We need to do some feature engineering: convert categorical columns to numerical columns.
  • We will check the correlation of the features against our label, the CLTV clusters.
  • We will split our feature set and label (CLTV cluster) into X and y, and use X to predict y.
  • We will create training and test datasets; the training set will be used for building the ML model.

We will apply our model to Test set to see its real performance.

from sklearn.model_selection import KFold, cross_val_score, train_test_split
#convert categorical columns to numerical
DF_class = pd.get_dummies(DF_cluster)
#calculate and show correlations
corr_matrix = DF_class.corr()
corr_matrix['LTVCluster'].sort_values(ascending=False)
#create X and y; X is the feature set and y is the label (LTV cluster)
X = DF_class.drop(['LTVCluster','m6_Monetary'], axis=1)
y = DF_class['LTVCluster']
#split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=42)

We see that the 3-month revenue, frequency, and RFM scores will be helpful for our ML models. With the training and test sets, we can build our model.
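A hedged sketch of the model build; the hyperparameters are illustrative, and the pd.crosstab call also produces the confusion_matrix DataFrame used by the per-class metrics in the next section:

from xgboost import XGBClassifier
from sklearn.metrics import classification_report

#multi-class XGBoost classifier for the 3 LTV clusters
xgb_model = XGBClassifier(max_depth=5, learning_rate=0.1, objective='multi:softprob', n_jobs=-1)
xgb_model.fit(X_train, y_train)

print('Accuracy on train set:', xgb_model.score(X_train, y_train))
print('Accuracy on test set :', xgb_model.score(X_test, y_test))

y_pred = xgb_model.predict(X_test)
print(classification_report(y_test, y_pred))

#confusion matrix as a DataFrame (rows = actual, columns = predicted)
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
print(confusion_matrix)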

Machine Learning Algorithm comparison

Predictive models based on ML algorithms are something of a black box, which can be opened up by using sensitivity and specificity analysis.

#confusion_matrix is assumed to be a pandas DataFrame (e.g. the pd.crosstab above), so .values works
FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)  # false positives per class
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)  # false negatives per class
TP = np.diag(confusion_matrix)                                 # true positives per class
TN = confusion_matrix.values.sum() - (FP + FN + TP)            # true negatives per class
TPR = TP/(TP+FN)  # Sensitivity, hit rate, recall, or true positive rate
TNR = TN/(TN+FP)  # Specificity or true negative rate
PPV = TP/(TP+FP)  # Precision or positive predictive value
NPV = TN/(TN+FN)  # Negative predictive value
FPR = FP/(FP+TN)  # Fall-out or false positive rate
FNR = FN/(TP+FN)  # False negative rate
FDR = FP/(TP+FP)  # False discovery rate
ACC = (TP+TN)/(TP+FP+FN+TN)  # Overall accuracy

XGB model

We have a multi-class classification model with 3 groups (clusters). Accuracy is 78% on the test set. Our true positives are on the diagonal and are the largest numbers here. The false negatives are the sum of the other values along the rows, and the false positives are the sum of the other values down the columns. Precision and recall are acceptable for cluster 0, which is Low CLTV: if the model identifies a customer as belonging to cluster 0, there is an 85% chance it is correct (precision), and the classifier successfully identifies 90% of the actual cluster 0 customers (recall). We need to improve the model for the other clusters; the classifier detects barely 43% of the Mid CLTV customers.

Let's experiment with changing the tree depth and wrapping the model in OneVsRestClassifier:
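A sketch of that experiment; the parameter values are illustrative:

from sklearn.multiclass import OneVsRestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report

#wrap a deeper XGBoost model in a one-vs-rest scheme
ovr_model = OneVsRestClassifier(XGBClassifier(max_depth=7, learning_rate=0.1, n_jobs=-1))
ovr_model.fit(X_train, y_train)

print('Accuracy on test set:', ovr_model.score(X_test, y_test))
print(classification_report(y_test, ovr_model.predict(X_test)))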

Some improvement can be seen here. However, there is still room for improvement, e.g.

  • Adding more features and improve feature engineering
  • Try ANN /DNN

ROCAUC

By default, multi-class ROCAUC visualizations plot a curve for each class, in addition to the micro- and macro-average curves. This enables the user to inspect the trade-off between sensitivity and specificity on a per-class basis.
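The description above matches the ROCAUC visualizer from the Yellowbrick library; assuming that is the tool in use, and reusing the xgb_model from the earlier sketch, the plot can be produced like this:

from yellowbrick.classifier import ROCAUC

#per-class ROC curves plus micro- and macro-averages
visualizer = ROCAUC(xgb_model, classes=['Low CLTV', 'Mid CLTV', 'High CLTV'])
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()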

Class Prediction Error

Understanding prediction errors and determining how to fix them is critical to building effective predictive systems. If you are interested, I recommend reading this article to learn more about prediction errors.
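A class prediction error plot can be drawn with Yellowbrick's ClassPredictionError visualizer; a sketch under the same assumptions as above:

from yellowbrick.classifier import ClassPredictionError

#stacked bars showing, for each actual class, how its samples were predicted
visualizer = ClassPredictionError(xgb_model, classes=['Low CLTV', 'Mid CLTV', 'High CLTV'])
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()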

Summary

In ML models, parameters are tuned/estimated from the data, and those parameters control how the algorithm learns from the data (without making assumptions about the data-generating process). XGB is a tree-based algorithm and hence can be considered nonparametric. The tree depth used here is a parameter of the algorithm, but it is not derived from the data; it is an input (hyper)parameter.

I can be reached here.

Notice: The programs described here are experimental and should be used with caution. All such use is at your own risk.

Machine Learning
Customer Service
Data Science
Analytics