Integrated Approach of RFM, Clustering, CLTV & Machine Learning Algorithms for Forecasting
A case study with Python code
CLTV is a customer relationship management (CRM) concept: an enterprise approach to understanding and influencing customer behavior through meaningful communication in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability. The whole idea is that a business wants to predict the amount of money customers will spend on the business over the entire life of the relationship.
Although statistical methods can be very powerful, they make several stringent assumptions about the types of data and their distributions, and typically can handle only a limited number of variables. Regression-based methods are usually based on a fixed-form equation and assume a single best solution, which means we can compare only a few alternative solutions manually. Further, when these models are applied to real data, their key assumptions are often violated. Here, I will show how Machine Learning (ML) methods can forecast consumer purchases by integrating the CLTV and customer transaction variables with the RFM variables.
I will use two approaches here:
1st approach: the RFM (Recency, Frequency, Monetary) marketing analysis method is used to segment customers, and
2nd approach: Customer Lifetime Value (CLTV) is used to train an ML algorithm for prediction. I will use 3 months of data to calculate RFM and use it to predict the next 6 months.
RFM is a scoring model that attempts to predict customers' future behavior and is implicitly linked to CLTV. One key limitation of RFM models is that they are scoring models and do not explicitly provide a dollar value for a customer. A simple equation to derive CLTV for a customer is

CLTV = \sum_{t=0}^{T} \frac{(p_t - c_t)\, r_t}{(1 + i)^t} - AC

where
- p_t = price paid by a consumer at time t,
- c_t = direct cost of servicing the customer at time t,
- i = discount rate or cost of capital for the firm,
- r_t = probability of the customer repeat buying or being "alive" at time t,
- AC = acquisition cost, and
- T = time horizon for estimating CLTV.
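As a worked illustration, here is a minimal Python sketch of this equation. The cash flows, survival probabilities, and acquisition cost below are made-up example values, not taken from the data set.

def cltv(prices, costs, alive_probs, i, AC):
    # discounted CLTV: sum over t of (p_t - c_t) * r_t / (1 + i)^t, minus acquisition cost
    return sum((p - c) * r / (1 + i) ** t
               for t, (p, c, r) in enumerate(zip(prices, costs, alive_probs))) - AC

# example: three periods, 10% discount rate, 20 currency units of acquisition cost
print(cltv(prices=[100, 120, 90], costs=[40, 45, 35], alive_probs=[1.0, 0.8, 0.6], i=0.10, AC=20))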
Data Mining
Let’s load and see the data.
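A minimal loading sketch, assuming the transactions sit in a CSV file named OnlineRetail.csv (the file name and encoding are assumptions of this example):

import pandas as pd

df = pd.read_csv('OnlineRetail.csv', encoding='ISO-8859-1')  # hypothetical file name
print(df.head())   # inspect the first rows
print(df.info())   # column types and missing values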
We have all the necessary information that we need:
- Customer ID
- Unit Price
- Quantity
- Invoice Date
With all these features, we can build the equation: Monetary value = Active Customer Count * Order Count * Average Revenue per Order.
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate']) # convert InvoiceDate from string to datetime
df['InvoiceYearMonth'] = df['InvoiceDate'].map(lambda date: 100*date.year + date.month) # create a YearMonth field, e.g. 201103
df['Monetary'] = df['UnitPrice'] * df['Quantity'] # calculate Monetary for each row
monetary = df.groupby(['InvoiceYearMonth'])['Monetary'].sum().reset_index() # new data frame with YearMonth and Monetary columns
Before we dive into the RFM score, we can do some analysis to learn more about customer behavior, such as Monthly Active Customers, Monthly Order Count, Average Revenue per Order, New Customer Ratio, and Monthly Customer Retention Rate. Interested readers may visit here to learn more about such analysis. So, I will start with segmentation.
Customer Segmentation
Let’s assume some common segments-
- Low Value: customers who are less active than others, not very frequent buyers/visitors, and who generate very low, zero, or perhaps negative revenue.
- Mid Value: customers who are fairly frequent and generate moderate revenue.
- High Value: customers with high revenue, high frequency, and low inactivity; the business always wants to retain these customers.
We shall calculate the R, F, and M values and apply unsupervised ML, K-means clustering, to identify the clusters for each. The number of clusters is generally defined by the business and is an input we need to give the K-means algorithm. However, the Elbow Method helps us find the optimal cluster number.
Recency
To calculate recency, we need to find the most recent purchase date of each customer and see how many days they have been inactive. After obtaining the number of inactive days for each customer, we will apply K-means clustering to assign each customer a recency score.
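A minimal sketch of the recency calculation and the Elbow Method, assuming df holds the cleaned transactions from above:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# most recent purchase date per customer, and days of inactivity since then
max_purchase = df.groupby('CustomerID').InvoiceDate.max().reset_index()
max_purchase.columns = ['CustomerID', 'MaxPurchaseDate']
max_purchase['Recency'] = (max_purchase['MaxPurchaseDate'].max() - max_purchase['MaxPurchaseDate']).dt.days

# Elbow Method: inertia (SSE) for k = 1..9; the "elbow" suggests the optimal k
sse = {}
for k in range(1, 10):
    sse[k] = KMeans(n_clusters=k, n_init=10, random_state=42).fit(max_purchase[['Recency']]).inertia_
plt.plot(list(sse.keys()), list(sse.values()), marker='o')
plt.xlabel('Number of clusters'); plt.ylabel('SSE')
plt.show()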
Here it looks like we have 3 clusters. Based on business requirements, we can go with fewer or more clusters. Let us select 4 for this example:
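A sketch of the scoring step; the order_cluster helper, which relabels clusters so that a higher score means a more recently active customer, is an illustrative assumption:

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
max_purchase['RecencyCluster'] = kmeans.fit_predict(max_purchase[['Recency']])

def order_cluster(frame, cluster_col, target_col, ascending):
    # relabel cluster ids by the mean of target_col so the ids become an ordered score
    ordered = frame.groupby(cluster_col)[target_col].mean().sort_values(ascending=ascending).reset_index()
    ordered['index'] = ordered.index
    return (frame.merge(ordered[[cluster_col, 'index']], on=cluster_col)
                 .drop(cluster_col, axis=1).rename(columns={'index': cluster_col}))

# lower recency (fewer inactive days) should get the higher score
max_purchase = order_cluster(max_purchase, 'RecencyCluster', 'Recency', ascending=False)
DF_user = max_purchase[['CustomerID', 'Recency', 'RecencyCluster']]  # per-customer feature frame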
Likewise, we can compute the Frequency and Monetary scores, and finally the Overall Score.
We divide these scores into High/Mid/Low segments: 0 to 2 is Low Value, 3 to 4 is Mid Value, and 5+ is High Value customers, as sketched below.
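A condensed sketch of the remaining scores, reusing order_cluster and the per-customer frame DF_user from the recency step:

# Frequency: number of purchase records per customer
frequency = df.groupby('CustomerID').InvoiceDate.count().reset_index()
frequency.columns = ['CustomerID', 'Frequency']
DF_user = DF_user.merge(frequency, on='CustomerID')
DF_user['FrequencyCluster'] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(DF_user[['Frequency']])
DF_user = order_cluster(DF_user, 'FrequencyCluster', 'Frequency', ascending=True)

# Monetary: total revenue per customer (Monetary column created earlier)
revenue = df.groupby('CustomerID').Monetary.sum().reset_index()
DF_user = DF_user.merge(revenue, on='CustomerID')
DF_user['MonetaryCluster'] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(DF_user[['Monetary']])
DF_user = order_cluster(DF_user, 'MonetaryCluster', 'Monetary', ascending=True)

# Overall score and named segments
DF_user['OverallScore'] = DF_user['RecencyCluster'] + DF_user['FrequencyCluster'] + DF_user['MonetaryCluster']
DF_user['Segment'] = 'Low Value'
DF_user.loc[DF_user['OverallScore'] > 2, 'Segment'] = 'Mid Value'
DF_user.loc[DF_user['OverallScore'] > 4, 'Segment'] = 'High Value'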
The descriptive statistics of the respective RFM values are shown below:
We see that even though the average recency is 90 days, the median is 49. The negative minimum Monetary value indicates returns of items. The test statistic values and the distribution and QQ plots below confirm that the data set does not follow a normal distribution. Therefore, the use of a nonparametric framework for making predictions is justified.
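A minimal sketch of these normality checks, using a Shapiro-Wilk test and a QQ plot from scipy (applied to Recency as an example):

import matplotlib.pyplot as plt
from scipy import stats

stat, p = stats.shapiro(DF_user['Recency'])
print(f'Shapiro-Wilk statistic={stat:.3f}, p-value={p:.5f}')  # p < 0.05 -> reject normality

stats.probplot(DF_user['Recency'], dist='norm', plot=plt)  # QQ plot against a normal distribution
plt.show()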
Evidence from the statistical tests implies that the data are nonparametric in nature. This justifies the deployment of advanced ML and deep learning algorithms for the predictive modeling exercise. However, I have not used deep learning algorithms here.
We can start taking actions with this segmentation. The strategies are simple for all three classes:
- Improve retention of High Value customer
- Improve retention and increase frequency of Mid Value customer
- Increase Frequency of Low Value customer
Customer Lifetime Value (CLTV)
CLTV is quite simple here. First, we select a time window: anything from 3, 6, 12, or 24 months. We can then compute the CLTV for each customer in that specific time window with the equation: Total Gross Revenue - Total Cost. This equation is based on historical data, so it gives us the historical value. If we see that some customers have a very high negative lifetime value historically, then we are probably too late to take action. Let's use an ML algorithm to predict.
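Before moving to prediction, here is a minimal sketch of the historical value; since this data set has no cost column, gross revenue stands in for CLTV (an assumption of this example):

# historical CLTV proxy: total revenue per customer over the chosen window
cltv_hist = df.groupby('CustomerID')['Monetary'].sum().reset_index()
cltv_hist.columns = ['CustomerID', 'HistoricalCLTV']
print(cltv_hist.sort_values('HistoricalCLTV').head())  # most negative lifetime values first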
CLTV Prediction
So, let's follow these steps:
- Define an appropriate time frame for the CLTV calculation
- Identify the features we are going to use to predict the future, and create them
- Calculate CLTV for training the ML model
- Build and run the ML model
- Check whether the model is useful
We have already obtained the RFM scores for each customer ID. To implement the prediction correctly, let's split our dataset: I will take 3 months of data, calculate RFM, and use it to predict the next 6 months.
from datetime import date

# create 3m and 6m dataframes
m3 = DF_uk[(DF_uk.InvoiceDate < date(2011,6,1)) & (DF_uk.InvoiceDate >= date(2011,3,1))].reset_index(drop=True)
m6 = DF_uk[(DF_uk.InvoiceDate >= date(2011,6,1)) & (DF_uk.InvoiceDate < date(2011,12,1))].reset_index(drop=True)
Now we repeat the same process of clustering, computing RFM, and overall scoring on each data frame, and finally merge the 3-month and 6-month data frames to see the correlations between CLTV and the feature set we have.
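A sketch of that step, assuming DF_user holds the 3-month RFM features and m6 the later transactions; m6_Monetary (6-month revenue per customer) is used as the CLTV label:

# 6-month revenue per customer as the CLTV label
m6['Monetary'] = m6['UnitPrice'] * m6['Quantity']
m6_cltv = m6.groupby('CustomerID')['Monetary'].sum().reset_index()
m6_cltv.columns = ['CustomerID', 'm6_Monetary']

# merge with the 3-month feature set; customers absent in m6 get zero CLTV
DF_merge = DF_user.merge(m6_cltv, on='CustomerID', how='left').fillna(0)
print(DF_merge.corr(numeric_only=True)['m6_Monetary'].sort_values(ascending=False))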
Here, by applying K-means clustering, we can identify the existing CLTV groups and build segments on top of them. Considering the business side of this analysis, we need to treat customers differently based on their predicted CLTV. For this example, we will apply clustering and have 3 segments (the number of segments really depends on your business dynamics and goals):
- Low CLTV
- Mid CLTV
- High CLTV
We are going to apply K-means clustering to decide segments and observe their characteristics:
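A sketch of the label construction, reusing order_cluster so that the highest cluster id ends up as the highest-value group; the outlier threshold is an assumption:

# drop extreme revenue outliers before clustering the label
DF_cluster = DF_merge[DF_merge['m6_Monetary'] < DF_merge['m6_Monetary'].quantile(0.99)].copy()

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
DF_cluster['LTVCluster'] = kmeans.fit_predict(DF_cluster[['m6_Monetary']])
DF_cluster = order_cluster(DF_cluster, 'LTVCluster', 'm6_Monetary', ascending=True)
print(DF_cluster.groupby('LTVCluster')['m6_Monetary'].describe())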
Cluster 2 is the best, with an average CLTV of 8.2k, whereas cluster 0 is the worst, with 396. There are a few more steps before training the ML model:
- We need to do some feature engineering: convert categorical columns to numerical columns.
- We will check the correlation of the features against our label, the CLTV clusters.
- We will split our feature set and label (CLTV) as X and y. We use X to predict y.
- We will create training and test datasets. The training set will be used for building the ML model, and we will apply the model to the test set to see its real performance.
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# convert categorical columns to numerical
DF_class = pd.get_dummies(DF_cluster)

# calculate and show correlations
corr_matrix = DF_class.corr()
corr_matrix['LTVCluster'].sort_values(ascending=False)

# create X and y; X is the feature set and y is the label (LTV cluster)
X = DF_class.drop(['LTVCluster','m6_Monetary'], axis=1)
y = DF_class['LTVCluster']

# split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=42)
We see that the 3-month Revenue, Frequency, and RFM scores will be helpful for our ML models. With the training and test sets, we can build our model.
Machine Learning Algorithm comparison
Predictive models based on ML algorithms are something of a black box, which can be opened up by using sensitivity and specificity analysis.
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred) # y_pred holds the fitted model's predictions on the test set
FP = cm.sum(axis=0) - np.diag(cm) # false positives per class
FN = cm.sum(axis=1) - np.diag(cm) # false negatives per class
TP = np.diag(cm) # true positives per class
TN = cm.sum() - (FP + FN + TP) # true negatives per class
TPR = TP/(TP+FN) # Sensitivity, hit rate, recall, or true positive rate
TNR = TN/(TN+FP) # Specificity or true negative rate
PPV = TP/(TP+FP) # Precision or positive predictive value
NPV = TN/(TN+FN) # Negative predictive value
FPR = FP/(FP+TN) # Fall-out or false positive rate
FNR = FN/(TP+FN) # False negative rate
FDR = FP/(TP+FP) # False discovery rate
ACC = (TP+TN)/(TP+FP+FN+TN) # Overall accuracy
XGB model
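A minimal sketch of the model behind the numbers discussed below, using xgboost's scikit-learn wrapper (the hyperparameter values are illustrative):

import xgboost as xgb
from sklearn.metrics import classification_report

ltv_xgb = xgb.XGBClassifier(max_depth=5, learning_rate=0.1, objective='multi:softprob', n_jobs=-1)
ltv_xgb.fit(X_train, y_train)
print('Accuracy on train set:', ltv_xgb.score(X_train, y_train))
print('Accuracy on test set:', ltv_xgb.score(X_test, y_test))
print(classification_report(y_test, ltv_xgb.predict(X_test)))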
We have a multi-class classification model with 3 groups (clusters). Accuracy is 78% on the test set. The true positives are on the diagonal of the confusion matrix and are the largest numbers here. The false negatives are the sums of the other values along the rows, and the false positives are the sums of the other values down the columns. Precision and recall are acceptable for cluster 0. For cluster 0, which is Low CLTV, if the model identifies a customer as belonging to cluster 0, there is an 85% chance it is correct (precision). The classifier successfully identifies 90% of actual cluster 0 customers (recall). We need to improve the model for the other clusters: the classifier detects barely 43% of Mid CLTV customers.
Let's experiment with changing the tree depth and using OneVsRestClassifier:
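A sketch of that experiment, wrapping a deeper XGB model in scikit-learn's OneVsRestClassifier (the depth value is an assumption):

from sklearn.multiclass import OneVsRestClassifier

ovr_xgb = OneVsRestClassifier(xgb.XGBClassifier(max_depth=7, learning_rate=0.1))
ovr_xgb.fit(X_train, y_train)
print('Accuracy on test set:', ovr_xgb.score(X_test, y_test))
print(classification_report(y_test, ovr_xgb.predict(X_test)))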
Some improvement can be seen here. However, there is still room for improvement, e.g.
- Add more features and improve feature engineering
- Try ANN/DNN
ROCAUC
By default, with multi-class ROCAUC visualizations, a curve is plotted for each class, in addition to the micro- and macro-average curves. This enables the user to inspect the tradeoff between sensitivity and specificity on a per-class basis.
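This matches Yellowbrick's ROCAUC visualizer; a minimal sketch, assuming that library is installed:

from yellowbrick.classifier import ROCAUC

visualizer = ROCAUC(xgb.XGBClassifier(max_depth=5, learning_rate=0.1), classes=['Low', 'Mid', 'High'])
visualizer.fit(X_train, y_train)   # fit the wrapped model
visualizer.score(X_test, y_test)   # draws per-class, micro, and macro ROC curves
visualizer.show()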
Class Prediction Error
Understanding prediction errors and determining how to fix them is critical to building effective predictive systems. If you are interested, I recommend reading this article to learn more about prediction errors.
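Yellowbrick also provides a ClassPredictionError visualizer for this; a minimal sketch under the same assumptions as above:

from yellowbrick.classifier import ClassPredictionError

visualizer = ClassPredictionError(xgb.XGBClassifier(max_depth=5, learning_rate=0.1), classes=['Low', 'Mid', 'High'])
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)   # stacked bars of actual vs. predicted class counts
visualizer.show()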
Summary
In ML models, parameters are tuned/estimated from the data, and these parameters control how the algorithm learns (without making assumptions about the data-generating process). XGB is a tree-based algorithm and hence can be considered nonparametric. The tree depth used here is not derived from the data; it is an input parameter (a hyperparameter) of the algorithm.
I can be reached here.
Notice: The programs described here are experimental and should be used with caution. All such use is at your own risk.