Explainable T-learner Deep Learning Uplift Model Using Python Package CausalML
T-learner uplift models using XGBoost, LightGBM, and a neural network model, with feature importance and model interpretation
T-learner is a meta-learner that uses two machine learning models to estimate the individual-level heterogeneous causal treatment effect. In this tutorial, we will talk about how to use the Python package causalml to build a T-learner. We will cover:
- How to implement T-learner using the XGBoost model, the LightGBM model, and the neural network model separately?
- How to make individual treatment effect (ITE) and average treatment effect (ATE) estimations using a T-learner?
- How to check the T-learner feature importance?
- How to interpret a T-learner uplift model using SHAP?
If you are interested in building a T-learner manually, please check out my previous tutorial T Learner Uplift Model for Individual Treatment Effect (ITE) in Python.
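For a quick reminder of what the two models do, here is a minimal NumPy sketch of a T-learner. It is illustrative only: the simulated data, the variable names, and the choice of ordinary least squares as the base model are all assumptions made for this example, not what causalml does internally.

```python
import numpy as np

# Simulated data (assumed for illustration): two features, random treatment
rng = np.random.default_rng(0)
n = 2000
X_demo = rng.normal(size=(n, 2))
w_demo = rng.integers(0, 2, size=n)          # 1 = treated, 0 = control
true_ite = 1.0 + 0.5 * X_demo[:, 0]          # effect varies with the first feature
y_demo = X_demo[:, 1] + w_demo * true_ite + rng.normal(scale=0.1, size=n)

def fit_ols(X, y):
    """Fit a linear model with an intercept by least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predict with the coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

# One model is trained on treated units only, the other on control units only
coef_treated = fit_ols(X_demo[w_demo == 1], y_demo[w_demo == 1])
coef_control = fit_ols(X_demo[w_demo == 0], y_demo[w_demo == 0])

# ITE = predicted outcome if treated minus predicted outcome if not treated
ite_hat = predict_ols(coef_treated, X_demo) - predict_ols(coef_control, X_demo)

# ATE = mean of the estimated ITEs
ate_hat = ite_hat.mean()
print(f"Estimated ATE: {ate_hat:.2f}, true ATE: {true_ite.mean():.2f}")
```

The rest of this tutorial swaps the least-squares models for XGBoost, LightGBM, and a neural network, and lets causalml handle the fitting.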
Let’s get started!
Step 1: Install and Import Libraries
In step 1, we will install and import the Python libraries.
First, let's install causalml.
# Install package
!pip install causalml
After the installation is completed, we can import the libraries.
- pandas and numpy are imported for data processing.
- synthetic_data is imported for synthetic data creation.
- XGBTRegressor, MLPTRegressor, BaseTRegressor, LGBMRegressor, and XGBRegressor are for the machine learning model training.
# Data processing
import pandas as pd
import numpy as np
# Create synthetic data
from causalml.dataset import synthetic_data
# Machine learning model
from causalml.inference.meta import XGBTRegressor, MLPTRegressor, BaseTRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
Step 2: Create Dataset
In step 2, we will create a synthetic dataset for the T-learner uplift model.
Firstly, a random seed is set to make the synthetic dataset reproducible.
Then, using the synthetic_data function from the causalml Python package, we create a dataset with five features, one treatment variable, and one continuous outcome variable.
# Set a seed for reproducibility
np.random.seed(42)
# Create a synthetic dataset
y, X, treatment, ite, _, _ = synthetic_data(mode=1, n=5000, p=5, sigma=1.0)
feature_names = ['X1', 'X2', 'X3', 'X4', 'X5']
After that, using value_counts on the treatment variable, we can see that out of 5000 samples, 2582 units received treatment and 2418 units did not receive treatment.
# Check treatment vs. control counts
pd.Series(treatment).value_counts()
Output:
1 2582
0 2418
dtype: int64
Finally, we get the true average treatment effect (ATE) by taking the mean of the true individual treatment effect (ITE). The true average treatment effect (ATE) is about 0.5.
# True ATE
ite.mean()
Output:
0.4988477022092744
Step 3: T-learner Using XGBoost Model
In step 3, we will use the XGBoost model with T-learner to estimate the average treatment effect (ATE) and the individual treatment effect (ITE).
XGBTRegressor is a built-in XGBoost T-learner model that comes with the causalml package.
To estimate the average treatment effect (ATE) using XGBTRegressor, we first instantiate the XGBTRegressor, then get the ATE and its lower and upper bounds using the estimate_ate method.
# Use XGBTRegressor
xgb = XGBTRegressor(random_state=42)
# Estimated ATE, lower bound, and upper bound
te, lb, ub = xgb.estimate_ate(X, treatment, y)
# Print out results
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
We can see that the estimated average treatment effect (ATE) is 0.61, which is 0.11 higher than the true ATE of about 0.50. The lower bound is 0.56 and the upper bound is 0.67, so the interval does not contain the true value.
Average Treatment Effect: 0.61 (0.56, 0.67)
The method fit_predict produces the estimated individual treatment effect (ITE).
If a confidence interval for the ITE is needed, we can use bootstrap by specifying the number of bootstrap iterations (n_bootstraps), the bootstrap sample size (bootstrap_size), and setting return_ci=True.
The output gives us both the estimated ITE and the estimated lower and upper bounds for each individual.
# ITE
xgb_ite = xgb.fit_predict(X, treatment, y)
# ITE with confidence interval
xgb_ite, xgb_ite_lb, xgb_ite_ub = xgb.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                                                  n_bootstraps=100, bootstrap_size=500)
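To make the bootstrap idea concrete, here is a minimal sketch of a percentile bootstrap for ITE bounds. It is a generic illustration, not causalml's internal implementation: the least-squares T-learner, the simulated data, and the helper names ols_t_learner_ite and bootstrap_ite_ci are assumptions for this example. The arguments n_bootstraps and bootstrap_size are named to mirror the ones used above.

```python
import numpy as np

def ols_t_learner_ite(X_fit, w_fit, y_fit, X_eval):
    """T-learner with least-squares base models (a stand-in for any regressor)."""
    def fit(X, y):
        design = np.column_stack([np.ones(len(X)), X])
        return np.linalg.lstsq(design, y, rcond=None)[0]

    design_eval = np.column_stack([np.ones(len(X_eval)), X_eval])
    coef_treated = fit(X_fit[w_fit == 1], y_fit[w_fit == 1])
    coef_control = fit(X_fit[w_fit == 0], y_fit[w_fit == 0])
    return design_eval @ coef_treated - design_eval @ coef_control

def bootstrap_ite_ci(X, w, y, n_bootstraps=100, bootstrap_size=500, alpha=0.05, seed=0):
    """Refit on resampled data and take percentiles of the ITE per individual."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_bootstraps, len(X)))
    for b in range(n_bootstraps):
        # Sample rows with replacement, refit, and predict for every individual
        idx = rng.choice(len(X), size=bootstrap_size, replace=True)
        draws[b] = ols_t_learner_ite(X[idx], w[idx], y[idx], X)
    lower = np.percentile(draws, 100 * alpha / 2, axis=0)
    upper = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

# Small simulated dataset (assumed for illustration)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(1000, 2))
w_demo = rng.integers(0, 2, size=1000)
y_demo = X_demo[:, 1] + w_demo * (1.0 + 0.5 * X_demo[:, 0]) + rng.normal(size=1000)

ite_lb, ite_ub = bootstrap_ite_ci(X_demo, w_demo, y_demo)
print(ite_lb[:3], ite_ub[:3])
```

A larger n_bootstraps gives more stable bounds at the cost of more model fits, which is why bootstrapped intervals can be slow for neural networks and boosted trees.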
Step 4: T-learner Using LightGBM Model
In step 4, we will talk about how to use BaseTRegressor with a LightGBM model for the T-learner.
BaseTRegressor is a generalized class that can take in existing machine learning models from packages such as sklearn and xgboost, and run T-learners with those models. In this step, we will run the BaseTRegressor with LGBMRegressor.
If we run BaseTRegressor with XGBRegressor from xgboost, the result is the same as the XGBTRegressor that comes with the causalml Python package.
To estimate the average treatment effect (ATE) using BaseTRegressor, we first instantiate the BaseTRegressor with the LGBMRegressor, then get the ATE and its lower and upper bounds using the estimate_ate method.
# Use LGBMRegressor with BaseTRegressor
lgbm = BaseTRegressor(LGBMRegressor(random_state=42))
# Estimated ATE, lower bound, and upper bound
te, lb, ub = lgbm.estimate_ate(X, treatment, y)
# Print out results
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true ATE. The lower bound is 0.54 and the upper bound is 0.62. The confidence interval is narrower than the one from the built-in XGBTRegressor.
Average Treatment Effect: 0.58 (0.54, 0.62)
The results show that, on this dataset, using the BaseTRegressor in combination with LGBMRegressor produced an ATE estimate closer to the true value than the built-in XGBTRegressor.
To estimate the individual treatment effect (ITE), we use the method fit_predict on the LightGBM model.
We can also use bootstrap by specifying the bootstrap number, bootstrap size, and setting return_ci=True to get the estimated ITE and the estimated lower and upper bounds for each individual.
# ITE
lgbm_ite = lgbm.fit_predict(X, treatment, y)
# ITE with confidence interval
lgbm_ite, lgbm_ite_lb, lgbm_ite_ub = lgbm.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                                                      n_bootstraps=100, bootstrap_size=500)
Step 5: T-learner Using Neural Network Model
In step 5, we will talk about how to use a neural network model with T-learner.
The Python package causalml has a built-in class MLPTRegressor that runs multilayer perceptron neural network models with T-learner.
- hidden_layer_sizes specifies the number of hidden layers and the number of neurons in each layer. hidden_layer_sizes=(35, 25, 10, 5) means that there are four hidden layers for the neural network model. The first hidden layer has 35 neurons, the second has 25 neurons, the third has 10 neurons, and the fourth has 5 neurons.
- learning_rate_init specifies the initial learning rate of the neural network model. We set the initial value to 0.01.
- early_stopping=True means the neural network model stops training early when the score on a held-out validation split stops improving.
- random_state gives us reproducible results.
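To get a feel for the size of the network that hidden_layer_sizes=(35, 25, 10, 5) describes, the sketch below lists the weight matrix shapes and counts the trainable parameters. It assumes fully connected layers with bias terms, five input features, and a single output; mlp_parameter_count is a hypothetical helper written for this example, not part of causalml.

```python
def mlp_parameter_count(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Return weight shapes and total parameters of a fully connected network."""
    layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each consecutive pair of layers is connected by a (fan_in, fan_out) weight matrix
    shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    # Weights plus one bias per unit in the receiving layer
    n_params = sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)
    return shapes, n_params

shapes, n_params = mlp_parameter_count(n_inputs=5, hidden_layer_sizes=(35, 25, 10, 5))
print(shapes)    # [(5, 35), (35, 25), (25, 10), (10, 5), (5, 1)]
print(n_params)  # 1431
```

Because a T-learner fits one model for the treatment group and one for the control group, two networks of this size are trained.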
After instantiating the neural network model using MLPTRegressor, we name it nn and run estimate_ate on it to get the average treatment effect (ATE) and its lower and upper bounds.
# Use MLPTRegressor
nn = MLPTRegressor(hidden_layer_sizes=(35, 25, 10, 5),
                   learning_rate_init=.01,
                   early_stopping=True,
                   random_state=1)
# Estimated ATE, lower bound, and upper bound
te, lb, ub = nn.estimate_ate(X, treatment, y)
# Print out results
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true ATE and the same point estimate as the LightGBM model.
Average Treatment Effect: 0.58 (0.53, 0.64)
Tuning the hyperparameters such as the number of layers, the number of neurons in each layer, and the initial learning rate can potentially improve the model performance.
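As a quick side-by-side check of the three models, the sketch below takes the ATE estimates and bounds reported in steps 3 to 5 and compares them with the true ATE from step 2. The numbers are copied from the outputs above; the dictionary layout and variable names are just for illustration.

```python
true_ate = 0.4988  # true ATE from step 2, rounded

# (estimate, lower bound, upper bound) as printed in steps 3 to 5
reported = {
    "XGBTRegressor": (0.61, 0.56, 0.67),
    "BaseTRegressor + LGBMRegressor": (0.58, 0.54, 0.62),
    "MLPTRegressor": (0.58, 0.53, 0.64),
}

summary = {}
for name, (te, lb, ub) in reported.items():
    summary[name] = {
        "error": round(te - true_ate, 2),          # estimate minus truth
        "width": round(ub - lb, 2),                # interval width
        "covers_true_ate": lb <= true_ate <= ub,   # does the interval contain the truth?
    }
    print(name, summary[name])
```

On this run, all three estimates are above the true ATE and none of the three intervals contains it, so the reported bounds should be read with some caution.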
Calculating the individual treatment effect (ITE) for the neural network model is the same as other T-learner models. We use the method fit_predict on the neural network model to get the ITE.
We can also use bootstrap by specifying the bootstrap number, bootstrap size, and setting return_ci=True to get the estimated ITE and the estimated lower and upper bounds for each individual.
# ITE
nn_ite = nn.fit_predict(X, treatment, y)
# ITE with confidence interval
nn_ite, nn_ite_lb, nn_ite_ub = nn.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                                              n_bootstraps=100, bootstrap_size=500)
Step 6: T-learner Neural Network Model Feature Importance
In step 6, we will talk about how to get feature importance for T-learner.
The syntax for getting the feature importance is the same for any modeling algorithm. We will use the neural network model as an example to illustrate the process.
The feature importance is calculated by building a new machine learning model on the backend, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.
get_importance is the method to get the feature importance values.
- X takes in the feature matrix.
- tau takes in the individual treatment effect (ITE).
- features=feature_names prints out the feature names in the outputs.
- random_state makes the results reproducible.
- method specifies whether to use auto or permutation for the feature importance calculation.
  - auto works on a tree-based estimator. It uses the estimator's default feature importance. If no tree-based estimator is provided, it falls back to the LGBMRegressor and gain as the importance type.
  - permutation works on any estimator. It permutes a feature column and calculates the decrease in accuracy. The feature importance is ordered based on the magnitude of the decrease in accuracy. When the sample size is large, downsampling is suggested.
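The permutation idea can be sketched in a few lines of NumPy. This is a generic illustration, not causalml's implementation: the simulated ITE values, the least-squares surrogate model, and the use of R squared as the accuracy measure are all assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3000
names = ["X1", "X2", "X3"]

# Simulated features and ITE (assumed): X1 matters most, X3 not at all
X_demo = rng.normal(size=(n, 3))
tau_demo = 2.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=n)

def design(X):
    """Add an intercept column to the feature matrix."""
    return np.column_stack([np.ones(len(X)), X])

# Surrogate model: regress the ITE on the features
coef = np.linalg.lstsq(design(X_demo), tau_demo, rcond=None)[0]

def r2(X, y):
    """R squared of the fitted surrogate on (X, y)."""
    resid = y - design(X) @ coef
    return 1.0 - np.mean(resid ** 2) / np.var(y)

baseline = r2(X_demo, tau_demo)
importance = {}
for j, name in enumerate(names):
    X_perm = X_demo.copy()
    # Shuffle one column to break its relationship with the ITE
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    # Importance = how much the score drops after shuffling
    importance[name] = baseline - r2(X_perm, tau_demo)

ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked)
```

Features whose shuffling barely changes the score get an importance close to zero, which is what we expect for X3 in this simulated example.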
# Feature importance using permutation
nn.get_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)
From the output, we can see that X2 is the most important feature and X5 is the least important feature.
{1: X2 0.975168
X1 0.812584
X3 0.215354
X4 0.097804
X5 0.055254
dtype: float64}
We can also visualize the feature importance using the plot_importance method.
# Visualization
nn.plot_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)
Step 7: T-learner Neural Network Model Interpretation
In step 7, we will interpret the T-learner model using SHAP (SHapley Additive exPlanations).
The syntax for SHAP interpretation is the same for any modeling algorithm. We will use the neural network model as an example to illustrate the process.
The Shapley values are calculated based on a machine learning model, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.
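For intuition about what a Shapley value is, the sketch below computes exact Shapley values by enumerating every feature coalition for a tiny three-feature model. It is a from-scratch illustration of the definition, not how causalml or the shap package computes them; predict_ite is a hypothetical stand-in for the model that maps features to ITE, and features outside a coalition are averaged over a simulated background sample.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one row x by enumerating all coalitions."""
    n_features = len(x)

    def value(subset):
        # Expected prediction when the features in `subset` are fixed to x's
        # values and the remaining features come from the background data
        Z = background.copy()
        for k in subset:
            Z[:, k] = x[k]
        return predict(Z).mean()

    phi = np.zeros(n_features)
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            # Shapley weight for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for subset in itertools.combinations(others, size):
                # Marginal contribution of feature j to this coalition
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi

def predict_ite(Z):
    """Hypothetical ITE model with an interaction term."""
    return 0.5 + 1.0 * Z[:, 0] - 0.3 * Z[:, 1] + 0.2 * Z[:, 0] * Z[:, 2]

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
x = np.array([1.5, -0.5, 2.0])

phi = shapley_values(predict_ite, x, background)
print(phi)
```

The values add up to the difference between the prediction for x and the average prediction over the background data, which is the property that lets a SHAP plot split each prediction into per-feature contributions. Exact enumeration grows exponentially with the number of features, so it is only practical for very small examples like this one.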
plot_shap_values is the method to visualize SHAP values.
- X takes in the feature matrix.
- tau takes in the individual treatment effect (ITE).
- features=feature_names prints out the feature names in the outputs.
# Plot shap values
nn.plot_shap_values(X=X, tau=nn_ite, features=feature_names)
The SHAP plot includes both the feature importance and the feature impacts.
- The y-axis is the list of features ordered from the most important to the least important.
- The x-axis is the SHAP value, representing how each feature impacts the model output.
- The color of the dots represents the feature values. Blue indicates low values, and red indicates high values.
- The overlapping dots are jittered, which helps us to see the distribution of each feature.
For example, from the SHAP plot we can see that X2 is the most important feature. High X2 values affect the predictions in a positive direction and low X2 values affect the predictions in a negative direction. Most samples have high X2 values.
More tutorials are available on GrabNGoInfo YouTube Channel and GrabNGoInfo.com.