Explainable T-learner Deep Learning Uplift Model Using Python Package CausalML

T-learner uplift models using XGBoost, LightGBM, and neural network models, with feature importance and model interpretation

T-learner is a meta-learner that uses two machine learning models, one trained on the treatment group and one trained on the control group, to estimate the individual-level heterogeneous causal treatment effect. In this tutorial, we will talk about how to use the Python package CausalML to build a T-learner. We will cover:

How to implement a T-learner using the XGBoost model, the LightGBM model, and the neural network model separately?
How to make individual treatment effect (ITE) and average treatment effect (ATE) estimations using a T-learner?
How to check the T-learner feature importance?
How to interpret a T-learner uplift model using SHAP?

If you are interested in building a T-learner manually, please check out my previous tutorial T Learner Uplift Model for Individual Treatment Effect (ITE) in Python.
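For a quick feel of what a T-learner does before we switch to CausalML, here is a minimal from-scratch sketch. It is an illustration under simplifying assumptions, not CausalML's implementation: the data have a single numeric feature, each group's model is a plain least-squares line, and the helper names fit_line and t_learner_ite are hypothetical. The ITE for every unit is the treated-group model's prediction minus the control-group model's prediction, and the ATE is the mean of those ITEs.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Fit y = a + b * x by ordinary least squares and return a predict function."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def t_learner_ite(x, treatment, y):
    """T-learner: one model per group, ITE = mu_1(x) - mu_0(x) for every unit."""
    mu_0 = fit_line([xi for xi, w in zip(x, treatment) if w == 0],
                    [yi for yi, w in zip(y, treatment) if w == 0])
    mu_1 = fit_line([xi for xi, w in zip(x, treatment) if w == 1],
                    [yi for yi, w in zip(y, treatment) if w == 1])
    return [mu_1(xi) - mu_0(xi) for xi in x]


# Hypothetical noise-free toy data whose true effect is tau(x) = 0.5 * x
x = [i / 10 for i in range(20)]
treatment = [i % 2 for i in range(20)]
y = [1.0 + 2.0 * xi + 0.5 * xi * w for xi, w in zip(x, treatment)]

ite = t_learner_ite(x, treatment, y)
ate = fmean(ite)  # the ATE is the average of the estimated ITEs
print(round(ate, 4))
```

Because the toy data contain no noise, each line is fitted exactly and the recovered ITEs equal 0.5 * x. With real data, the two base models are usually flexible learners such as XGBoost, LightGBM, or a neural network, which is what the rest of this tutorial uses.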

Resources for this post:

Video tutorial for this post on YouTube
Click here for the Colab notebook.
More video tutorials on Uplift Modeling
More blog posts on Uplift Modeling
If you are not a Medium member and want to support me to keep providing free content (😄 Buy me a cup of coffee ☕), join Medium membership through this link. You will get full access to posts on Medium for $5 per month, and I will receive a portion of it. Thank you for your support 🙏
Give me a tip to show your appreciation and help me keep providing free content. Thank you for your generosity 🙏

Let’s get started!

Step 1: Install and Import Libraries

In step 1, we will install and import the Python libraries.

First, let’s install causalml.

# Install package
!pip install causalml

After the installation is completed, we can import the libraries.

pandas and numpy are imported for data processing.
synthetic_data is imported for synthetic data creation.
XGBTRegressor, MLPTRegressor, and BaseTRegressor from causalml, together with LGBMRegressor from lightgbm and XGBRegressor from xgboost, are for the machine learning model training.

# Data processing
import pandas as pd
import numpy as np

# Create synthetic data
from causalml.dataset import synthetic_data

# Machine learning model
from causalml.inference.meta import XGBTRegressor, MLPTRegressor, BaseTRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

Join Medium with my referral link - Amy @GrabNGoInfo

Read every story from Amy (and thousands of other writers on Medium). Your membership fee directly supports Amy and…

medium.com

Step 2: Create Dataset

In step 2, we will create a synthetic dataset for the T-learner uplift model.

First, a random seed is set to make the synthetic dataset reproducible.

Then, using the synthetic_data method from the causalml Python package, we create a dataset with five features, one binary treatment variable, and one continuous outcome variable. synthetic_data also returns the true individual treatment effect (ite), which we keep to evaluate the estimates later.

# Set a seed for reproducibility
np.random.seed(42)

# Create a synthetic dataset
y, X, treatment, ite, _, _ = synthetic_data(mode=1, n=5000, p=5, sigma=1.0)

feature_names = ['X1', 'X2', 'X3', 'X4', 'X5']

After that, using value_counts on the treatment variable, we can see that out of 5000 samples, 2582 units received treatment and 2418 units did not receive treatment.

# Check treatment vs. control counts
pd.Series(treatment).value_counts()

Output:

1    2582
0    2418
dtype: int64

Finally, we get the true average treatment effect (ATE) by taking the mean of the true individual treatment effect (ITE). The true average treatment effect (ATE) is about 0.5.

# True ATE
ite.mean()

Output:

0.4988477022092744
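Why is the true ATE so close to 0.5? My reading of the CausalML source is that mode=1 (described as "difficult nuisance components and an easy treatment effect") draws every feature from Uniform(0, 1) and defines the true effect as tau = (X1 + X2) / 2, whose expected value is (0.5 + 0.5) / 2 = 0.5. Treat that as an assumption to verify against your installed version. The standard-library sketch below simulates only the treatment-effect part under that assumption; since Python's random module is not NumPy's generator, the result will not match 0.4988 exactly, but it lands near 0.5.

```python
import random
from statistics import fmean

# Assumption: synthetic_data(mode=1) draws each feature from Uniform(0, 1)
# and sets the true effect to tau = (X1 + X2) / 2. Only tau is simulated here.
random.seed(42)
n, p = 5000, 5
X = [[random.random() for _ in range(p)] for _ in range(n)]
tau = [(row[0] + row[1]) / 2 for row in X]

true_ate = fmean(tau)
print(f"Simulated true ATE: {true_ate:.3f}")
```

If this assumption holds, only X1 and X2 drive the true treatment effect, which is consistent with X1 and X2 ranking highest in the Step 6 feature importance output.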

Step 3: T-learner Using XGBoost Model

In step 3, we will use the XGBoost model with T-learner to estimate the average treatment effect (ATE) and the individual treatment effect (ITE).

XGBTRegressor is a built-in XGBoost T-learner model that comes with the CausalML package.

To estimate the average treatment effect (ATE) using XGBTRegressor, we first initialize the XGBTRegressor, then get the average treatment effect (ATE) and its lower and upper bounds using the estimate_ate method.

# Use XGBTRegressor
xgb = XGBTRegressor(random_state=42)

# Estimated ATE, lower bound, and upper bound
te, lb, ub = xgb.estimate_ate(X, treatment, y)

# Print out results
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

We can see that the estimated average treatment effect (ATE) is 0.61, which is 0.11 higher than the true average treatment effect (ATE) of about 0.50. The 95% confidence interval runs from 0.56 to 0.67.

Average Treatment Effect: 0.61 (0.56, 0.67)
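The interval reported by estimate_ate is, as far as I know, a normal-approximation interval at the 95% level (ate_alpha defaults to 0.05): the estimate plus or minus z times a standard error. The sketch below shows that arithmetic with the standard library and backs out the standard error implied by the printed XGBoost interval. It does not reproduce how CausalML computes the standard error itself, and normal_ci and implied_se are hypothetical helper names.

```python
from statistics import NormalDist


def normal_ci(estimate, se, alpha=0.05):
    """Two-sided (1 - alpha) normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return estimate - z * se, estimate + z * se


def implied_se(lower, upper, alpha=0.05):
    """Standard error implied by a symmetric normal-approximation interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (upper - lower) / (2 * z)


# Rounded XGBoost bounds from the output above: (0.56, 0.67)
se = implied_se(0.56, 0.67)
midpoint = (0.56 + 0.67) / 2
print(f"Implied SE: {se:.3f}")
print("Rebuilt interval: ({:.2f}, {:.2f})".format(*normal_ci(midpoint, se)))
```

Because the printed values are rounded to two decimals, the implied standard error (roughly 0.028) is approximate, and the interval midpoint (0.615) differs slightly from the rounded point estimate of 0.61.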

The fit_predict method returns the estimated individual treatment effect (ITE) for each unit.

If a confidence interval for the individual treatment effect (ITE) is needed, we can bootstrap by setting return_ci=True and specifying the number of bootstrap samples (n_bootstraps) and the size of each bootstrap sample (bootstrap_size).

The output then gives us both the estimated individual treatment effect (ITE) and its estimated lower and upper bounds for each individual.

# ITE
xgb_ite = xgb.fit_predict(X, treatment, y)

# ITE with confidence interval
xgb_ite, xgb_ite_lb, xgb_ite_ub = xgb.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                               n_bootstraps=100, bootstrap_size=500)
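Conceptually, a bootstrap interval refits the T-learner on many resampled datasets and reads each individual's bounds off the spread of that individual's re-estimated ITEs. My understanding is that return_ci=True works this way and reports percentile bounds at alpha / 2 and 1 - alpha / 2, but treat that as an assumption. The sketch below uses a one-feature least-squares line as the base learner; n_bootstraps and bootstrap_size mirror the CausalML arguments of the same names, while every function name is hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predict function."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((v - x_bar) ** 2 for v in xs)
    slope = sum((v - x_bar) * (t - y_bar) for v, t in zip(xs, ys)) / sxx
    return lambda v: y_bar + slope * (v - x_bar)


def t_learner_ite(x, w, y, x_new):
    """Fit one line per group and return mu_1(x) - mu_0(x) at each point of x_new."""
    fits = {}
    for g in (0, 1):
        fits[g] = fit_line([xi for xi, wi in zip(x, w) if wi == g],
                           [yi for yi, wi in zip(y, w) if wi == g])
    return [fits[1](v) - fits[0](v) for v in x_new]


def percentile(values, q):
    """Linearly interpolated percentile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_ite_ci(x, w, y, n_bootstraps=100, bootstrap_size=50, alpha=0.05, seed=0):
    """Percentile bootstrap interval for every unit's ITE (illustrative sketch)."""
    rng = random.Random(seed)
    draws = []  # draws[b][i]: ITE of unit i estimated from bootstrap sample b
    while len(draws) < n_bootstraps:
        idx = [rng.randrange(len(x)) for _ in range(bootstrap_size)]
        # Skip resamples in which a group has fewer than two distinct x values,
        # because a line cannot be fitted to them.
        if any(len({x[i] for i in idx if w[i] == g}) < 2 for g in (0, 1)):
            continue
        draws.append(t_learner_ite([x[i] for i in idx], [w[i] for i in idx],
                                   [y[i] for i in idx], x))
    lower = [percentile([d[i] for d in draws], alpha / 2) for i in range(len(x))]
    upper = [percentile([d[i] for d in draws], 1 - alpha / 2) for i in range(len(x))]
    return lower, upper


# Hypothetical noisy data whose true effect is tau(x) = 0.5 * x
data_rng = random.Random(1)
x = [data_rng.random() for _ in range(200)]
w = [data_rng.randint(0, 1) for _ in range(200)]
y = [1 + 2 * xi + 0.5 * xi * wi + data_rng.gauss(0, 0.1) for xi, wi in zip(x, w)]

point = t_learner_ite(x, w, y, x)
lower, upper = bootstrap_ite_ci(x, w, y, n_bootstraps=100, bootstrap_size=100)
print(f"Unit 0: ITE {point[0]:.3f}, 95% interval ({lower[0]:.3f}, {upper[0]:.3f})")
```

Each bootstrap refit costs a full model fit, so with gradient boosting or neural network learners n_bootstraps=100 means roughly 100 extra fits per group, which is why the ITE intervals take noticeably longer to compute than the point estimates.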

Step 4: T-learner Using LightGBM Model

In step 4, we will talk about how to use BaseTRegressor with a LightGBM model for the T-learner.

BaseTRegressor is a generic T-learner class that can take in existing machine learning models from packages such as sklearn and xgboost and use them as the T-learner's base models. In this step, we will run BaseTRegressor with LGBMRegressor.

If we run BaseTRegressor with an XGBRegressor that has the same hyperparameters, the result is the same as that of the XGBTRegressor built into the causalml Python package.
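BaseTRegressor can wrap LGBMRegressor, XGBRegressor, or a scikit-learn model because a T-learner only needs an object with fit and predict methods. My understanding is that CausalML copies the learner you pass in, so the control and treatment models start from identical settings, but treat that as an assumption. The sketch below shows the pattern with copy.deepcopy and a hypothetical MeanRegressor stand-in so it runs without third-party packages; SimpleTLearner is an illustrative class, not CausalML's.

```python
from copy import deepcopy
from statistics import fmean


class MeanRegressor:
    """Hypothetical stand-in learner: always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SimpleTLearner:
    """Wrap any learner that exposes fit(X, y) and predict(X)."""

    def __init__(self, learner):
        # Two independent copies that share the same hyperparameters.
        self.model_c = deepcopy(learner)
        self.model_t = deepcopy(learner)

    def fit_predict(self, X, treatment, y):
        control = [i for i, w in enumerate(treatment) if w == 0]
        treated = [i for i, w in enumerate(treatment) if w == 1]
        self.model_c.fit([X[i] for i in control], [y[i] for i in control])
        self.model_t.fit([X[i] for i in treated], [y[i] for i in treated])
        # ITE for every unit: treated-model prediction minus control-model prediction
        return [t - c for t, c in zip(self.model_t.predict(X), self.model_c.predict(X))]


X = [[0.1], [0.4], [0.7], [0.9]]
treatment = [0, 1, 0, 1]
y = [1.0, 2.0, 1.5, 2.5]
learner = SimpleTLearner(MeanRegressor())
print(learner.fit_predict(X, treatment, y))
```

Swapping MeanRegressor for any other object with the same two methods changes the base model without touching the T-learner logic, which is the same design idea that lets one BaseTRegressor class serve several modeling libraries.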

To estimate the average treatment effect (ATE) using BaseTRegressor, we first initialize BaseTRegressor with LGBMRegressor as the base learner, then get the average treatment effect (ATE) and its lower and upper bounds using the estimate_ate method.

# Use LGBMRegressor with BaseTRegressor
lgbm = BaseTRegressor(LGBMRegressor(random_state=42))

# Estimated ATE, lower bound, and upper bound
te, lb, ub = lgbm.estimate_ate(X, treatment, y)

# Print out results
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true average treatment effect (ATE). The 95% confidence interval runs from 0.54 to 0.62, which is narrower than the interval from the built-in XGBTRegressor (a width of 0.08 versus 0.11).

Average Treatment Effect: 0.58 (0.54, 0.62)

On this dataset and random seed, BaseTRegressor combined with LGBMRegressor produced an average treatment effect (ATE) estimate closer to the true value than the built-in XGBTRegressor.

To estimate the individual treatment effect (ITE), we use the fit_predict method of the LightGBM T-learner.

We can also bootstrap by setting return_ci=True and specifying n_bootstraps and bootstrap_size to get the estimated individual treatment effect (ITE) together with its estimated lower and upper bounds for each individual.

# ITE
lgbm_ite = lgbm.fit_predict(X, treatment, y)

# ITE with confidence interval
lgbm_ite, lgbm_ite_lb, lgbm_ite_ub = lgbm.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                               n_bootstraps=100, bootstrap_size=500)

Step 5: T-learner Using Neural Network Model

In step 5, we will talk about how to use a neural network model with a T-learner.

The Python package causalml has a built-in class, MLPTRegressor, that uses multilayer perceptron (MLP) neural networks as the base models of a T-learner.

hidden_layer_sizes specifies the number of hidden layers and the number of neurons in each layer. hidden_layer_sizes=(35, 25, 10, 5) means that there are four hidden layers for the neural network model. The first hidden layer has 35 neurons, the second hidden layer has 25 neurons, the third hidden layer has 10 neurons, and the fourth hidden layer has 5 neurons.
learning_rate_init specifies the initial learning rate of the neural network model. We set the initial value to 0.01.
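early_stopping=True sets aside part of the training data as a validation set and stops training when the validation score stops improving for several consecutive epochs, instead of running until the maximum number of iterations.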
random_state gives us reproducible results.
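To get a feel for the size of this network, the snippet below counts its trainable weights and biases. It assumes the five features from Step 2 as inputs and a single output unit, which is what an MLP regressor uses for one continuous outcome (I am assuming MLPTRegressor builds on scikit-learn's MLPRegressor). count_mlp_parameters is a hypothetical helper.

```python
def count_mlp_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Number of weights plus biases in a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    return sum(fan_in * fan_out + fan_out  # weight matrix + bias vector per layer
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


per_model = count_mlp_parameters(5, (35, 25, 10, 5))
print(per_model, "parameters per model;", 2 * per_model, "for the two T-learner models")
```

Most of the parameters sit in the 35-to-25 connection (875 weights plus 25 biases), so shrinking the first two hidden layers is the most direct way to reduce model size when tuning.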

After initializing the neural network T-learner with MLPTRegressor and naming it nn, we run estimate_ate on it to get the average treatment effect (ATE) and its lower and upper bounds.

# Use MLPTRegressor (a T-learner with MLP base models)
nn = MLPTRegressor(hidden_layer_sizes=(35, 25, 10, 5),
                 learning_rate_init=.01,
                 early_stopping=True,
                 random_state=1)

# Estimated ATE, lower bound, and upper bound
te, lb, ub = nn.estimate_ate(X, treatment, y)

# Print out results
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true average treatment effect (ATE) and the same point estimate as the LightGBM model, although its confidence interval (0.53, 0.64) is wider.

Average Treatment Effect: 0.58 (0.53, 0.64)
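With all three results in hand, a quick arithmetic check on the printed numbers is useful: the bias relative to the true ATE, the interval width, and whether each 95% interval covers the true value. The values below are copied from the outputs above (rounded to two decimals); nothing is re-estimated.

```python
true_ate = 0.4988477022092744  # ite.mean() from Step 2

# (estimate, lower bound, upper bound) as printed above
results = {
    "XGBTRegressor": (0.61, 0.56, 0.67),
    "BaseTRegressor + LGBMRegressor": (0.58, 0.54, 0.62),
    "MLPTRegressor": (0.58, 0.53, 0.64),
}

summary = {}
for name, (ate, lb, ub) in results.items():
    summary[name] = {
        "bias": round(ate - true_ate, 3),
        "width": round(ub - lb, 2),
        "covers_true_ate": lb <= true_ate <= ub,
    }
    print(name, summary[name])
```

On this run none of the three intervals contains 0.4988, so all three models overestimate the ATE here. A single dataset and seed is not enough to rank the models reliably; repeating the comparison over several seeds would give a fairer picture.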

Tuning the hyperparameters such as the number of layers, the number of neurons in each layer, and the initial learning rate can potentially improve the model performance.

Calculating the individual treatment effect (ITE) for the neural network model is the same as other T-learner models. We use the method fit_predict on the neural network model to get the individual treatment effect (ITE).

We can also bootstrap by setting return_ci=True and specifying n_bootstraps and bootstrap_size to get the estimated individual treatment effect (ITE) together with its estimated lower and upper bounds for each individual.

# ITE
nn_ite = nn.fit_predict(X, treatment, y)

# ITE with confidence interval
nn_ite, nn_ite_lb, nn_ite_ub = nn.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                               n_bootstraps=100, bootstrap_size=500)

Step 6: T-learner Neural Network Model Feature Importance

In step 6, we will talk about how to get the feature importance for a T-learner.

The syntax for getting the feature importance is the same for all the CausalML T-learners used above. We will use the neural network model as an example to illustrate the process.

The feature importance is calculated by fitting a separate surrogate machine learning model in the background, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.

get_importance is the function to get the feature importance values.
X takes in the feature matrix.
tau takes in the individual treatment effect (ITE).
features=feature_names labels the output with the feature names.
random_state makes the results reproducible.
method specifies how the importance is calculated, either auto or permutation. auto uses the surrogate model's built-in feature importance, so the surrogate must be tree-based; if no surrogate model is provided, LGBMRegressor is used with gain as the importance type. permutation works with any estimator: it shuffles one feature column at a time, measures the decrease in the model's accuracy, and ranks the features by the size of that decrease. When the sample size is large, downsampling is suggested because permutation importance is computationally expensive.
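The permutation idea can be sketched without any library. Assume a surrogate model has already been fitted to predict the ITE from the features; here it is the hypothetical function surrogate, which uses only X1 and X2. Shuffling one column at a time and measuring the drop in R² shows that a feature the surrogate ignores gets an importance of exactly zero. This is a simplified illustration: CausalML's exact procedure (train/test split, number of repeats, scoring) may differ, and all names below are hypothetical.

```python
import random
from statistics import fmean


def r2(y_true, y_pred):
    """Coefficient of determination (R squared)."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def surrogate(row):
    # Hypothetical fitted surrogate: predicts the ITE from X1 and X2 only.
    return (row[0] + row[1]) / 2


def permutation_importance(predict, X, tau, n_repeats=5, seed=0):
    """Mean drop in R squared when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = r2(tau, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2(tau, [predict(row) for row in X_perm]))
        importances.append(fmean(drops))
    return importances


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(5)] for _ in range(1000)]
tau = [(row[0] + row[1]) / 2 + data_rng.gauss(0, 0.05) for row in X]
importances = permutation_importance(surrogate, X, tau)
for name, imp in zip(["X1", "X2", "X3", "X4", "X5"], importances):
    print(f"{name}: {imp:.3f}")
```

Each feature costs n_repeats extra prediction passes over the data, which is why downsampling helps when the dataset is large.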

# Feature importance using permutation
nn.get_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)

The output is a dictionary keyed by treatment group (1 is the only treatment group here). X2 is the most important feature, and X5 is the least important feature.

{1: X2    0.975168
 X1    0.812584
 X3    0.215354
 X4    0.097804
 X5    0.055254
 dtype: float64}

We can also visualize the feature importance using the plot_importance function.

# Visualization
nn.plot_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)

Step 7: T-learner Neural Network Model Interpretation

In step 7, we will interpret the T-learner model using SHAP (SHapley Additive exPlanations).

The syntax for SHAP interpretation is the same for all the CausalML T-learners used above. We will use the neural network model as an example to illustrate the process.

The SHAP values are calculated from a surrogate machine learning model, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.
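To see what a single SHAP value means, the sketch below computes exact Shapley values by brute force for one unit: each feature's value is its weighted average marginal contribution over all subsets of the other features, with "absent" features set to a baseline point. The model f and the baseline are hypothetical, and the SHAP library uses specialized algorithms and a background dataset rather than a single baseline point, so this only illustrates the definition.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing absent features with baseline values."""
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their actual values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def f(z):
    # Hypothetical surrogate for the ITE, with an interaction between X1 and X3.
    return 0.5 * z[0] + 0.5 * z[1] + 0.2 * z[0] * z[2]


x = [0.9, 0.8, 0.9, 0.1, 0.3]
baseline = [0.5] * 5  # for example, the feature means
phi = shapley_values(f, x, baseline)
print([round(v, 4) for v in phi])
print(round(sum(phi), 4), round(f(x) - f(baseline), 4))
```

The last line illustrates the additivity property behind the SHAP plot: the feature contributions for a unit sum to the difference between that unit's prediction and the baseline prediction, and features the model ignores (X4 and X5 here) get a value of zero.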

plot_shap_values is the function to visualize SHAP values.
X takes in the feature matrix.
tau takes in the individual treatment effect (ITE).
features=feature_names labels the plot with the feature names.

# Plot shap values
nn.plot_shap_values(X=X, tau=nn_ite, features=feature_names)

The SHAP plot includes both the feature importance and the feature impacts.

The y-axis is the list of features ordered from the most important to the least important.
The x-axis is the SHAP value, representing how each feature impacts the model output.
The color of the dots represents the feature values. Blue indicates low values, and red indicates high values.
The overlapping dots are jittered vertically, which helps us see the distribution of SHAP values for each feature.

For example, from the SHAP plot we can see that X2 is the most important feature. High X2 values push the predicted treatment effect up, and low X2 values push it down. Most samples have high X2 values.

More tutorials are available on GrabNGoInfo YouTube Channel and GrabNGoInfo.com.

References

Künzel, Sören R., et al. “Metalearners for estimating heterogeneous treatment effects using machine learning.” Proceedings of the National Academy of Sciences 116.10 (2019): 4156–4165.
CausalML documentation
