Explaining Your Model with Microsoft’s InterpretML

Model interpretability has become a major theme in the machine learning community, and many innovations have emerged. The InterpretML module, developed by a team at Microsoft, aims to offer both prediction accuracy and model interpretability behind a unified API. Its GitHub repository receives active updates. I have written a series of articles on model interpretability, including “Explain Your Model with the SHAP Values”, “Explain Any Models with the SHAP Values — Use the KernelExplainer”, “Explain Your Model with LIME”, “The SHAP with More Elegant Charts”, and “Creating Waterfall Plots for the SHAP Values for All Models”.
In this article, I introduce a method other than SHAP. I will provide a gentle mathematical background and then show you how to interpret your model with InterpretML. If you prefer hands-on practice first, you can jump to the modeling part and then come back to review the mathematical background.
The InterpretML Python Module
The InterpretML module was developed by a Microsoft team. The module leverages many libraries such as Plotly, LIME, SHAP, and SALib, so it works well alongside other modules. Building on the Generalized Additive Model (GAM) and GA2M algorithms, the Microsoft team developed an interpretable algorithm called the Explainable Boosting Machine (EBM). The development history is quite inspiring, so I will briefly describe it.
Understand Generalized Additive Models
Generalized Additive Models (GAMs) were introduced by Trevor Hastie and Robert Tibshirani in 1986. The GAM is a powerful yet simple technique, although it is less popular in the data science community than random forests or gradient boosting. Let me highlight the idea of GAM:
- Relationships between the individual predictors and the dependent variable follow smooth patterns that can be linear or nonlinear. Figure (A) illustrates that the relationship between x1 and y can be nonlinear.
- Additive: these smooth relationships can be estimated simultaneously and then added up.

In Figure (A), E(Y) denotes the expected value of the target. The link function g() links the expected value to the predictor variables. Each function f() is called a smooth or nonparametric function. (Nonparametric means that the shape of a predictor function is determined solely by the data. In contrast, parametric means the shape is fixed in advance by a chosen functional form and its parameters.) When every function f() is linear, the GAM reduces to a GLM. A GLM is easy to interpret, and so is a GAM, because the contribution of each predictor can be inspected on its own.
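Written out, the GAM described above takes the following standard form. The number of predictors p and the intercept symbol β0 are notation assumed here for illustration.

```latex
g\big(E(Y)\big) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p)
```

Choosing each smooth function to be linear, f_j(x_j) = β_j x_j, gives back the familiar GLM.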
If GAMs use flexible smooth functions to fit the data, will they fit too well? How did Hastie and Tibshirani (1986) address the overfitting challenge? They add a penalty to the loss function for each smooth term. Regularization techniques such as LASSO, Ridge, or Elastic Net can also be applied. The case for GAMs includes:
- Easy to interpret.
- More flexible in fitting the data.
- Regularized predictor functions that help avoid overfitting.
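As a sketch of the penalty idea mentioned above, one common textbook form of the penalized objective, assuming a Gaussian response with an identity link, is shown below. The smoothing parameters λ_j and the sample index i are notation assumed here, and the exact penalty differs between implementations.

```latex
\min_{\beta_0, f_1, \dots, f_p} \;
\sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} f_j(x_{ij}) \Big)^2
+ \sum_{j=1}^{p} \lambda_j \int \big( f_j''(t) \big)^2 \, dt
```

A larger λ_j pushes f_j toward a straight line, while a smaller λ_j lets it bend more to follow the data.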
It is worth mentioning that the GAM is also used in Facebook’s open-source “Prophet” module. See “Business Forecasting with Facebook’s Prophet”.
Add the Interaction Terms to GAM for Better Prediction Accuracy
However, the predictive accuracy of GAMs is generally lower than that of more complex models that permit interactions. In the paper “Accurate Intelligible Models with Pairwise Interactions” (KDD 2013), Lou et al. add pairwise interaction terms to the standard GAM and call the result GA2M: Generalized Additive Models plus Interactions. As a result, GA2M improves prediction accuracy considerably while preserving the interpretability of a GAM.
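In equation form, GA2M extends the additive model with a sum of two-variable functions. The set S of selected feature pairs is notation assumed here for illustration.

```latex
g\big(E(Y)\big) = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{(i,j) \in S} f_{ij}(x_i, x_j)
```

Each f_ij depends on only two features, so it can still be inspected visually, which is why interpretability is preserved.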

The Explainable Boosting Machine (EBM)
The pairwise interaction terms in GA2M increase accuracy. However, they come with another problem: fitting them is extremely time-consuming and CPU-hungry. The Microsoft team proposed a solution called the Explainable Boosting Machine (EBM). The contribution is largely an engineering one. First, it trains each smooth function f() using machine learning techniques such as bagging and gradient boosting (hence the word Boosting in EBM). Second, the boosting procedure cycles through the features one at a time in round-robin fashion with a very low learning rate, so that the order of the features does not matter. In this way, the model finds the best feature function f() for each feature and shows how each feature contributes to the model’s prediction. Finally, EBM implements the GA2M algorithm in C++ and Python and takes advantage of joblib to provide multi-core and multi-machine parallelization.
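To make the one-feature-at-a-time boosting idea concrete, here is a minimal NumPy sketch. It is a simplified illustration and not InterpretML's implementation: it assumes a squared-error loss, represents each f() as a piecewise-constant function over quantile bins, and leaves out bagging and the pairwise terms. The function names, the bin count, and the learning rate are all hypothetical choices.

```python
import numpy as np

def fit_cyclic_additive(X, y, n_bins=16, n_rounds=200, learning_rate=0.05):
    """Fit y ~ intercept + sum_j f_j(x_j) by boosting one feature at a time.

    Each f_j is piecewise constant over quantile bins of feature j.
    This is an illustrative sketch, not the EBM algorithm itself.
    """
    n_samples, n_features = X.shape
    # Interior quantile cut points for each feature (n_bins - 1 edges).
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = [np.quantile(X[:, j], quantiles) for j in range(n_features)]
    # Bin index (0 .. n_bins - 1) of every sample for every feature.
    bins = np.column_stack(
        [np.searchsorted(edges[j], X[:, j]) for j in range(n_features)]
    )
    intercept = float(y.mean())
    shape = np.zeros((n_features, n_bins))  # value of f_j in each bin
    pred = np.full(n_samples, intercept)
    for _ in range(n_rounds):
        for j in range(n_features):  # round-robin over the features
            residual = y - pred
            # Mean residual per bin of feature j (empty bins get 0).
            sums = np.bincount(bins[:, j], weights=residual, minlength=n_bins)
            counts = np.bincount(bins[:, j], minlength=n_bins)
            update = learning_rate * sums / np.maximum(counts, 1)
            shape[j] += update             # small step for f_j only
            pred += update[bins[:, j]]     # refresh the predictions
    return intercept, edges, shape

def predict_additive(model, X):
    """Sum the intercept and each feature's binned contribution."""
    intercept, edges, shape = model
    pred = np.full(X.shape[0], intercept)
    for j in range(X.shape[1]):
        pred += shape[j][np.searchsorted(edges[j], X[:, j])]
    return pred

# Synthetic demo data (an assumption for illustration, not the wine data).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = fit_cyclic_additive(X, y)
mse = float(np.mean((y - predict_additive(model, X)) ** 2))
print(f"training MSE: {mse:.4f} (variance of y: {np.var(y):.4f})")
```

Because each update touches only one feature, the learned array `shape[j]` is exactly the contribution curve of feature j, and plotting it against the bin edges gives the kind of per-feature graph that makes an additive model interpretable.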
InterpretML — A One-Stop Shop
I call it a one-stop shop because it has incorporated the key modeling tasks in a pipeline. These tasks include data exploration, model training, model performance comparison, and prediction interpretability at both the global and local levels. In the following code example, I am going to perform the tasks in (A) to (F):
- (A) Explore the Data
- (B) Train the Explainable Boosting Machine (EBM)
- (C) Performance: How Does the EBM Model Perform?
- (D) Global Interpretability — What the Model Says for All Data
- (E) Local Interpretability — What the Model Says for Individual Data
- (F) Dashboard: Put All in a Dashboard — This is the Best
First, run pip install -U interpret to install the module.
I will use the same red wine quality data so you can compare SHAP, LIME, and InterpretML, as I did in “Explain Your Model with the SHAP Values”, “Explain Any Models with the SHAP Values — Use the KernelExplainer”, and “Explain Your Model with LIME”. The target value of this dataset is the quality rating from low to high (0–10). The input variables are the chemical properties of each wine sample: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. There are 1,599 wine samples. The code is at the end of the article and is also available via this GitHub repository.




















