Hands-on Guide to Plotting a Decision Surface for ML in Python

Utilize matplotlib to visualize decision boundaries for classification algorithms in Python

Introduction

For a while, I struggled to visualize what a trained classification model had actually learned. I relied only on the classification report and the confusion matrix to weigh the model's performance.

However, visualizing the results of a classification has its charm and makes them easier to interpret. So I built a decision surface, and when I succeeded, I decided to write about it, both as a learning exercise and for anyone who might be stuck on the same issue.

Tutorial content

In this tutorial, I will start with a synthetic dataset generated by the datasets module of the Sklearn library, to focus on the implementation steps. After that, I will use a pre-processed dataset (without missing data or outliers) and plot the decision surface after applying the standard scaler.

Decision Surface
Importing important libraries
Dataset generation
Generating decision surface
Applying to real data

Decision Surface

Classification in machine learning means training a model to assign labels to input examples.

Each input feature defines an axis of the feature space. With two input features, the feature space is a plane, and each example is a point whose coordinates are its feature values. If there were three input variables, the feature space would be a three-dimensional volume.

The ultimate goal of classification is to separate the feature space so that labels are assigned to points in the feature space as correctly as possible.

The resulting partition of the feature space is called a decision surface or decision boundary, and plotting it is a useful way to explain a model on a classification predictive modeling task. If you have more than two input features, you can create a decision surface for each pair of input features.
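As a small illustration of the last point, the feature pairs to plot can be enumerated with itertools.combinations. The feature names below are hypothetical and only stand in for the columns of a dataset with more than two features.

```python
from itertools import combinations

# Hypothetical feature names; replace them with the columns of your own dataset
feature_names = ["age", "salary", "tenure"]

# Every unordered pair of feature indexes gets its own 2-D decision surface
feature_pairs = list(combinations(range(len(feature_names)), 2))

for i, j in feature_pairs:
    print(f"decision surface for {feature_names[i]} vs. {feature_names[j]}")
```

With three features this gives three plots; the number of pairs grows quickly, so in practice it may be worth plotting only the most informative pairs.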

Importing important libraries

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

Generate dataset

I will use the make_blobs() function from the datasets module of the Sklearn library to generate a custom dataset. Doing so keeps the focus on the implementation rather than on cleaning the data; the steps are the same for other datasets and follow a typical pattern. Let’s start by defining a dataset with 1000 samples, only two features, two centers (one per class), and a standard deviation of 3 for simplicity’s sake.

X, y = datasets.make_blobs(n_samples = 1000, 
                           centers = 2, 
                           n_features = 2, 
                           random_state = 1, 
                           cluster_std = 3)
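Before plotting, it can help to confirm what make_blobs() returned. The following is a minimal sketch that repeats the call above and inspects the shapes and the number of samples per class; the variable name class_counts is only illustrative.

```python
import numpy as np
from sklearn import datasets

# Same call as in the tutorial
X, y = datasets.make_blobs(n_samples=1000,
                           centers=2,
                           n_features=2,
                           random_state=1,
                           cluster_std=3)

# X holds one row per sample and one column per feature
print(X.shape)

# y holds one integer class label per sample
print(y.shape)

# Count how many samples fall into each class
class_counts = np.bincount(y)
print(class_counts)
```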

Once the dataset is generated, we can draw a scatter plot to see how the two classes are distributed across the two features.

# create scatter plot for samples from each class
for class_value in range(2):

    # get row indexes for samples with this class
    row_ix = np.where(y == class_value)

    # create scatter of these samples
    plt.scatter(X[row_ix, 0], X[row_ix, 1])

# show the plot
plt.show()

Here we looped over the class labels and plotted the first feature of X against the second, with the points colored by class label. In the next step, we need to build a classification model to predict the class of unseen points. Logistic regression can be used in this case since we have only two categories.

Develop the logistic regression model

regressor = LogisticRegression()

# fit the regressor to X and y
regressor.fit(X, y)

# apply the predict method 
y_pred = regressor.predict(X)

The predictions in y_pred can be evaluated using the accuracy_score() function from the sklearn library.

accuracy = accuracy_score(y, y_pred)
print('Accuracy: %.3f' % accuracy)

## Accuracy: 0.972
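Note that this score is computed on the same points the model was trained on. As a reminder of what the number means, accuracy is the fraction of predictions that match the true labels. A minimal sketch with made-up labels (y_true and y_hat are hypothetical, not taken from the dataset above):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration: three of the four predictions are correct
y_true = np.array([0, 0, 1, 1])
y_hat = np.array([0, 1, 1, 1])

# Accuracy computed by hand: mean of the element-wise matches
manual_accuracy = np.mean(y_true == y_hat)

# The same quantity from sklearn
library_accuracy = accuracy_score(y_true, y_hat)

print(manual_accuracy, library_accuracy)
```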

Generating decision surface

matplotlib provides a handy function called contourf(), which fills the regions between contour levels with color. However, as the documentation suggests, we first need to define a grid of points that covers the feature space. The starting point is to find the minimum and maximum value of each feature, then extend the range by one on each side to make sure that the whole space is covered.

min1, max1 = X[:, 0].min() - 1, X[:, 0].max() + 1 #1st feature
min2, max2 = X[:, 1].min() - 1, X[:, 1].max() + 1 #2nd feature

Then we can define the scale of the coordinates using the arange() function from the numpy library with a 0.1 resolution.

x1_scale = np.arange(min1, max1, 0.1)
x2_scale = np.arange(min2, max2, 0.1)
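The step passed to arange() controls how many points each axis gets, and therefore how large the final grid is. A small sketch with made-up bounds; np.linspace() is shown as an alternative for when you would rather fix the number of points than the step:

```python
import numpy as np

# Made-up bounds for illustration
coarse = np.arange(-1.0, 2.0, 0.5)    # step of 0.5, stop value excluded
fine = np.arange(-1.0, 2.0, 0.25)     # halving the step doubles the point count

print(len(coarse), len(fine))

# linspace takes the number of points instead of the step and includes the stop value
fixed = np.linspace(-1.0, 2.0, 7)
print(fixed)
```

Because the grid is the product of both axes, halving the step on both features makes the grid four times larger.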

The next step would be converting x1_scale and x2_scale into a grid. The function meshgrid() within the numpy library is what we need.

x_grid, y_grid = np.meshgrid(x1_scale, x2_scale)
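To see what meshgrid() produces, here is a tiny sketch with made-up axis values. Each output array has one row per y value and one column per x value, so that the pair (xx[i, j], yy[i, j]) is one point of the grid.

```python
import numpy as np

# Made-up axis values for illustration
x_vals = np.array([0, 1, 2])
y_vals = np.array([10, 20])

xx, yy = np.meshgrid(x_vals, y_vals)

# xx repeats the x values along every row
print(xx)

# yy repeats the y values along every column
print(yy)

# Both arrays share the shape (len(y_vals), len(x_vals))
print(xx.shape, yy.shape)
```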

The generated x_grid and y_grid are 2-D arrays. To feed them to the model, we need to reduce each one to a one-dimensional array using the flatten() method of numpy arrays, and then reshape it into a single column.

# flatten each grid to a vector
x_g, y_g = x_grid.flatten(), y_grid.flatten()
x_g, y_g = x_g.reshape((len(x_g), 1)), y_g.reshape((len(y_g), 1))

Finally, we stack the vectors side by side as columns of an input dataset, like the original dataset but at a much higher resolution.

grid = np.hstack((x_g, y_g))
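The flatten, reshape and hstack steps can also be written more compactly. The sketch below uses a tiny made-up grid and checks that np.column_stack() and np.c_ give the same two-column array; which form to use is a matter of taste.

```python
import numpy as np

# Tiny made-up grid for illustration
xx, yy = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([10.0, 20.0]))

# Approach from the tutorial: flatten, reshape to columns, stack horizontally
x_g = xx.flatten().reshape(-1, 1)
y_g = yy.flatten().reshape(-1, 1)
grid_hstack = np.hstack((x_g, y_g))

# Equivalent one-liners
grid_stack = np.column_stack((xx.ravel(), yy.ravel()))
grid_c = np.c_[xx.ravel(), yy.ravel()]

# One row per grid point, one column per feature
print(grid_hstack.shape)
print(np.array_equal(grid_hstack, grid_stack), np.array_equal(grid_hstack, grid_c))
```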

Now, we can feed the grid into the model to predict values.

# make predictions for the grid
y_pred_2 = regressor.predict(grid)

# predict the probability
p_pred = regressor.predict_proba(grid)

# keep just the probabilities for class 0
p_pred = p_pred[:, 0]

# reshape the results to the shape of the grid
pp_grid = p_pred.reshape(x_grid.shape)
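It is worth being clear about what predict_proba() returns: one column per class, in the order given by the model's classes_ attribute, with each row summing to 1. The sketch below refits the same kind of model under the name clf (an illustrative name) and checks that the hard labels from predict() agree with the most probable class.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.make_blobs(n_samples=1000, centers=2, n_features=2,
                           random_state=1, cluster_std=3)
clf = LogisticRegression().fit(X, y)

sample = X[:5]
proba = clf.predict_proba(sample)   # shape (n_samples, n_classes)
labels = clf.predict(sample)        # hard class labels

# Each row of probabilities sums to 1
row_sums = proba.sum(axis=1)

# Column k holds the probability of clf.classes_[k]
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(proba.shape, row_sums)
print(labels, labels_from_proba)
```

Since the tutorial keeps column 0, high values in pp_grid mark regions where the model favors class 0.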

Now, a grid of points and the predicted class probability at each point of the feature space have been generated.

Next, we plot the grid as a contour plot using contourf(). The contourf() function needs a separate grid per axis. For that, we use x_grid and y_grid, together with the predicted probabilities reshaped to the same shape (pp_grid).

# plot the grid of x, y and z values as a surface
surface = plt.contourf(x_grid, y_grid, pp_grid, cmap='Pastel1')
plt.colorbar(surface)

# create scatter plot for samples from each class
for class_value in range(2):
    # get row indexes for samples with this class
    row_ix = np.where(y == class_value)

    # create scatter of these samples
    plt.scatter(X[row_ix, 0], X[row_ix, 1])

# show the plot
plt.show()
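If you also want to see the boundary itself, one option is to draw the contour line where the predicted probability equals 0.5 on top of the filled surface. This is a sketch under my own assumptions rather than part of the original notebook: it rebuilds the data, the model (named clf here) and the grid so that it runs on its own, and it selects a non-interactive backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.make_blobs(n_samples=1000, centers=2, n_features=2,
                           random_state=1, cluster_std=3)
clf = LogisticRegression().fit(X, y)

# Grid over the feature space, as in the steps above
xx, yy = np.meshgrid(
    np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.1),
    np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.1),
)
grid = np.column_stack((xx.ravel(), yy.ravel()))
pp = clf.predict_proba(grid)[:, 0].reshape(xx.shape)

fig, ax = plt.subplots()

# Filled surface of the class-0 probability
surface = ax.contourf(xx, yy, pp, cmap="Pastel1")
fig.colorbar(surface, ax=ax)

# Single contour line at probability 0.5: the decision boundary
boundary = ax.contour(xx, yy, pp, levels=[0.5], colors="black")

# Samples of each class on top
for class_value in range(2):
    mask = y == class_value
    ax.scatter(X[mask, 0], X[mask, 1], s=10, label=f"class {class_value}")
ax.legend()
# In an interactive session, call plt.show() here to display the figure
```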

Apply to real data

Now it is time to apply the previous steps to real data to connect everything. As I mentioned earlier, this dataset is already cleaned with no missing points. The dataset represents car purchase history for a sample of people according to their age and salary per year.

dataset = pd.read_csv('../input/logistic-reg-visual/Social_Network_Ads.csv')

dataset.head()

The dataset has two features, Age and EstimatedSalary, and one dependent variable, purchased, as a binary column. A value of 0 means that the person with that age and salary did not purchase a car, while 1 means that the person did. The next step is to separate the features from the dependent variable as X and y.

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(
                                               X, y, 
                                               test_size = 0.25,
                                               random_state = 0)
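To see what the split does, here is a minimal sketch on a hypothetical array of 8 rows standing in for the real dataset. With test_size = 0.25, a quarter of the rows go to the test set, and fixing random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 8 rows, 2 features
X_demo = np.arange(16).reshape(8, 2)
y_demo = np.array([0, 1, 0, 1, 0, 1, 0, 1])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)
print(np.array_equal(X_te, X_te2))
```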

Feature scaling

We need this step because age and salary are not on the same scale.

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
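The scaler learns the mean and standard deviation from the training set only and reuses them for the test set. A small sketch with made-up [age, salary] rows shows the effect, and also that inverse_transform() recovers the original units, which is used later for plotting:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up [age, salary] rows for illustration
train_demo = np.array([[25.0, 20000.0],
                       [35.0, 60000.0],
                       [45.0, 100000.0]])
test_demo = np.array([[30.0, 40000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_demo)  # learns mean and std, then scales
test_scaled = scaler.transform(test_demo)        # reuses the training statistics

# After scaling, each training column has mean 0 and standard deviation 1
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))

# inverse_transform undoes the scaling
restored = scaler.inverse_transform(train_scaled)
print(restored)
```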

Building the logistic model and fitting the training data

classifier = LogisticRegression(random_state = 0)

# fit the classifier to the training data
classifier.fit(X_train, y_train)

# predicting the value of y 
y_pred = classifier.predict(X_test)
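The imports at the top include accuracy_score and confusion_matrix, which can be used to score the test predictions before plotting anything. The sketch below uses made-up labels (y_true_demo and y_pred_demo are hypothetical) to show how the two relate; with the real data you would pass y_test and y_pred instead.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration
y_true_demo = np.array([0, 0, 1, 1, 1])
y_pred_demo = np.array([0, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# Accuracy is the sum of the diagonal divided by the number of samples
acc = accuracy_score(y_true_demo, y_pred_demo)
print(acc, np.trace(cm) / cm.sum())
```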

Plot the decision surface — training results

#1. reverse the standard scaler on the X_train
X_set, y_set = sc.inverse_transform(X_train), y_train

#2. Generate decision surface boundaries
min1, max1 = X_set[:, 0].min() - 10, X_set[:, 0].max() + 10 # for Age
min2, max2 = X_set[:, 1].min() - 1000, X_set[:, 1].max() + 1000 # for salary

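#3. Set coordinates scale accuracy (salary spans a much wider range than age, so it gets a coarser step to keep the grid a manageable size)
x_scale, y_scale = np.arange(min1, max1, 0.25), np.arange(min2, max2, 100)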

#4. Build the coordinate grid from the two scales
X1, X2 = np.meshgrid(x_scale, y_scale)

#5. Flatten X1 and X2 and return the output as a numpy array
X_flatten = np.array([X1.ravel(), X2.ravel()])

#6. Scale the grid with the fitted scaler so it matches the data the classifier was trained on
X_transformed = sc.transform(X_flatten.T)

#7. Generate the predictions and reshape them to the shape of X1
Z_pred = classifier.predict(X_transformed).reshape(X1.shape)

#8. set the plot size
plt.figure(figsize=(20,10))

#9. plot the contour function
plt.contourf(X1, X2, Z_pred,
                     alpha = 0.75, 
                     cmap = ListedColormap(('#386cb0', '#f0027f')))

#10. setting the axes limit
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

#11. plot the training points as a scatter plot (age vs. salary, colored by actual class)

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], 
                X_set[y_set == j, 1], 
                color = ListedColormap(('red', 'green'))(i), 
                label = j)
    
#12. plot labels and adjustments
plt.title('Logistic Regression (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

Decision plot for test set

The code is exactly the same as above, but uses the test set (X_test, y_test) instead of the training set.
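Since the two plots differ only in the data passed in, one way to avoid duplicating the code is to wrap the steps in a function. The sketch below is my own arrangement, not the code from the original notebook: the helper name plot_decision_surface and its n_points parameter are hypothetical, it assumes a binary problem with two features, it uses np.linspace() so the grid size stays fixed whatever the feature ranges are, and it runs on a synthetic stand-in dataset because the CSV file is not bundled here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def plot_decision_surface(classifier, scaler, X_scaled, y, title, n_points=200):
    """Plot a fitted binary classifier's decision surface in original units."""
    # Undo the scaling so the axes show the original feature values
    X_set = scaler.inverse_transform(X_scaled)

    # Pad each axis by 5% of its range so no point sits on the border
    pad = 0.05 * (X_set.max(axis=0) - X_set.min(axis=0))
    low, high = X_set.min(axis=0) - pad, X_set.max(axis=0) + pad

    # Fixed number of points per axis keeps the grid size predictable
    X1, X2 = np.meshgrid(np.linspace(low[0], high[0], n_points),
                         np.linspace(low[1], high[1], n_points))
    grid = np.column_stack((X1.ravel(), X2.ravel()))

    # Scale the grid like the training data, predict, and reshape to the grid
    Z = classifier.predict(scaler.transform(grid)).reshape(X1.shape)

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.contourf(X1, X2, Z, alpha=0.75,
                cmap=ListedColormap(("#386cb0", "#f0027f")))
    for color, label in zip(("red", "green"), np.unique(y)):
        ax.scatter(X_set[y == label, 0], X_set[y == label, 1],
                   color=color, label=f"class {label}")
    ax.set_title(title)
    ax.legend()
    return fig, Z


# Synthetic stand-in for the real dataset
X, y = datasets.make_blobs(n_samples=400, centers=2, n_features=2,
                           random_state=1, cluster_std=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
classifier = LogisticRegression(random_state=0).fit(X_train, y_train)

fig_train, Z_train = plot_decision_surface(
    classifier, sc, X_train, y_train, "Logistic Regression (Training set)")
fig_test, Z_test = plot_decision_surface(
    classifier, sc, X_test, y_test, "Logistic Regression (Test set)")
```

With the real data, the same two calls would take the scaled X_train and X_test from the sections above, and axis labels such as Age and Estimated Salary can be set on the returned figure.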

Conclusion

Finally, I hope this boilerplate helps in visualizing the results of a classification model. I recommend applying the same steps with another classification model, for example, an SVM with more than two features. Thanks for reading; I am looking forward to any constructive comments.

References

Sklearn.datasets API
Utilizing pandas to transform data
matplotlib.contour() API
numpy.meshgrid() API
Plot the decision surface of a decision tree on the iris dataset — sklearn example
Full working Kaggle notebook
GitHub repo