Summary

The provided content discusses the implementation of Multilayer Perceptron (MLP) deep learning models using Scikit-learn, illustrating the process with examples for both regression and classification tasks.

Abstract

The article delves into the capabilities of Scikit-learn for deep learning, specifically focusing on the Multilayer Perceptron (MLP) model. It highlights that despite being known for traditional machine learning tasks, Scikit-learn can also be used for basic deep learning applications. The author explains the limitations of Scikit-learn's MLP, such as the absence of an activation function in the output layer, lack of GPU support, and restricted customization options for parameters and layers. Through a regression example using the California housing dataset and a classification example with the Iris dataset, the article demonstrates the step-by-step process of data preparation, model creation, training, and evaluation. The regression example employs three hidden layers with RELU activation, while the classification example uses four hidden layers with varying numbers of neurons. The author concludes that while Scikit-learn's MLP is suitable for simple deep learning tasks, it may not be ideal for large-scale real-world applications due to its limitations.

Opinions

The author suggests that Scikit-learn's MLP functionalities are often overlooked in favor of more popular deep learning libraries like TensorFlow and Keras.
Scikit-learn's MLP is considered limited for real-life applications due to the absence of GPU support and restricted parameter tweaking capabilities.
The default optimizer for Scikit-learn's MLP is noted to be "Adam," which is deemed capable of handling large datasets effectively.
The article implies that feature scaling is crucial for training MLP models in Scikit-learn, emphasizing the use of StandardScaler.
The author expresses that understanding the errors made by the classification model is essential, advocating for the use of a confusion matrix to assess model accuracy beyond simple metrics like score or accuracy.

Deep Neural Multilayer Perceptron (MLP) with Scikit-learn

MLP is a type of artificial neural network (ANN). Simplest MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer.

In the world of deep learning, TensorFlow, Keras, Microsoft Cognitive Toolkit (CNTK), and PyTorch are very popular. Most of us may not realise that the very popular machine learning library Scikit-learn is also capable of a basic deep learning modelling. In this article, I will discuss the realms of deep learning modelling feasibility in Scikit-learn and limitations. Further, I will discuss hands-on implementation with two examples.

Salient points of Multilayer Perceptron (MLP) in Scikit-learn

There is no activation function in the output layer.
For regression scenarios, the square error is the loss function, and cross-entropy is the loss function for the classification
It can work with single as well as multiple target values regression.
Unlike other popular packages, likes Keras the implementation of MLP in Scikit doesn’t support GPU.
We cannot fine-tune the parameters like different activation functions, weight initializers etc. for each layer.

Regression Example

Step 1: In the Scikit-Learn package, MLPRegressor is implemented in neural_network module. We will import the other modules like “train_test_split” to split the dataset into training and train set to test the model, “fetch_california_housing” to get the data, and “StandardScaler” to scale the data as different features( independent variables) have wide value range scale. It is very important to scale the data used for training the model.

You can learn more about different scalers in the article Feature Scaling — Effect Of Different Scikit-Learn Scalers: Deep Dive

"""Import the required modules"""

from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
import pandas as pd

Step 2: We will split the dataset into train and test dataset. We have reserved 20% of the dataset for checking the accuracy of the trained model. Independent train and test dataset are further scaled to make sure that the input data is standard normally distributed are centred around zero and have variance in the same order.

cal_housing = fetch_california_housing()
X = pd.DataFrame(cal_housing.data,columns=cal_housing.feature_names)
y = cal_housing.target

X_train, X_test, y_train, y_test = train_test_split(X, y,random_state=1, test_size=0.2)

Step 3: We scale the data just like the above regression example and for the same reason.

sc_X = StandardScaler()
X_trainscaled=sc_X.fit_transform(X_train)
X_testscaled=sc_X.transform(X_test)

Step 4: In the below code, three hidden layers are modelled, with 64 neurons in each layer. Considering the input and output layer, we have a total of 5 layers in the model. In case any optimiser is not mentioned then “Adam” is the default optimiser and it can manage pretty large dataset.

reg = MLPRegressor(hidden_layer_sizes=(64,64,64),activation="relu" ,random_state=1, max_iter=2000).fit(X_trainscaled, y_train)

In addition to “RELU” activation, MLPRegressor supports the “sigmoid” and “hyperbolic tan” function.

Step 5: In the below code, the trained model is used to predict the target values of the reserved test dataset, which model has not seen before.

y_pred=reg.predict(X_testscaled)
print("The Score with ", (r2_score(y_pred, y_test))

Classification Example

We have seen a regression example. Next, we will go through a classification example. In Scikit-learn “ MLPClassifier” is available for Multilayer Perceptron (MLP) classification scenarios.

Step1: Like always first we will import the modules which we will use in the example. We will use the Iris database and MLPClassifierfrom for the classification example.

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifierfrom sklearn.model_selection import train_test_splitfrom sklearn.preprocessing import StandardScaler
import pandas as pd
from sklearn.metrics import plot_confusion_matrix
import matplotlib.pyplot as plt

Step 2: In separate data frames “X” and “y”, the values of the independent and dependent features are stored.

iris_data = load_iris()
X = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
y = iris_data.target

Step 3: Similar to the regression example above we will split the dataset into train and test dataset. We have reserved 20% of the dataset for checking the accuracy of the trained model. Independent train and test dataset are further scaled to make sure that the input data is standard normally distributed are centred around zero and have variance in the same order.

X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=1, test_size=0.2)
sc_X = StandardScaler()
X_trainscaled=sc_X.fit_transform(X_train)
X_testscaled=sc_X.transform(X_test)

Step 4: In the below code, we have modelled four hidden layers with different neurons in each layer. Considering the input and output layer, we have a total of 6 layers in the model. In case any optimiser is not mentioned then “Adam” is the default optimiser.

clf = MLPClassifier(hidden_layer_sizes=(256,128,64,32),activation="relu",random_state=1).fit(X_trainscaled, y_train)
y_pred=clf.predict(X_testscaled)
print(clf.score(X_testscaled, y_test))

The classifier shows quite a high score for the test data. It is important to understand the areas in which the classification model is making an error to make a full sense of model accuracy.

You can read more on the reason we should use the confusion matrix to judge the classification model accuracy in “Accuracy Visualisation: Supervised Machine Learning Classification Algorithms”.

Step 5: We will draw a confusion matrix to understand the classifications which are made incorrect by the model.

fig=plot_confusion_matrix(clf, X_testscaled, y_test,display_labels=["Setosa","Versicolor","Virginica"])
fig.figure_.suptitle("Confusion Matrix for Iris Dataset")
plt.show()

It seems only one “Versicolor” species is incorrectly identified as “Virginica” and rest by the model.

Conclusion: We can do simple deep learning regression and classification model in Scikit-learn. In my view, it is not up for any real-life large-scale modelling as there is no GPU support and very limited options to tweak the parameters.

You can learn more about the deep learning visualisation technique in the article Accuracy Visualisation In Deep Learning

And for people starting with Machine Learning journey you will the article Machine Learning For People In Hurry a good starting point.