Don’t Make This Mistake with Scaling Data
MinMaxScaler can return values smaller than 0 and greater than 1.
MinMaxScaler is one of the most commonly used scaling techniques in Machine Learning (right after StandardScaler).
The scikit-learn documentation describes it like this: "Transform features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one."
Usually, when we use MinMaxScaler, we scale values between 0 and 1.
Did you know that MinMaxScaler can return values smaller than 0 and greater than 1? I didn’t know this and it surprised me.
The problem
Let’s look at an example. I fit a scaler on a small dataset with two features.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
scaler.fit(data)
Now, let’s check the minimum and maximum values for these two features:
scaler.data_min_
# [-1. 2.]
scaler.data_max_
# [ 1. 18.]
Those are estimated as expected.
Now, let's try to input values greater than the max:
scaler.transform(np.array([[2, 20]]))
# array([[1.5 , 1.125]])
The scaler returns values greater than 1.
Or values lower than the min:
scaler.transform(np.array([[-2, 1]]))
# array([[-0.5 , -0.0625]])
The scaler returns values lower than 0.
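To see where these numbers come from, it helps to redo the arithmetic by hand. The sketch below assumes the default feature_range of (0, 1) and applies the usual min-max formula with the minimum and maximum learned during fit; the names new_data and manual are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
scaler = MinMaxScaler().fit(data)

# New samples: one above the training max, one below the training min.
new_data = np.array([[2, 20], [-2, 1]])

# Min-max formula using the statistics learned from the training data
# (assumes the default feature_range=(0, 1)).
manual = (new_data - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)

print(manual)
# The manual result should match what the fitted scaler returns.
print(np.allclose(manual, scaler.transform(new_data)))
```

The minimum and maximum are fixed at fit time, so anything outside the training range lands outside [0, 1] after the transform.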
No big deal, right?
The problem can show up when we train a linear classifier, which multiplies the scaled features by learned coefficients.
During training, the classifier never saw a negative value for that feature. At prediction time, a negative input flips the sign of that feature's contribution to the score, which can make the classifier behave incorrectly.
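A tiny numeric sketch makes this concrete. The weights and bias below are hypothetical, picked by hand rather than learned from data, and decision_score is an illustrative helper, not a scikit-learn function.

```python
import numpy as np

# Hypothetical coefficients of a linear classifier (not learned from real data).
weights = np.array([2.0, 1.0])
bias = 0.0

def decision_score(scaled_features):
    """Linear decision function: dot product of features and weights, plus bias."""
    return float(np.dot(scaled_features, weights) + bias)

# Inside [0, 1], the first feature can contribute at least 0 to the score.
in_range = np.array([0.0, 0.5])
# A scaled value of -0.5 turns the same positive weight into a negative contribution.
out_of_range = np.array([-0.5, 0.5])

print(decision_score(in_range))      # 0.5
print(decision_score(out_of_range))  # -0.5
```

With these made-up numbers, the score drops below zero only because the first feature left the range the weights were fitted on.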
The solution
I suggest you cap the outputs of MinMaxScaler between 0 and 1, for example with np.clip:
np.clip(scaler.transform(np.array([[-2, 1]])), 0, 1)
# array([[0., 0.]])
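If you are on scikit-learn 0.24 or newer, MinMaxScaler can also do the capping itself through its clip parameter. A minimal sketch, reusing the data from the example; the names clipped_scaler, below and above are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]

# clip=True caps transformed values to feature_range (default (0, 1)).
clipped_scaler = MinMaxScaler(clip=True)
clipped_scaler.fit(data)

below = clipped_scaler.transform(np.array([[-2, 1]]))
above = clipped_scaler.transform(np.array([[2, 20]]))
print(below)
print(above)
```

Keeping the capping inside the scaler means it travels with the fitted object, so you can't forget it at prediction time.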