Rukshan Pramoditha

Summary

The web content discusses regularization techniques in regression models, specifically Ridge, Lasso, and Elastic Net regression, and their implementation in Scikit-learn for mitigating overfitting and improving model generalization.

Abstract

The article "The Choice of Regularization: Ridge, Lasso and Elastic Net Regression" delves into the application of regularization methods to linear regression models to prevent overfitting. It explains the concepts of L1, L2, and Elastic Net regularization, detailing how each method adds a penalty term to the loss function to control the complexity of the model. The Scikit-learn library offers specific classes for Ridge, Lasso, and Elastic Net regression, each with hyperparameters that can be tuned to balance model performance between overfitting and underfitting. The author emphasizes the importance of selecting an appropriate regularization technique and tuning the hyperparameters to achieve a model that performs well on both training and unseen test data. The article also suggests that regularization is not always necessary and should be applied when overfitting is observed, as indicated by a training RMSE that is noticeably lower than the test RMSE.

Opinions

  • The author conveys that regularization is a critical technique for improving the generalization of regression models.
  • It is implied that choosing the right type of regularization (Ridge, Lasso, or Elastic Net) depends on the specific problem and data characteristics.
  • The author suggests that using hyperparameter tuning techniques is essential to find the optimal regularization strength (alpha) and the right mix of L1 and L2 regularization (l1_ratio) in Elastic Net.
  • The article advises against setting the regularization strength (alpha) to zero, as it would render the regularization methods ineffective and equivalent to standard linear regression.
  • There is an opinion that a good regression model should neither overfit nor underfit, and regularization helps in achieving this balance.
  • The author promotes their work and Medium membership, indicating a belief in the value of their insights and writing for the data science community.

The Choice of Regularization: Ridge, Lasso and Elastic Net Regression

Applying L1, L2 or both L1 and L2 regularization to linear regression

Photo by Andre Hunter on Unsplash

You may have heard terms like “Ridge”, “Lasso” and “Elastic Net”. These are just names; the concept behind all three is regularization. We’ll clarify this in this post.

Previously, we’ve discussed regularization from another angle: Mitigate Overfitting with Regularization. The main benefit of regularization is that it mitigates overfitting: regularized models tend to generalize better to unseen data.

Basically, regularization is the process of limiting (controlling) the learning process of a model by adding another term to the loss (cost) function that we’re trying to minimize.

(Image by author)
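Written as a formula, the idea can be sketched as below. The symbols are notation introduced only for this sketch:

- J(w) is the original loss (for linear regression, usually a sum or mean of squared errors).
- R(w) is the penalty on the coefficient vector w.
- α ≥ 0 sets the strength of that penalty.

Textbooks and libraries scale these terms in slightly different ways.

```latex
J_{\text{reg}}(\mathbf{w}) \;=\; \underbrace{J(\mathbf{w})}_{\text{original loss}} \;+\; \underbrace{\alpha \, R(\mathbf{w})}_{\text{regularization (penalty) term}}
```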

The regularization term (also called the penalty term) can take different forms that will be discussed soon in this post.

A linear regression model that predicts continuous-valued outputs learns the optimal values for its coefficients by minimizing its loss function. The same thing applies to a logistic regression model that predicts discrete-valued outputs. In both cases, we can apply regularization during the model training phase.

When we consider the Scikit-learn LogisticRegression() class for logistic regression models, there is a hyperparameter called penalty to choose the type of regularization.

LogisticRegression(penalty='...')

There are 4 options to select for the penalty (type of regularization).

  • ‘none’: No regularization applied (Scikit-learn 1.2 and later expect the Python value None instead of this string)
  • ‘l1’: L1 regularization applied
  • ‘l2’: L2 regularization applied (default choice)
  • ‘elasticnet’: Both L1 and L2 regularization applied

However, the LinearRegression() class for linear regression models applies no regularization and has no hyperparameter for choosing one. Instead, Scikit-learn provides 3 separate classes, one for each type of regularization.

  • When we apply the L2 regularization to the cost function of linear regression, it is called Ridge regression.
  • When we apply the L1 regularization to the cost function of linear regression, it is called Lasso regression.
  • When we apply both L1 and L2 regularization to the cost function of linear regression at the same time, it is called Elastic Net regression.

All the above regression types fall under the category of regularized regression.
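To see the four classes side by side, the sketch below fits LinearRegression(), Ridge(), Lasso() and ElasticNet() on the same synthetic data. For each model it prints the size (L2 norm) of the coefficient vector and how many coefficients are exactly zero. The dataset and alpha=1.0 are arbitrary assumptions. The point is only that all four share the same fit/predict interface and differ in the penalty they apply.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic regression data: 8 features, only 3 of them informative (arbitrary choices)
X, y = make_regression(n_samples=100, n_features=8, n_informative=3, noise=10.0, random_state=42)

models = {
    "LinearRegression (no penalty)": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=1.0),
    "ElasticNet (L1 + L2)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

for name, model in models.items():
    model.fit(X, y)  # same interface for all four classes
    norm = np.linalg.norm(model.coef_)  # overall size of the coefficient vector
    zeros = int(np.sum(model.coef_ == 0))  # coefficients removed entirely
    print(f"{name:<30} ||coef||_2 = {norm:8.2f}  zeros = {zeros}")
```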

Let’s discuss each type in detail.

Ridge Regression

Here, we apply the L2 regularization term (defined below) to the cost function of linear regression:

L2 = α.Σ(squared values of coefficients)
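To make the formula concrete, here is a sketch that does two things:

- It evaluates the L2 term for a coefficient vector.
- It checks that Scikit-learn's Ridge, fitted without an intercept, matches the textbook closed-form solution w = (XᵀX + αI)⁻¹Xᵀy.

That closed form assumes the loss is the plain sum of squared errors plus α times the sum of squared coefficients, which is how the Scikit-learn documentation writes the Ridge objective. The helper name l2_penalty and the data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

def l2_penalty(coef, alpha):
    """Hypothetical helper: alpha * sum of squared coefficients."""
    return alpha * np.sum(np.square(coef))

# Small synthetic dataset with known coefficients (fixed seed)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
true_w = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=50)

alpha = 1.0
# Closed-form minimizer of ||y - Xw||^2 + alpha * ||w||^2
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Scikit-learn's Ridge without an intercept minimizes the same objective
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

print(np.allclose(w_closed, ridge.coef_))      # True
print(l2_penalty(np.array([3.0, -4.0]), 0.5))  # 12.5
```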

The Scikit-learn class for Ridge regression is:

Ridge(alpha=...)

The alpha hyperparameter controls the regularization strength. It must be a positive float, and the default value is 1.0. Larger values of alpha mean stronger regularization (less overfitting, but possibly underfitting). Smaller values mean weaker regularization (a higher risk of overfitting). We want a model that neither overfits nor underfits the data, so we need to choose an optimal value for alpha. For that, we can use a hyperparameter tuning technique.
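One common tuning technique is a cross-validated grid search. The sketch below assumes synthetic data and an arbitrary logarithmic grid of alpha values. GridSearchCV tries each value with 5-fold cross-validation and keeps the one with the lowest RMSE. Scikit-learn also provides RidgeCV, which runs a similar search for Ridge specifically.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data: 20 features, only 5 informative (arbitrary choices)
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=15.0, random_state=0)

# Candidate regularization strengths from 0.001 to 1000 on a log scale
param_grid = {"alpha": np.logspace(-3, 3, 13)}

# Scikit-learn maximizes scores, so RMSE is used in negated form
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
print("cross-validated RMSE:", -search.best_score_)
```

After fitting, search.best_estimator_ is a Ridge model refitted on all the data with the selected alpha.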

Note: Ridge(alpha=0) is equivalent to the normal linear regression solved by the LinearRegression() class. It is not advised to use alpha=0 with Ridge regression. Instead, you should use normal linear regression.
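The equivalence can be checked numerically. This sketch fits both models on a small, well-conditioned synthetic dataset. It uses Ridge(alpha=0) only to demonstrate the point; as the note says, LinearRegression() is the class to use in practice. On badly conditioned data the two results may drift apart, which is one reason for that advice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Well-conditioned synthetic data (arbitrary sizes, fixed seed)
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=1)

ols = LinearRegression().fit(X, y)
# alpha=0 removes the L2 term; shown for comparison only, not recommended in practice
ridge_zero = Ridge(alpha=0.0).fit(X, y)

print(np.allclose(ols.coef_, ridge_zero.coef_))  # True on this well-conditioned data
```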

Lasso Regression

Here, we apply the L1 regularization term (defined below) to the cost function of linear regression:

L1 = α.Σ(absolute values of coefficients)
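The two penalty terms react very differently to the size of a coefficient. This sketch evaluates both on the same made-up coefficient vector with α = 0.5; the helper names are hypothetical.

Large coefficients are punished far more by the squared L2 term. For the small coefficient 0.5, the L1 term contributes α·0.5 = 0.25 but the L2 term only α·0.25 = 0.125.

One common intuition for why Lasso can push small coefficients exactly to zero, while Ridge only shrinks them, follows from this. The L1 penalty keeps pulling with the same force however small a coefficient gets. The pull of the L2 penalty fades as the coefficient approaches zero.

```python
import numpy as np

def l1_penalty(coef, alpha):
    """Hypothetical helper: alpha * sum of absolute coefficient values."""
    return alpha * np.sum(np.abs(coef))

def l2_penalty(coef, alpha):
    """Hypothetical helper: alpha * sum of squared coefficient values."""
    return alpha * np.sum(np.square(coef))

coef = np.array([3.0, -4.0, 0.5])  # made-up coefficients
alpha = 0.5

print(l1_penalty(coef, alpha))  # 3.75
print(l2_penalty(coef, alpha))  # 12.625
```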

The Scikit-learn class for Lasso regression is:

Lasso(alpha=...)

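This alpha plays the same role as the alpha in the L2 term, and its default value is also 1.0. Note, however, that Scikit-learn scales the squared-error part of the Lasso loss by 1/(2 × number of samples) but does not scale the Ridge loss this way. As a result, the same numerical alpha does not give the same effective strength in both classes.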
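A quick way to see the practical difference from Ridge is to watch how many coefficients Lasso sets exactly to zero as alpha grows. The sketch below assumes synthetic data in which only 3 of 10 features carry signal. The alpha values and max_iter are arbitrary illustration choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, but only 3 actually influence y (arbitrary choices)
X, y = make_regression(n_samples=100, n_features=10, n_informative=3, noise=5.0, random_state=0)

zeros = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    # Count coefficients that the L1 penalty has set exactly to zero
    zeros[alpha] = int(np.sum(lasso.coef_ == 0))
    print(f"alpha={alpha:<5} zero coefficients: {zeros[alpha]} of {X.shape[1]}")
```

This built-in feature selection is a common reason to prefer Lasso when many features are suspected to be irrelevant.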

Note: Lasso(alpha=0) is equivalent to the normal linear regression solved by the LinearRegression() class. It is not advised to use alpha=0 with Lasso regression. Instead, you should use normal linear regression.

Elastic Net Regression

Here, we apply both L1 and L2 regularization terms to the cost function of linear regression at the same time.

The Scikit-learn class for Elastic Net regression is:

ElasticNet(alpha=..., l1_ratio=...)

The hyperparameter l1_ratio defines how we mix both L1 and L2 regularization. Therefore, it is called the ElasticNet mixing parameter. The acceptable range of values for l1_ratio is:

0 <= l1_ratio <= 1
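For reference, the Scikit-learn documentation writes the objective that ElasticNet minimizes roughly as below. Here n is the number of samples and ρ is shorthand (introduced here) for l1_ratio. The single alpha sets the overall strength, and l1_ratio splits it between the L1 and L2 terms.

```latex
\min_{\mathbf{w}} \; \frac{1}{2n}\,\lVert \mathbf{y} - X\mathbf{w} \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert \mathbf{w} \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert \mathbf{w} \rVert_2^2
```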

Here are the possible cases:

  • l1_ratio = 0 means there is no L1 term and there is only L2 regularization.
  • l1_ratio = 1 means there is no L2 term and there is only L1 regularization.
  • 0 < l1_ratio < 1 means the regularization is defined as a combination of L1 and L2 terms. If l1_ratio is close to 1, the L1 term dominates. If l1_ratio is close to 0, the L2 term dominates.
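These cases can be checked in code. The sketch assumes synthetic data and does two things:

- It confirms that ElasticNet with l1_ratio=1.0 gives the same coefficients as Lasso with the same alpha.
- It prints how many coefficients are exactly zero for a few l1_ratio values between 0 and 1.

For a pure L2 penalty, Ridge is the natural choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data: 10 features, 3 informative (arbitrary choices)
X, y = make_regression(n_samples=100, n_features=10, n_informative=3, noise=5.0, random_state=0)
alpha = 1.0

# With l1_ratio=1 the Elastic Net objective reduces to the Lasso objective
enet_l1 = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=alpha).fit(X, y)
print(np.allclose(enet_l1.coef_, lasso.coef_))  # True

# Moving l1_ratio toward 0 shifts the penalty from L1 toward L2
for l1_ratio in [0.1, 0.5, 0.9, 1.0]:
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)
    print(f"l1_ratio={l1_ratio}: zero coefficients = {int(np.sum(enet.coef_ == 0))}")
```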

So, that’s the idea behind the terms “Ridge”, “Lasso” and “Elastic Net”!

Summary

It is not necessary to always apply regularization to linear regression models. First, you can try the LinearRegression() class and look at the output. If the train RMSE is low but the test RMSE is noticeably higher, your regression model is overfitting. In that case, you can try applying each type of regularization and compare the outputs. You can also try different valid values for the hyperparameters alpha and l1_ratio. In the end, you’ll have many models. You can choose a good model by looking at the RMSE on both the train and test sets. Please note that a good model neither overfits nor underfits the data. It should perform well on the training data and also generalize well to unseen data (test data).
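Here is a minimal sketch of that workflow, assuming a simple train/test split of synthetic data. The sizes, seed and alpha grid are arbitrary. The dataset deliberately has almost as many features as training samples, so plain linear regression is likely to overfit. RMSE is computed as the square root of mean_squared_error so that it works across Scikit-learn versions. Which model wins depends entirely on the data; on another dataset, plain linear regression may already be the best choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Few samples relative to features, which makes overfitting likely
X, y = make_regression(n_samples=60, n_features=40, n_informative=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def rmse(model, X, y):
    """Root mean squared error of a fitted model on (X, y)."""
    return np.sqrt(mean_squared_error(y, model.predict(X)))

# Baseline plus each regularized variant at a few alpha values
candidates = {"LinearRegression": LinearRegression()}
for alpha in [0.1, 1.0, 10.0]:
    candidates[f"Ridge(alpha={alpha})"] = Ridge(alpha=alpha)
    candidates[f"Lasso(alpha={alpha})"] = Lasso(alpha=alpha, max_iter=10_000)
    candidates[f"ElasticNet(alpha={alpha}, l1_ratio=0.5)"] = ElasticNet(
        alpha=alpha, l1_ratio=0.5, max_iter=10_000
    )

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    train_rmse, test_rmse = rmse(model, X_train, y_train), rmse(model, X_test, y_test)
    results[name] = (train_rmse, test_rmse)
    print(f"{name:<40} train RMSE = {train_rmse:8.2f}  test RMSE = {test_rmse:8.2f}")
```

A low train RMSE paired with a much higher test RMSE for LinearRegression is the overfitting signal described above. The candidate with the best test RMSE is then a reasonable pick, ideally confirmed with cross-validation rather than a single split.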

Note: In addition to applying regularization, there are other ways to address the problem of overfitting. You can learn about them by reading the following series of articles written by me.

A list of articles for “Addressing Overfitting” (Screenshot by author)

This is the end of today’s post.

Thank you so much for your continuous support! See you in the next story. Happy learning to everyone!

Special credit goes to Andre Hunter on Unsplash, who provides me with a nice cover image for this post.

Rukshan Pramoditha 2021–10–12

Regression
Ridge Regression
Lasso Regression
Elastic Net
Machine Learning