Mandar Karhade, MD. PhD.


6 Methods To Conduct Linear Regression In Python

Generalized Linear Models (GLMs) are the basis of the majority of regression-based models

Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. In Python, there are several methods to conduct linear regression. Here are some of the most commonly used methods:

Scikit-learn

Scikit-learn is a popular machine-learning library in Python that provides a simple and efficient way to conduct linear regression. Here’s an example of how to use it:

from sklearn.linear_model import LinearRegression
import numpy as np

# Create some data
X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Create a linear regression object
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Predict the target variable
y_pred = model.predict(X)
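
Once the model is fitted, it helps to look at what it actually learned. The sketch below reuses the same toy data and reads the fitted parameters from `coef_` and `intercept_`, then checks that the predictions are just the linear combination of the inputs. With only four points, the R² it prints is a training score and should not be read as a measure of generalization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

model = LinearRegression().fit(X, y)

# Learned parameters: one coefficient per feature plus an intercept
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)

# R^2 on the training data (optimistic with only four points)
print("R^2:", model.score(X, y))

# Predictions are the linear combination of the inputs and the parameters
manual_pred = X @ model.coef_ + model.intercept_
print("manual prediction matches predict():",
      np.allclose(manual_pred, model.predict(X)))
```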

Statsmodels

Statsmodels is another popular library in Python for statistical modeling. Here’s an example of how to use it for linear regression:

import statsmodels.api as sm
import numpy as np

# Create some data
X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Add a constant to the independent variables
X = sm.add_constant(X)

# Create a linear regression object
model = sm.OLS(y, X)

# Fit the model to the data
results = model.fit()

# Predict the target variable
y_pred = results.predict(X)

Gradient descent

Gradient descent is an optimization algorithm that can be used to estimate the parameters of a linear regression model. Here’s an example of how to use it:

import numpy as np

# Create some data
X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Standardize the features so the chosen step size is stable, then add a constant
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.c_[np.ones(X.shape[0]), X]

# Define the learning rate and number of iterations
learning_rate = 0.1
n_iterations = 1000

# Initialize the parameter vector
theta = np.random.randn(X.shape[1])

# Perform gradient descent
for i in range(n_iterations):
    y_pred = X.dot(theta)
    error = y_pred - y
    gradient = X.T.dot(error) / len(X)
    theta -= learning_rate * gradient

# Predict the target variable
y_pred = X.dot(theta)
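
A simple way to sanity-check a gradient descent fit is to compare it with a direct least-squares solve on the same design matrix. The sketch below is one way to do that, assuming standardized features, a zero starting point, and a step size of 0.1. Those choices are mine for illustration; the comparison itself works with any settings that converge.

```python
import numpy as np

X_raw = np.array([[10, 2], [20, 3], [3, 4], [4, 50]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)

# Standardize the features, then add the intercept column
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.c_[np.ones(len(X_std)), X_std]

# Plain batch gradient descent on the mean squared error
theta = np.zeros(X.shape[1])
learning_rate = 0.1
for _ in range(2000):
    gradient = X.T @ (X @ theta - y) / len(X)
    theta -= learning_rate * gradient

# Reference solution from NumPy's least-squares solver
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print("gradient descent:", theta)
print("least squares:   ", theta_ref)
print("max difference:  ", np.max(np.abs(theta - theta_ref)))
```

If the two parameter vectors disagree noticeably, the usual suspects are the step size, the number of iterations, or unscaled features.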

Normal equation

The normal equation is a closed-form solution for linear regression that can be used to estimate the parameters of a linear regression model. Here’s an example of how to use it:

import numpy as np

# Create some data
X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Add a constant to the independent variables
X = np.c_[np.ones(X.shape[0]), X]

# Solve the normal equation (X^T X) theta = X^T y without forming an explicit inverse
theta = np.linalg.solve(X.T.dot(X), X.T.dot(y))

# Predict the target variable
y_pred = X.dot(theta)

Bayesian linear regression

Bayesian linear regression is a probabilistic approach to linear regression that allows for uncertainty in the estimates of the parameters. Here’s an example of how to use it:

Note: Running Bayesian linear regression with the latest version of pymc3 may cause issues related to Intel’s mkl library or conflicts with theano. Depending on your system, you might have to troubleshoot that.

import numpy as np
import pymc3 as pm

# Create some data
X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Define the model
with pm.Model() as model:
    # Define the priors
    beta = pm.Normal('beta', mu=0, sd=10, shape=X.shape[1])
    sigma = pm.HalfNormal('sigma', sd=1)

    # Define the likelihood (pm.math.dot keeps the matrix product symbolic)
    y_obs = pm.Normal('y_obs', mu=pm.math.dot(X, beta), sd=sigma, observed=y)

    # Fit the model
    trace = pm.sample(1000)

# Predict the target variable
y_pred = X.dot(trace['beta'].mean(axis=0))
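
To build intuition for what the sampler is estimating, here is a closed-form sketch that needs only NumPy. It is not the same model as the one above: it assumes the noise standard deviation is known and fixed at 1 (the variable `noise_sd` is my assumption), whereas the PyMC3 example infers `sigma` from the data. Under that simplification, and with the same Normal(0, 10) prior on `beta`, the posterior over the coefficients is Gaussian with a mean and covariance that can be written down directly.

```python
import numpy as np

X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)

prior_sd = 10.0  # matches the Normal(0, 10) prior on beta
noise_sd = 1.0   # assumption: noise level treated as known and fixed

# Posterior precision = data precision + prior precision
precision = X.T @ X / noise_sd**2 + np.eye(X.shape[1]) / prior_sd**2
posterior_cov = np.linalg.inv(precision)
posterior_mean = posterior_cov @ X.T @ y / noise_sd**2

# Marginal uncertainty of each coefficient
posterior_sd = np.sqrt(np.diag(posterior_cov))

print("posterior mean:", posterior_mean)
print("posterior sd:  ", posterior_sd)

# Point prediction using the posterior mean of the coefficients
y_pred = X @ posterior_mean
```

The posterior standard deviations are the part that ordinary least squares does not give you for free: they shrink as more data arrives and stay close to the prior when the data says little.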

Ridge regression

Ridge regression is a regularized form of linear regression that can be used to prevent overfitting. Here’s an example of how to use it:

from sklearn.linear_model import Ridge
import numpy as np

# Create some data
X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Create a ridge regression object
model = Ridge(alpha=0.1)

# Fit the model to the data
model.fit(X, y)

# Predict the target variable
y_pred = model.predict(X)
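
To see what the regularization does under the hood, the sketch below solves the ridge problem by hand with NumPy. It assumes the usual objective (squared error plus `alpha` times the squared norm of the coefficients) with an unpenalized intercept, handled here by centering the data. The helper name `ridge_fit` is hypothetical and not part of any library.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    penalty = alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)

coef, intercept = ridge_fit(X, y, alpha=0.1)
coef_ols, intercept_ols = ridge_fit(X, y, alpha=0.0)  # alpha = 0 is plain OLS

print("ridge coefficients:", coef, "intercept:", intercept)
print("OLS coefficients:  ", coef_ols, "intercept:", intercept_ols)

y_pred = X @ coef + intercept
```

The penalty pulls the coefficient vector toward zero, so its norm with `alpha=0.1` is never larger than the unregularized one.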

Which one should I choose?

It is great that there are multiple methods that achieve almost the same result, so which one should we choose, and why? It helps to compare them side by side. I will try to lay out some of the biggest pros and cons of each.

Scikit-learn

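Pros:

  • Easy to use and implement
  • Provides a wide range of linear regression models and options (see the list of linear models at https://scikit-learn.org/stable/supervised_learning.html)
  • Provides built-in methods for cross-validation and model selection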

Cons:

  • Assumes linearity and additivity of the relationship between the dependent and independent variables
  • It does not provide a probabilistic interpretation of the model. For all the conventional statistics nerds, this might be a deal breaker. Statsmodels to the rescue.

Statsmodels

Pros:

  • Provides detailed statistical analysis and diagnostic tools
  • Supports a wide range of statistical models, including linear regression with categorical variables

Cons:

  • It can be more complex to use and implement than scikit-learn
  • It may not scale well to large datasets (to all the big data nerds: yes, sampling might be your best friend if you insist on using statsmodels)
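
To make "detailed statistical analysis" concrete, here is a hand-rolled NumPy sketch of the kind of quantities a statistical regression summary reports: coefficient standard errors and t-values under the classical (non-robust) OLS assumptions. It is an illustration of the arithmetic, not a replacement for the library. With four observations and three parameters there is only one residual degree of freedom, so the numbers are very noisy.

```python
import numpy as np

X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)
X = np.c_[np.ones(len(X)), X]  # add the intercept column

n, p = X.shape
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ theta

# Unbiased noise-variance estimate uses n - p degrees of freedom
sigma2 = residuals @ residuals / (n - p)

# Classical covariance of the coefficient estimates
cov_theta = sigma2 * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(cov_theta))
t_values = theta / std_errors

print("coefficients:   ", theta)
print("standard errors:", std_errors)
print("t-values:       ", t_values)
```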

Gradient descent

Pros:

  • Can handle large datasets and high-dimensional feature spaces
  • It can be used for online and incremental learning. This is probably the biggest benefit of gradient descent, but it comes at a steep cost in computation and is generally slow.

Cons:

  • It may converge slowly or not at all if the learning rate is not set correctly.
  • Be careful about choosing the gradient step size and the number of iterations. If you are using the cloud, it is better to experiment on a smaller set before breaking the bank.
  • It may converge very slowly, or stop short of the optimum within the iteration budget, if the data is not standardized or normalized
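
The learning-rate point can be checked numerically. For the mean-squared-error loss, batch gradient descent is only stable when the step size stays below 2 divided by the largest eigenvalue of XᵀX / n. The sketch below computes that limit for the unscaled toy data and runs gradient descent once below it and once above it. The helper `final_mse` and the 0.5x and 1.5x multipliers are my own choices for illustration.

```python
import numpy as np

X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)
X = np.c_[np.ones(len(X)), X]  # unscaled features plus intercept

def final_mse(learning_rate, n_iterations=200):
    """Run batch gradient descent from zero and return the final MSE."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        gradient = X.T @ (X @ theta - y) / len(X)
        theta = theta - learning_rate * gradient
    return np.mean((X @ theta - y) ** 2)

# Largest eigenvalue of X^T X / n sets the stability limit for the step size
lambda_max = np.linalg.eigvalsh(X.T @ X / len(X)).max()
stable_limit = 2.0 / lambda_max

mse_small = final_mse(0.5 * stable_limit)
mse_large = final_mse(1.5 * stable_limit)

print("stability limit for the step size:", stable_limit)
print("MSE with a step below the limit:  ", mse_small)
print("MSE with a step above the limit:  ", mse_large)
```

On this data the limit is far below 0.01, which is why scaling the features (or shrinking the step) matters so much.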

Normal equation

Pros:

  • Provides a closed-form solution that can be computed quickly
  • It does not require tuning of hyperparameters

Cons:

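  • It can be computationally expensive when there are many features, because solving the system scales roughly cubically with the number of features, and it breaks down if the matrix XᵀX is not invertible. Also, think about the memory that the matrix needs to fit in. Be careful about float conversion: lower-precision types such as float32 save memory, but at the cost of numerical accuracy.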
  • Does not handle collinearity between the independent variables well
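
Here is a small sketch of the collinearity problem, using made-up data in which the third column is exactly twice the second. In that case XᵀX is singular, so inverting it is unreliable. The pseudo-inverse still returns a least-squares solution (the one with the smallest norm), which is one common workaround.

```python
import numpy as np

# Hypothetical data: the third column is exactly twice the second
X = np.array([[1, 10, 20],
              [1, 20, 40],
              [1, 3, 6],
              [1, 4, 8]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)

rank = np.linalg.matrix_rank(X)
gram_cond = np.linalg.cond(X.T @ X)
print("rank of X:", rank, "out of", X.shape[1], "columns")
print("condition number of X^T X:", gram_cond)

# The pseudo-inverse gives the minimum-norm least-squares solution
theta = np.linalg.pinv(X) @ y
y_pred = X @ theta
print("theta:", theta)
```

A rank lower than the number of columns, or a huge condition number, is the warning sign to look for before trusting the normal equation.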

Bayesian linear regression

Pros:

  • Provides a probabilistic interpretation of the model and estimates the uncertainty in the parameters
  • Can handle more complex models with non-linear relationships and interactions between variables

Cons:

  • It can be computationally expensive and challenging to implement
  • It requires specifying prior distributions for the parameters, which is often challenging, to say the least. Consult your subject matter experts to estimate what is a reasonable expectation, and build the priors based on that.

Ridge regression

Pros:

  • Can prevent overfitting and improve the generalization performance of the model
  • Can handle collinearity between the independent variables

Cons:

  • Requires tuning of hyperparameters, such as the regularization strength
  • It does not provide a probabilistic interpretation of the model
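
Tuning the regularization strength does not have to be manual. As a minimal sketch, scikit-learn's `RidgeCV` tries a list of candidate values and keeps the one with the best cross-validated score (leave-one-out by default). The candidate grid below is an arbitrary choice of mine, and with only four data points the selected value is not meaningful beyond showing the mechanics.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

X = np.array([[10, 2], [20, 3], [3, 4], [4, 50]])
y = np.array([3, 5, 7, 9])

# Candidate regularization strengths (an arbitrary illustrative grid)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Fits the model for every alpha and keeps the best one
search = RidgeCV(alphas=alphas).fit(X, y)

print("selected alpha:", search.alpha_)
print("coefficients:", search.coef_)

y_pred = search.predict(X)
```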

Okay, so that's a wrap.
