Amit Chauhan

Summary

The web content provides a comprehensive guide to understanding and implementing linear regression using Python, complete with mathematical explanations, real-life examples, and code snippets.

Abstract

The article titled "Fully Explained Linear Regression with Python" delves into the application of simple linear regression for predictive analysis. It explains the concept of regression as a method for modeling the relationship between an independent variable and a dependent variable, emphasizing the importance of this approach in supervised machine learning. The author illustrates the process of finding the best-fit line using the equation y = mx + b, where y is the dependent variable, x is the independent variable, m is the slope, and b is the y-intercept. The article includes formulas for calculating these values and demonstrates their application with a fitness dataset that predicts kilometers covered based on energy expenditure. The author then transitions to a Python implementation using libraries such as NumPy, Pandas, Matplotlib, and Statsmodels to perform data analysis, model fitting, and visualization, ultimately confirming the manually calculated regression values. The conclusion highlights the utility of the Ordinary Least Squares (OLS) method in providing a linear regression approximation and praises the Statsmodels library for its statistical insights. The article also provides recommended further reading on various Python and machine learning topics.

Opinions

  • The author posits that linear regression is a fundamental technique in predictive analysis and making inferences.
  • It is suggested that factor analysis techniques enhance the accuracy of real-time analysis when used in conjunction with regression analysis.
  • The author expresses that the Statsmodels library is excellent for statistical analysis and inferences in Python.
  • The use of real-life examples and visual aids is advocated as a means to better understand and apply linear regression concepts.
  • The article recommends additional resources for readers interested in deepening their knowledge of Python and machine learning methods.


Fully Explained Linear Regression with Python

How the regression problem is solved with a real-life example.

Fitting to the best position with minimum error. Image by Author

Simple linear regression is used for predictive analysis and for making inferences. In this type, there is one independent variable and one dependent variable. Whenever we model a cause-and-effect relationship, we turn to regression analysis. Its accuracy on real data improves when it is combined with factor analysis techniques. The fundamentals of regression analysis are used in supervised machine learning. Three important things to notice here:

  • We need data to analyze; collecting it for the whole population is very tedious, so we take a sample of the data for analysis.
  • After getting the data, we design a model so that it generalizes to the whole population.
  • After modeling, we can make predictions for the population.

Earlier, we noted that this is cause-and-effect modeling. So where does linear regression come in? Linear means that the effect increases as the cause increases, so the two change in proportion. We need a mathematical form to make linear predictions, so we use the equation of a straight line.

The equation of a straight line is

y = mx + b

Here,

y is the dependent variable (the outcome), also called the predicted variable.

x is the independent variable.

m is the slope, also called the gradient.

b is the intercept on the y-axis.

So y is a function of x, and the regression model is a linear approximation. For a good prediction, we need to find b and m.

Example:

Suppose we have fitness data of Energy and kilometers covered.

Image by Author

We need to find b and m. The formulas to find these values are given below, where n is the number of samples and the sums run over all data points:

m = (n * Σ(xy) - Σx * Σy) / (n * Σ(x²) - (Σx)²)

b = (Σy - m * Σx) / n

The image shows these values.

Image by Author
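As a quick aside (not part of the original walkthrough), these formulas translate directly into a few lines of NumPy. The numbers below are made up purely to illustrate the computation; the article's actual fitness values appear only in the images.

import numpy as np

#made-up sample data, used only to illustrate the formulas above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.5, 4.6, 6.4, 8.3, 10.1])
n = len(x)

#m = (n*Σ(xy) - Σx*Σy) / (n*Σ(x²) - (Σx)²)
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
#b = (Σy - m*Σx) / n
b = (np.sum(y) - m * np.sum(x)) / n
print(m, b)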

After the calculation, m comes out to 1.89 and b comes out to 0.667. With these values, we can make predictions from the formula.

y = 1.89*x + 0.667

We can now plug in some x values to predict the kilometers covered. An example is shown below:

Predicted Kilometers on new X values. Image by Author
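In code, this step is simply evaluating the fitted equation at the new values. The x values below are hypothetical, chosen only to illustrate:

#predict kilometers for a few hypothetical new energy (x) values
new_x = [10, 15, 20]
predicted_km = [1.89 * x + 0.667 for x in new_x]
print(predicted_km)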

We have our model, built with a simple manual technique. Let's now verify with Python that we get the same coefficients.

#import all the libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

Read the Excel file:

df = pd.read_excel("fitness.xlsx")
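If you want to run the code without the author's fitness.xlsx (the file itself is not included in the article), a stand-in DataFrame with the same column names can be built directly. The numbers below are invented, so the statistics and coefficients you get from them will differ from the ones shown later:

#hypothetical stand-in for fitness.xlsx: same column names, made-up values
df = pd.DataFrame({
    'X1': [2, 4, 5, 7, 9, 11],               #Energy
    'Y1': [4.5, 8.1, 10.2, 13.8, 17.6, 21.3]  #Kilometers covered
})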

Use the describe function to see the stats.
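In code, this is a single call:

#summary statistics (count, mean, std, min, quartiles, max) of the columns
df.describe()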

Output of describe. Image by Author

Split the data into x and y.

y = df['Y1']
x = df['X1']

#plot the scatter plot between them
plt.scatter(x, y)
plt.xlabel('Energy', fontsize=20)
plt.ylabel('Kms Covered', fontsize=20)
plt.show()

We almost get a linear correlation.

Now fit the OLS model on our data.

x_new = sm.add_constant(x)
output = sm.OLS(y, x_new).fit()
output.summary()

#output:
                coef        
---------------------------------
const          0.667   
x1             1.89

After getting the summary, we see the same values. Here we use statsmodels, which is an excellent library for statistics and inference.
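A small addition not in the original article: if you prefer to read the coefficients programmatically rather than from the summary table, the fitted result exposes them directly:

#the fitted coefficients as a pandas Series: the intercept (const) and the slope
print(output.params)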

After fitting the OLS model, let's check the scatter plot with the fitted regression line.

plt.scatter(x, y)
y_pred = 1.89*x + 0.667
plt.plot(x, y_pred, lw=5, c='red', label='regression line')
plt.xlabel('Energy', fontsize=20)
plt.ylabel('Kms Covered', fontsize=20)
plt.legend()
plt.show()

Here is the scatter plot with the best-fit line.
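As a small variation on the snippet above (not in the original article), the fitted line can also be drawn from the model itself, which avoids hard-coding the rounded coefficients:

#draw the regression line from the fitted OLS model instead of the rounded values
y_fitted = output.predict(x_new)  #x_new is the design matrix with the constant column
plt.scatter(x, y)
plt.plot(x, y_fitted, lw=5, c='red', label='regression line')
plt.legend()
plt.show()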

Conclusion:

OLS gives a simple linear regression approximation, and statsmodels gives wonderful insight into the statistics of the data.

I hope you like the article. Reach me on LinkedIn and Twitter.

Recommended Articles

1. 15 Most Usable NumPy Methods with Python
2. NumPy: Linear Algebra on Images
3. Exception Handling Concepts in Python
4. Pandas: Dealing with Categorical Data
5. Hyper-parameters: RandomSeachCV and GridSearchCV in Machine Learning
6. Fully Explained Linear Regression with Python
7. Fully Explained Logistic Regression with Python
8. Data Distribution using Numpy with Python
9. 40 Most Insanely Usable Methods in Python
10. 20 Most Usable Pandas Shortcut Methods in Python
