avatarSuraj Ghimire, MSc

Summary

The provided web content discusses the differences between non-parametric and parametric regression in statistical modeling, emphasizing the flexibility and data-driven nature of non-parametric approaches.

Abstract

The article introduces the concept of supervised learning, where the goal is to predict outputs based on inputs (predictors or independent variables) using regression functions. It distinguishes between parametric regression, which assumes a predefined model form, and non-parametric regression, which does not make strong assumptions about the model's shape. Non-parametric models are presented as more flexible, allowing the data to inform the regression curve's form without being constrained by preconceived notions. The author illustrates both approaches with examples and argues that non-parametric models are particularly useful when prior knowledge about the regression curve is limited. The conclusion suggests that while parametric regression is based on historical data, non-parametric regression can be applied at any stage and that both methods can complement each other in testing model accuracy.

Opinions

  • The author believes that relying on a parametric model without verifying the accuracy of assumptions can lead to misleading inferences.
  • It is conveyed that non-parametric regression is advantageous as it makes minimal assumptions and allows the data to "speak for itself."
  • The article suggests that non-parametric regression is well-suited for situations with little to no prior information about the regression curve.
  • The author implies that non-parametric and parametric regression are not mutually exclusive and can be used in conjunction to validate models.

Non-Parametric Regression vs Parametric Regression

An Introduction

Photo by M. B. M. on Unsplash

In any statistical observations, a set of measured variables might be call inputs. These inputs have some influence on one or more other variables, called outputs. The goal of supervised learning is to use the inputs to predict the values of the outputs. In the statistical literature of machine learning, we have

  1. Inputs are often called the predictors, also known as independent variables.
  2. The outputs are called the responses, or classically the dependent variables.
  3. A function that maps or explains the relation between input and output. in supervised learning, they are well known as regression functions. They correspond to the case where the outputs are quantitative.
An example of a statistical model. Figure by author

In the above model, we need to predict the amount of glucose of ith person using the infrared absorption spectrum of his/her blood. This prediction can be done in two different ways, mainly.

  1. If we have a pre-defined model, we can use a parametric approach.
  2. If we do not have any model, we use a non-Parametric approach

A parametric model presumes that the form of a function is known. When an experimenter chooses one family of curves and inputs the choices into the inferential process, the information that data can supply concerning the model development is then restricted from the data under this assumed parametric form. The experimenter relies less on the data for information about the model.

An example of a parametric approach: fig by author

The assumption of the model without the knowledge of the accuracy of his or her assumptions can be dangerous in the sense of producing misleading inferences about the regression curve.

If we infer the above-given equation with unknown function while making few assumptions as possible, basically as a non-parametric model, it can be as

a non-parametric approach

Why the non-parametric model?

  1. It allows great flexibility in the possible form of the regression curve and makes no assumption about a parametric form.
  2. It only assumes that the regression curve belongs to them some infinite-dimensional collection of functions.
  3. It heavily relies on the experimenter to supply only qualitative information about the function and let the data speak for itself concerning the actual form of a regression curve.
  4. It is best suited for inference in a situation where there is very little or no prior information about the regression curve.

Conclusion

The parametric form of regression is used based on historical data; non-parametric can be used at any stage as it doesn’t take any presumption. However, parametric and non-parametric regression is not a contradiction of each other. While the subsequent computation of non-parametric form, based on the first result can be used as a parametric form, any type of parametric regression can be used as non-parametric to test the accuracy of the model that we are using for parametric regression.

Further reading for non-parametric regression.

[1] Larry Wasserman, All of Nonparametric Statistics, Springer Texts in Statistics

[2] Randall L. Eubank, Nonparametric Regression and Spline Smoothing (1999), MARCEL DEKKER, INC.

You might also like this story.

If you enjoyed reading this story, I would like to send you my weekly newsletter and updates. Click here to subscribe. You can also reach me through LinkedIn

Machine Learning
Regression
Technology
Data Science
Statistics
Recommended from ReadMedium