Is Logistic Regression A Regressor or A Classifier? Let’s End the Debate

From two different perspectives and with 3 rounds

Even though we can find many posts and articles talking about this question, I decided to add one more. Because it is an opportunity to discuss some underlying theories and frameworks.

Some people try to defend their viewpoints by giving a firm answer: logistic regression is a regressor, or logistic regression is a classifier. I will not defend a viewpoint, because the debate is kind of absurd to me.

For those defending that logistic regression is a classifier, please, you know that in the name “logistic regression”, there is “regression”, so for the ones who name this model this way, there is a reason why they consider it a regressor. Yes, I know, there can be misnomers, but still, you have to admit that there is a reason.
For those defending that logistic regression is only a regressor, you also know that it is applied to classification tasks. For this reason, other people want to call it a classifier.

So the answer is that both viewpoints are possible. Just like many things in life, we give different, sometimes contradictory statements for the seeming same subject, that is because we see it from different angles. So what is interesting is not the answer, but how we come to this answer.

This question is also analogous to the age-old debate over whether a tomato is a fruit or a vegetable. They both have two valid answers, as they originate from different perspectives and fields of study, which are botany and nutrition. Acknowledging both answers will help you to better understand the structure of plants and improve the nutritional analysis of your meals. Same for logistic regression, in this article, we will try to understand why the two possible answers exist, and the reasons behind them will help you better understand the underlying structure of this model.

0. The two perspectives

In short, there are two perspectives:

Statisticians’ perspective: the name “logistic regression” itself was given by statisticians, so by definition, for them, logistic regression is a regressor because the output of the model is a probability (between 0 and 1) and it is continuous.
Machine learning practitioners’ perspective: there are two types of supervised machine learning, regression or classification, so for them, logistic regression is a classifier because well it is used for a classification task.

The truth is that during our learning process, we read various sources of information, some follow the statistical perspective, and others the machine learning perspective. And sometimes, the frontier of these two points of view is blurred.

Now, some will say, in general, machine learning can be considered a branch of statistics, at least, they are not totally different, and I should not oppose them. Yes, I agree, however, I will explain the two perspectives by clearly distinguishing them, and you will see the differences. Then we can better understand the subject that is studied. I am planning to write an article to discuss the two perspectives. In particular, logistic regression serves as a prime example in this article.

1. Round 1: what the model really is

First, what are we talking about? What is logistic regression? We will be surprised that, well we are talking about the different models from two perspectives.

1.1 Logistic regression model according to statisticians

For statisticians, the model is

p = 1 / (1 + exp (- (wX + b) ) )

and the output of the model is a value from 0 to 1, and it can then represent a probability. Now, if you want, you can do a classification by cutting the probability into two. Usually, we use the middle value: 0.5 or 50%. So the order here is, we first have probabilities as output, then classification.

That is why, for statisticians, logistic regression is a regressor, between the output is continuous.

1.2 Logistic regression model according to machine learning perspective

For machine learning practitioners, the mode is

y = wX + b

and the output is y, called the decision function. Usually, the value of y is either -1 or 1.

Then we have to fit the model, and a loss function has to be chosen. Now, when the loss function log loss is used, there is equivalence with the logistic regression from the statisticians’ perspective.

Finally, the linear model is used for the classification task depending on whether y is positive or negative.

So to some extent, they should not use this name of logistic regression, since the logistic function is never used for the prediction of the class. But, since minimizing the log loss is equivalent to maximizing the likelihood, and the same result is obtained about the classification, we say that logistic regression has been applied.

We could imagine another name such as “log loss classifier” given to this particular combination of the linear model fitted on log loss. But it turns out that machine learning researchers usually don’t give specific names, but when there is an equivalent from an existing statistical model, then it is pointed out.

Now, if you want to calculate the probability, you can apply, the logistic function! So, the order is different, here we first have classification with the decision function, then if needed, the probability can be computed.

(Some of you may be confused about what I say from this machine learning perspective. It is will be hopefully more clear with round 3. For the time being, just think of SVM. Just like logistic regression, SVM is a classifier. And contrary to logistic regression, the probability calculation is not immediate. Well, you can apply logistic regression, but other methods such as isotonic regression also exist to compute the probability.)

1.3 Conclusion of round 1

To conclude, despite the fact that statisticians and machine learning practitioners use the same name “logistic regression”, the model is actually different, and so is the output of the model. Statisticians consider logistic regression a regressor because the model tries to build a continuous outcome which is a probability. And machine learning practitioners consider logistic regression a classifier because the output is binary… Oh wait, but the output of the linear model is continuous. The classification part is one more step that they add when the model is fitted. Is the classification step really a part of the model itself? Let’s move to round 2.

2. Round 2: what a (linear) classifier really is

I remember that in high school, for a philosophy dissertation, we should always begin by defining the terms we use. So, what is a classifier?

2.1 Linear classifier according to machine learning perspective

For machine learning practitioners, logistic regression is a classifier because it is applied to a dataset with a binary outcome. So, the definition of a classifier is a model that is applied to a dataset with a binary target variable.

With this definition in mind, let’s see, what if we use squared error instead of log loss in the linear model?

It is of course possible because we can always calculate the squared error between two numbers, even if the target variable only has two values.

For those who have any doubts, you can read the official documentation of SGDClassifier:

‘squared_error’, ‘huber’, ‘epsilon_insensitive’ and ‘squared_epsilon_insensitive’ are designed for regression but can be useful in classification as well

So according to this definition of a classifier, linear regression is also a classifier!

2.2 Classifier according to statistical perspective

Now, for statisticians, what is a classifier? More specifically, can you find a mathematical function that directly has a binary output? Well, it seems that usual mathematical functions always have a numerical output. And an output between 0 and 1 is the closest approximation to a binary output. So if logistic regression is still considered a regressor, then a mathematical function-based classifier may not even exist!

In order to clarify my statement, I am talking about the mathematical functions-based models or parametric models. For distance-based models such as KNN, or decision tree-based models, there is no debate to distinguish a regressor from a classifier.

Some would mention LDA (Linear Discriminant Analysis), but can you write the mathematical function of LDA? You cannot because the starting point of this model is not a mathematical function-based model. But what is surprising is that the form of the final model for classification can be very close to logistic regression! You can read this article about the comparison of LDA vs. Logistic Regression.

Others would mention SVM (Support Vector Machine). Well, from the statistical perspective, I am not sure where the place of SVM really is. In the book Introduction to Statistical Learning, there is a specific chapter about SVM as if it is a very different modeling approach compared to others. And what is sure is that the classification is based on the rule over y: if y is positive, then the class is positive, otherwise, the negative class will be assigned. So from this viewpoint, we can apply the same reasoning for logistic regression: it is not a classifier, since the primary output of the model is not binary, but continuous, only after, a decision rule over y can be applied. However, it is commonly admitted that SVM is a classifier.

The truth is that we usually don’t even mention that y is modeled, and jump directly to the hyperplane part, by considering y = 0, as we can see from the Wikipedia page.

https://en.wikipedia.org/wiki/Support_vector_machine

2.3 Conclusion of round 2

To conclude for round 2, when we try to define a classifier, the debate is already closed. For machine learning practitioners, logistic regression is a classifier not because of any characteristic of the model itself, but because it is only based on the fact that the training dataset has a binary target variable. For statisticians, well, it is a regressor because there are no other choices when we try to find a mathematical function to model the binary output… Oh wait, when the output is from a real number, from negative infinity to positive infinity, it is a regression. When the output is from 0 to 1, it is also regression, maybe for statisticians, there are different types of regression. Let’s go to round 3.

3. Round 3: two different frameworks

It is time to take a step back, and gain some altitude to have a global view: where is the place of logistic regression among other models?

3.1 GLM from a statistical perspective

For statisticians, logistic regression belongs to a family of models called the Generalized Linear Model (GLM). The base model is the linear regression with the function y = wX + b, and the term “generalized” means that various link functions can be applied to this simple linear model so that the resulting models can predict different types of output such as continuous, binary with probabilities as output, multinomial with probabilities as output, count data, or only positive value.

Then the estimation of the parameters of the models is done through the Maximum Likelihood Estimation (MLE). So according to the nature of the output, the distribution used is of different kinds with different support such as normal distribution with real number support, gamma distribution with only positive value support, Poisson distribution with integer support, and Bernoulli distribution with binary support (0 and 1).

To study the quality of the model, we usually study the statistical significance of the coefficients and the model.

3.2 Machine learning framework

For machine learning, the usual distinction is only regressor vs. classifier. And it is worth noting that if the target variable represents count data, only positive values, it is still regression. The classification can be binary or multiclass.

For the model fitting part, we usually use the term “cost function” and its value has to be minimized contrary to the “likelihood” that has to be maximized. But in many cases, there is equivalence. And log loss is one possible loss function among others. It turns out that the use of log loss makes the model equivalent to logistic regression.

(Now, we can also mention that hinge loss, combined with L2 penalty term is equivalent to SVM. As I previously mentioned, I don’t know where the place of SVM is from the statistical perspective, but for machine learning, its place is quite clear: it is a linear model with hinge loss and L2 penalty. When statisticians say that it creates a hyperplane that does the classification, the truth is that the linear model with log loss, aka logistic regression, does the same, and in fact, it is true regardless of the loss function used, because it is the definition of a linear classifier: creating a hyperplane from the linear model!)

To assess the quality of the model, we usually use performance metrics and the model tuning part consisting of optimizing hyperparameters. Here we can notice that GLM has not the notion of hyperparameters.

You can learn more about the unified linear classifier in the estimator SGDClassifer.

3.3 Conclusion of round 3

So according to these two frameworks, the question “is logistic regression a regressor or a classifier” is kind of wired:

“logistic regression” is a term used in the framework GLM
regressor vs. classifier is the Machine learning framework

The nature of this question is similar to “is a tomato a fruit or a vegetable”. Both answers are possible because the two viewpoints are from two different fields: botany and nutrition.

To conclude this round, the statistical perspective tries to find models that by definition can be relevant according to the nature of the output, and in the case of logistic regression, the primary goal is to model a binary output with a Bernoulli distribution. And the unified framework is GLM. From the machine learning perspective, we have the main three steps:

model: in the case of logistic regression, it is y = w X + b
fitting: choosing the log loss to find the coefficients w and b
tuning: we usually use a penalty term to avoid overfitting, to be tuned.

4. Conclusion

Is logistic regression a regressor and a classifier? Now, hopefully, we can all see that the question itself is ill-posed. The name “logistic regression” itself reflects a viewpoint. the outputs are not clearly defined. The mathematical function-based classifier is not clearly defined.

But it is an opportunity because when trying to really answer the question, we can rediscover many things we thought we already know through different perspectives and then gain more understanding. The distinction between regressor vs. classifier itself is a machine learning perspective. For statisticians, logistic regression is a model that belongs to GLM (Generalized Linear Model) and this framework considers more types of output such as count data, only positive output.

In order to gain a better overview of machine learning models, especially some models that I mentioned such as LDA or SVM. What are their places in these frameworks? Just like the age-old debate over whether a tomato is a fruit or a vegetable. You maby forget that the same question applies to pumpkins and green beans. To some extent, LDA and SVM can also be considered both regressors and classifiers.

I wrote this article so that you can read about the three steps of machine models to gain a better overall understanding.

I write about machine learning and data science and I explain complex concepts in a clear way. Please follow me with the link below and get full access to my articles: https://medium.com/@angela.shi/membership

References:

Logistic Regression: Bernoulli vs. Binomial Response Variables

Other references: