Cassie Kozyrkov

Summary

This article discusses overfitting, underfitting, and regularization in machine learning, emphasizing the bias-variance tradeoff and the importance of matching model complexity to the quality and quantity of the available data.

Abstract

The article "Making Friends with AI: Overfitting, Underfitting, and Regularization" delves into the bias-variance tradeoff, a fundamental concept in machine learning. It explains that all perfect models have similar performance, but each imperfect model can fail in its own unique way. The article suggests that to improve a model beyond a certain point, one needs better quality or quantity of data. It highlights the issue of overfitting, where a model learns the noise in the training data, leading to poor generalization, and underfitting, where a model is too simple to capture the underlying trend in the data. The author introduces regularization as a technique to prevent overfitting by penalizing model complexity, ensuring a balance between model performance and simplicity. The article also teases the next part of the series and provides resources for further learning, including a YouTube course and hands-on tutorials.

Opinions

  • The author paraphrases Leo Tolstoy's "Anna Karenina" to illustrate that while all successful models share similar characteristics, each unsuccessful model is unsuccessful in its own way.
  • The article suggests that a model's performance is constrained by the quality and quantity of available data, and beyond that, the remaining error is unavoidable.
  • Overfitting is seen as a result of an optimization algorithm's attempt to reduce training mean squared error (MSE) to zero by capturing noise, which is considered a form of "cheating."
  • Regularization is portrayed as a necessary intervention by the "boss" (implying the data scientist or engineer) to discourage unnecessary complexity and encourage models that generalize well to unseen data.
  • The author humorously suggests that engineers' tendency to overcomplicate solutions could benefit from regularization, implying a critique of needless complexity in model building.
  • Underfitting is described as the result of over-penalizing complexity, leading to models that are too simple and biased, which is likened to insisting on simplicity beyond what the problem requires.
  • The article promotes a balanced approach to model building, advocating for a middle ground between overfitting and underfitting to achieve the best predictive performance.

Making Friends with AI

Overfitting, Underfitting, and Regularization

The bias-variance tradeoff, part 2 of 3

In Part 1, we covered much of the basic terminology as well as a few key insights about the bias-variance formula (MSE = Bias² + Variance), including this paraphrase from Anna Karenina:

All perfect models are alike, but each unhappy model can be unhappy in its own way.

To make the most of this article, I suggest taking a look at Part 1 to make sure you’re well-situated to absorb this one.

Under vs over… fitting. Image by the author.

What does overfitting/underfitting have to do with it?

Let’s say you have a model that is as good as you’re going to get for the information you have.

To have an even better model, you need better data. In other words, more data (quantity) or more relevant data (quality).

When I say as good as you’re going to get, I mean “good” in terms of MSE performance on data your model hasn’t seen before. (It’s supposed to predict, not postdict.) You’ve done a perfect job of getting what you can from the information you have; the rest is error you can’t do anything about with your information.

Reality = Best Model + Unavoidable Error
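To make that equation concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the "best model" is taken to be the line 2x + 1, the unavoidable error is Gaussian noise with standard deviation 1, and the names `best_model`, `reality`, and `NOISE_SD` are hypothetical. The point it tries to show is that even the best possible model scores an MSE near the noise variance on fresh data, not zero.

```python
import random

random.seed(0)

NOISE_SD = 1.0  # assumed size of the unavoidable error


def best_model(x):
    """The true signal; in this toy world it is 2x + 1 by assumption."""
    return 2.0 * x + 1.0


def reality(x):
    """Reality = Best Model + Unavoidable Error."""
    return best_model(x) + random.gauss(0.0, NOISE_SD)


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Fresh data the model has never seen.
xs = [random.uniform(0.0, 10.0) for _ in range(10_000)]
ys = [reality(x) for x in xs]

floor = mse(best_model, xs, ys)
print(f"MSE of the best possible model: {floor:.3f}")
print(f"Noise variance (the floor):     {NOISE_SD ** 2:.3f}")
```

The two printed numbers should land close to each other. That gap between zero and the floor is the part no amount of modeling cleverness can remove without better information.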

But here’s the problem… we’ve jumped ahead; you don’t have this model yet.

All you have is a pile of old data to learn this model from. Eventually, if you’re smart, you’ll validate this model on data it hasn’t seen before, but first you have to learn the model by finding useful patterns in data and trying to inch closer and closer to the stated objective: an MSE that’s as low as possible.

Unfortunately, during the learning process, you don’t get to observe the MSE you’re after (the one that comes from reality). You only get to compute a shoddy version from your current training dataset.

Photo by Jason Leung on Unsplash

Oh, and also, in this example “you” are not a human, you’re an optimization algorithm that was told by your human boss to twiddle the dials in the model’s settings until the MSE is as low as it will go.

You say, “Sweet! I can do this!! Boss, if you give me an extremely flexible model with lots of settings to fiddle (neural networks, anyone?), I can give you a perfect training MSE. No bias and no variance.”

The way to get a better training MSE than the true model’s test MSE is to fit all the noise (errors you have no predictively-useful information about) along with the signal. How do you achieve this little miracle? By making the model more complicated. Connecting the dots, essentially.

This is called overfitting. Such a model has an excellent training MSE but a whopper of a variance when you try to use it for anything practical. That’s what you get for trying to cheat by creating a solution with more complexity than your information supports.
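Here is a small sketch of that cheat, under stated assumptions. The toy reality (a line plus unit-variance noise) and the `memorizer` function are hypothetical stand-ins: instead of a neural network, the "extremely flexible model" simply returns the answer of the nearest training point, which memorizes the noise along with the signal.

```python
import random

random.seed(1)


def signal(x):
    """True pattern (an assumption of this toy example)."""
    return 2.0 * x + 1.0


def sample(n):
    """Draw n points from reality = signal + noise with unit variance."""
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [signal(x) + random.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = sample(200)
test_x, test_y = sample(1000)


def memorizer(x):
    """Overfit model: return the y of the nearest training x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


train_mse = mse(memorizer, train_x, train_y)
test_mse = mse(memorizer, test_x, test_y)
floor_mse = mse(signal, test_x, test_y)

print(f"memorizer training MSE: {train_mse:.3f}")
print(f"memorizer test MSE:     {test_mse:.3f}")
print(f"true-signal test MSE:   {floor_mse:.3f}")
```

The memorizer's training MSE is zero because every training point is its own nearest neighbor. On new data it should do worse than the true signal, because the noise it memorized is now added to the fresh noise in each test point.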

The boss is too smart for your tricks. Knowing that a flexible, complicated model allows you to score too well on your training set, the boss changes the scoring function to penalize complexity. This is called regularization. (Frankly, I wish we had more regularization of engineers’ antics, to stop them from doing complicated things for complexity’s sake.)

Regularization essentially says, “Each extra bit of complexity is going to cost you, so don’t do it unless it improves the fit by at least this amount…”
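One way to sketch that rule in code is a penalized score: training MSE plus a fixed charge per unit of complexity. The setup below is an assumption for illustration, not the only form regularization takes. The model is a step function with k equal-width bins, complexity is counted as the number of bins, and `lam` is the hypothetical price per bin.

```python
import random

random.seed(2)

N = 128
xs = [random.uniform(0.0, 10.0) for _ in range(N)]
# Toy reality (an assumption): a single step at x = 5 plus unit-variance noise.
ys = [(4.0 if x >= 5.0 else 0.0) + random.gauss(0.0, 1.0) for x in xs]


def training_mse(k):
    """Fit a step function with k equal-width bins on [0, 10); return its training MSE."""
    bins = [[] for _ in range(k)]
    for x, y in zip(xs, ys):
        bins[min(int(x / 10.0 * k), k - 1)].append(y)
    sse = 0.0
    for members in bins:
        if members:  # the fitted value in each bin is the mean of its points
            mean = sum(members) / len(members)
            sse += sum((y - mean) ** 2 for y in members)
    return sse / N


def penalized_score(k, lam):
    """Training MSE plus a charge of lam per bin (the complexity penalty)."""
    return training_mse(k) + lam * k


def pick_complexity(lam, candidates=(1, 2, 4, 8, 16, 32, 64)):
    """Return the bin count with the lowest penalized score."""
    return min(candidates, key=lambda k: penalized_score(k, lam))


for lam in (0.0, 0.1):
    k = pick_complexity(lam)
    print(f"lam={lam}: chose {k} bins, training MSE {training_mse(k):.3f}")
```

With no penalty, the optimizer should grab the most bins on offer, since every extra bin shaves a little off the training MSE. With a modest charge per bin, extra bins are only worth buying when they improve the fit by more than they cost, so the choice should fall back to the two bins the signal actually needs.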

If the boss regularizes too much — getting tyrannical about simplicity — your performance review is going to go terribly unless you oversimplify the model, so that’s what you end up doing.

This is called underfitting. Such a model has an excellent training score (mostly because of all the simplicity bonuses it won) but a whopper of a bias in reality. That’s what you get for insisting that solutions should be simpler than your problem requires.
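A minimal sketch of a tyrannical boss, assuming one common form of regularization: a penalty on the squared slope of a one-parameter line (ridge-style). The toy reality (y = 3x plus noise) and the values of `lam` are hypothetical choices for illustration.

```python
import random

random.seed(3)


def sample(n):
    """Toy reality (an assumption): y = 3x plus unit-variance noise, x in [-1, 1]."""
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + random.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_slope(xs, ys, lam):
    """Minimize sum((y - w*x)^2) + lam * w^2; closed form: sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Plain (unpenalized) mean squared error of the line y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = sample(200)
test_x, test_y = sample(2000)

results = {}
for lam in (0.0, 1.0, 10_000.0):
    w = fit_slope(train_x, train_y, lam)
    results[lam] = (w, mse(w, test_x, test_y))
    print(f"lam={lam:>8}: slope {w:.3f}, test MSE {results[lam][1]:.3f}")
```

As `lam` grows, the fitted slope is squeezed toward zero. At the extreme the model predicts nearly the same value for every input, ignoring a trend that is plainly in the data, and its error on new data should climb well above what the unpenalized fit achieves. That systematic miss is the bias.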

And with that, we’re ready for Part 3, where we bring it all together and cram the bias-variance tradeoff into a convenient nutshell for you.

Thanks for reading! How about a YouTube course?

If you had fun here and you’re looking for an entire applied AI course designed to be fun for beginners and experts alike, here’s the one I made for your amusement:

Looking for hands-on ML/AI tutorials?

Here are some of my favorite 10-minute walkthroughs:
