avatarCassie Kozyrkov

Summary

The website content discusses the bias-variance tradeoff in machine learning, emphasizing the importance of understanding the mean squared error (MSE) decomposition to select the right model complexity and regularization hyperparameters.

Abstract

The article titled "Is There Always a Tradeoff Between Bias and Variance?" delves into the concept of the bias-variance tradeoff, a fundamental principle in machine learning for model selection. It clarifies misconceptions about the relationship between bias and variance, particularly the idea that reducing one necessarily increases the other. The author explains the mean squared error (MSE) formula, MSE = Bias² + Variance, and its significance in model evaluation. The article aims to demystify the tradeoff by discussing how a model can be improved without compromise, provided one has the right tools and understanding. It also touches upon the limitations of MSE as a loss function and suggests that creating a custom loss function may be more appropriate depending on one's objectives.

Opinions

  • The author believes that the phrase "bias-variance tradeoff" is often misunderstood and should be more accurately described as a practical guide for choosing a model's complexity.
  • They suggest that the need for a tradeoff between bias and variance is not absolute; models can be improved without compromise if one has the right approach, such as obtaining more data or refining techniques.
  • The author has a humorous and slightly sarcastic tone when addressing common practices in data science, such as the widespread use of MSE due to its ease of optimization, despite its potential shortcomings.
  • They imply that the convenience of optimizing parabolas (a reference to the ease of differentiating quadratic functions) is a significant reason for the loyalty to MSE among data scientists.
  • The author acknowledges the limitations of the MSE as a performance metric, suggesting that readers who are not indifferent to the tradeoffs between bias and variance should consider alternative loss functions.
  • They make a literary reference to Tolstoy's "Anna Karenina" to illustrate that while perfect models may all be similar, imperfect models with the same MSE can be flawed in different ways.
  • The article promotes the author's other educational content, including a YouTube course and hands-on machine learning tutorials, indicating a commitment to accessible AI education.

Irreverent Demystifiers

Is There Always a Tradeoff Between Bias and Variance?

The bias-variance tradeoff, part 1 of 3

Should you read this article? If you understand all the words in the next section, then no. If you don’t care to understand them, then also no. If you want the bolded bits explained, then yes.

The bias-variance tradeoff

“The bias-variance tradeoff” is a popular phrase you’ll hear in the context of ML/AI. If you’re a statistician, you might think it’s about summarizing this formula:

MSE = Bias² + Variance

It isn’t.

Well, it’s loosely related, but the phrase actually refers to a practical recipe for how to pick a model’s complexity sweet spot. It’s most useful when you’re tuning a regularization hyperparameter.

Illustration by the author.

Note: If you’ve never heard of the MSE, you might need a bit of help with some of the jargon. When you hit a new term you want explained in more detail, you can follow the links to my other articles where I introduce the words I’m using.

Understanding the basics

The mean squared error (MSE) is the most popular (and vanilla) choice for a model’s loss function and it tends to be the first one you’re taught. You’ll likely take a whole bunch of stats classes before it occurs to anyone to tell you that you’re welcome to minimize other loss functions if you like. (But let’s be real: parabolae are super easy to optimize. Remember d/dx ? 2x. That convenience is enough to keep most of you loyal to the MSE.)

Once you learn about the MSE, it’s usually mere moments until someone mentions the bias and variance formula:

MSE = Bias² + Variance

I did it too and, like a garden variety data science jerk, left the proof as homework for the interested reader.

Let’s make amends — if you’d like me to derive it for you while making snide comments in the margins, take a small detour to here. If you choose to skip the mathy stuff, then you’ll have to put up with my jazz hands and just take my word for it.

Positive vibes only

Want me to tell the key thing to you bluntly? Notice that the formula consists of two terms that can’t be negative.

The quantity (MSE) you’re trying to optimize when you fit your predictive ML/AI models can be decomposed into always-positive terms that involve bias only and variance only.

MSE = Bias² + Variance = (Bias)² + (Standard Deviation)²

Even more bluntly? Okay, sure.

A better model has a lower MSE. E stands for error and fewer errors are better, so the best model has a zero MSE: it makes no mistakes. That means it also has no bias and no variance.

Photo by Laura Crowe on Unsplash

Instead of perfect models, let’s look at going from good to better. If you’re truly able to improve your model (in terms of MSE), there’s no need for a tradeoff between bias and variance. If you became a better archer, you became a better archer. No tradeoff. (You probably needed more practice — data! — to get there.)

As Tolstoy would say, all perfect models are alike, but each unhappy model can be unhappy in its own way.

All perfect models are alike

As Tolstoy would say, all perfect models are alike, but each unhappy model (for a given MSE) can be unhappy in its own way. You can get two equally rubbish yet different models with the same MSE: one model can have really good (low) bias but high variance while the other can have really good (low) variance but high bias, and yet both can have the same MSE (overall score).

If we measure the performance of an archer by MSE, we’re saying that decreasing an archer’s standard deviation is worth the same as an equivalent decrease in bias. We’re saying we’re indifferent between the two. (Wait, what if you’re not indifferent between them? Then the MSE might not be your best choice here. Don’t like the MSE’s way of scoring performance? Not a problem. Make your own loss function.)

Now that we’ve set the table, head over to Part 2 where we dig into the heart of the matter: Is there an actual tradeoff? (Yes! But not where you might think.) And what does overfitting have to do with it? (Hint: everything!)

Thanks for reading! How about a YouTube course?

If you had fun here and you’re looking for an entire applied AI course designed to be fun for beginners and experts alike, here’s the one I made for your amusement:

Looking for hands-on ML/AI tutorials?

Here are some of my favorite 10 minute walkthroughs:

Data Science
Machine Learning
Statistics
Getting Started
Bias Variance Tradeoff
Recommended from ReadMedium