Matt Przybyla

Opinion

Stop Using MAE and Use This Data Science Loss Function Instead

A look into quantile loss and when to use it — business use cases

Photo by Joseph Yip on Unsplash [1].

Table of Contents

  1. Introduction
  2. Regression and Loss Functions
  3. When To Use Quantile and When
  4. Summary
  5. References

Introduction

As a data scientist who has learned a lot online, I have noticed how rarely loss functions other than MAE and RMSE get discussed. For this reason, I am going to give a quick summary of when to use a different loss function: the mighty quantile loss function, along with its variations. This discussion will be useful for data scientists who have not heard much about this function, as well as for those who want to learn more about when to use it. With that being said, let's look at the what, the why, and the when of quantile loss, with a focus on business use cases.

Regression and Loss Functions

To get started, let’s get our bearings before diving into business use cases. The quantile loss function applies to regression problems, which are the focus of this article. Regression is a type of algorithm that predicts a continuous variable; for example, we might want to predict a value in the range of 0 to 100.

Here are examples of other loss functions that are often applied to regression algorithms:

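  • MAE (Mean Absolute Error) optimizes for the median and has no directional preference: over- and under-predictions of the same size are penalized equally, hence the ‘absolute’ part
  • RMSE (Root Mean Square Error) optimizes for the mean and squares each error before averaging, so it penalizes larger errors (such as those caused by outliers) much more heavily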

So, MAE is a reasonable choice when every unit of error costs about the same and you do not want a few outliers to dominate the loss, while RMSE is a better fit when large errors are especially painful for your use case.
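To make the difference concrete, here is a small plain-Python sketch that scores the same predictions with both metrics. The helper names `mae` and `rmse` and all values are hypothetical, chosen only for illustration. A single large miss moves RMSE far more than MAE because RMSE squares each error.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: the square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical actuals on a 0-100 scale; the last one is an outlier.
y_true = [50, 52, 48, 51, 90]
y_pred = [50, 50, 50, 50, 50]  # a constant prediction, for illustration only

print(f"MAE  = {mae(y_true, y_pred):.2f}")   # MAE  = 9.00
print(f"RMSE = {rmse(y_true, y_pred):.2f}")  # RMSE = 17.94
```

Without the outlier (errors of 0, 2, 2, and 1), MAE would be 1.25 and RMSE 1.5; the single 40-point miss is what pulls RMSE so far above MAE.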

Now that we know what typical loss functions look like, we can look at quantile loss.

When To Use Quantile and When

Photo by Maxim Hopman on Unsplash [2].

The term quantile is another way of saying percentile, expressed as a fraction. If the quantile value (often called alpha) is 0.80, for example, under-predictions will be penalized by a factor of 0.80, while over-predictions will be penalized by a factor of 0.20. Therefore, over-predictions are penalized less than under-predictions, and the model is pushed toward the 80th percentile of the target. In this case, we would expect to over-predict roughly 80% of the time.

This can be especially useful if your actual values land in the upper part of their range more often than in the lower part.
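The asymmetric weighting described above is usually written as the quantile (or "pinball") loss. The sketch below is a minimal plain-Python version of it, not any particular library's implementation; the name `quantile_loss` and the example values are hypothetical.

```python
def quantile_loss(y_true, y_pred, alpha):
    """Average quantile (pinball) loss for a quantile level alpha in (0, 1).

    Under-predictions (actual > predicted) are weighted by alpha,
    over-predictions (actual < predicted) by (1 - alpha).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = t - p
        if error >= 0:
            total += alpha * error  # under-prediction (or exact hit)
        else:
            total += (1 - alpha) * -error  # over-prediction
    return total / len(y_true)


# With alpha = 0.80, missing low by 10 costs four times as much as missing high by 10.
print(round(quantile_loss([60], [50], alpha=0.80), 2))  # under-predict by 10 -> 8.0
print(round(quantile_loss([60], [70], alpha=0.80), 2))  # over-predict by 10 -> 2.0
```

Because the penalty is asymmetric, a model trained with alpha = 0.80 is pulled toward the 80th percentile of the target rather than toward its median.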

Now let’s dive into the fun part: when this function is actually useful, whether for your business or for an academic use case.

Let’s say we have the same example as above, with actual observations in the range of 0 to 100. The midpoint of that range is 50, but if more actuals fall between 60 and 80 than between 20 and 40, then we should use a higher quantile (alpha) value. You can test different alphas, but you would want to start with anything above 0.50: at exactly 0.50, quantile loss penalizes both directions equally and is simply half of MAE, which defeats the purpose of using it.
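One way to see how alpha shifts predictions, and why 0.50 is just MAE in disguise, is to search for the single constant prediction that minimizes quantile loss for a few alphas. This is a sketch under simplifying assumptions: the actuals are hypothetical values on the 0 to 100 scale with most of them between 60 and 80, and `best_constant` is a hypothetical brute-force helper. A real model would learn a prediction per row, but the same pull toward the alpha-quantile applies.

```python
def quantile_loss(y_true, y_pred, alpha):
    """Average pinball loss: alpha weights under-predictions, (1 - alpha) over-predictions."""
    # For an error e = t - p, max(alpha * e, (alpha - 1) * e) picks the correct weighted branch.
    return sum(
        max(alpha * (t - p), (alpha - 1) * (t - p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def best_constant(y_true, alpha):
    """Integer constant in 0-100 with the lowest quantile loss (brute-force search)."""
    return min(range(101), key=lambda c: quantile_loss(y_true, [c] * len(y_true), alpha))


# Hypothetical actuals on a 0-100 scale, most of them between 60 and 80.
actuals = [20, 35, 60, 62, 65, 68, 70, 72, 75, 80]

# At alpha = 0.50 the quantile loss equals half of MAE, so it ranks predictions identically.
guess = [50] * len(actuals)
print(quantile_loss(actuals, guess, 0.50), mae(actuals, guess) / 2)

# Higher alphas move the best prediction up and make it over-predict more rows.
for alpha in (0.50, 0.60, 0.70):
    c = best_constant(actuals, alpha)
    over = sum(c > t for t in actuals) / len(actuals)
    print(f"alpha={alpha:.2f}  best constant={c}  over-predicts {over:.0%} of rows")
```

In practice you would compare alphas by training a model for each one and checking a held-out metric that reflects the business cost, keeping 0.50 as the baseline.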

To drive the point further, let’s summarize two simple use cases that can represent pretty much any decision you will make with quantile loss:

  • Use case # 1:

Predict the airplane ticket price for a long-haul flight.

Here, we already know we want to penalize under-predictions more heavily, so we will choose a quantile above 0.50, which favors over-predicting; you could start with 0.55, 0.60, and so on. It is still a good idea to test 0.50 as a baseline comparison. Your data is likely skewed to the left, which you should check, and over-predicting is the safer choice because past prices have typically been closer to the maximum of the range than to the minimum. For example, even with an observed minimum of $10, we would not expect a long-haul flight to cost $10 most of the time; we would expect it to be closer to something like $200.
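The skew check mentioned above can be done before any modeling. This is a minimal sketch using the moment-based (Fisher-Pearson) skewness coefficient; the helper `sample_skewness` and the fares are hypothetical. A negative value means a longer left tail, with the mean pulled below the median.

```python
import statistics


def sample_skewness(values):
    """Moment-based (Fisher-Pearson) skewness; negative means a longer left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical long-haul fares in dollars: mostly near the top of the range, one $10 outlier.
fares = [10, 150, 190, 200, 205, 210, 215, 220, 225]

skew = sample_skewness(fares)
print(f"skewness={skew:.2f}  mean={statistics.mean(fares):.1f}  median={statistics.median(fares)}")

# Candidate alphas to train and compare; 0.50 stays in as the MAE-equivalent baseline.
candidate_alphas = [0.50, 0.55, 0.60]
print("alphas to compare:", candidate_alphas)
```

A negative skewness here supports trying alphas above 0.50, but the final choice should still come from how costly an under-predicted fare is for the business.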

  • Use case # 2:

Predict the amount of rain in a dry area in summer.

If we are in a dry region and it is summer, and we want to predict the rainfall for a given day, we might expect our actuals to be quite low relative to the maximum of our range, which does include some thunderstorms. In this case, we might want to use an alpha of 0.45 or lower, because rows with low rainfall are more frequent than rows with heavy rain, so we want to under-predict rain.
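For the rainfall case, a quick sanity check is to look at the empirical quantiles of the historical targets. The constant that minimizes quantile loss is (approximately) the alpha-quantile of the data, so this shows roughly where a low-alpha model will be pulled. The sketch uses Python's standard `statistics.quantiles` on hypothetical daily rainfall values in millimetres.

```python
import statistics

# Hypothetical daily summer rainfall (mm) in a dry area: mostly dry days, two thunderstorms.
rain_mm = [0, 0, 0, 0, 0, 0.5, 1, 2, 15, 40]

# 99 cut points; percentiles[k - 1] is the k-th percentile (default "exclusive" method).
percentiles = statistics.quantiles(rain_mm, n=100)

for alpha in (0.50, 0.45, 0.30):
    k = round(alpha * 100)
    print(f"alpha={alpha:.2f}  empirical quantile={percentiles[k - 1]:.2f} mm")

# The mean is pulled up by the thunderstorms, far above the low quantiles.
print(f"mean={statistics.mean(rain_mm):.2f} mm")
```

A trained model would condition on features such as the date or humidity, but with an alpha of 0.45 or lower its predictions would be pulled toward the dry-day values in roughly the same way.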

Summary

Photo by Edward Howell on Unsplash [3].

As you can see, there isn’t a one-size-fits-all approach to loss functions. It really depends on the following:

* Data
* Distribution of that data
* Business case
* How predictions will affect the business: is it better to over-predict or to under-predict? Sometimes the choice is even more straightforward, and you want one or the other regardless, focusing not on the error itself but on tuning predictions to be smaller or larger for a business reason

I hope you found my article both interesting and useful. Please feel free to comment down below if you agree or disagree with using one loss function over the other. Why or why not? What other loss functions do you think should be discussed more? These can certainly be clarified even further, but I hope I was able to shed some light on data science loss functions and their applications.

I am not affiliated with any of these companies.

Please feel free to check out my profile, Matt Przybyla, and other articles, as well as subscribe to receive email notifications for my blogs by following the link below, or by clicking on the subscribe icon on the top of the screen by the follow icon, and reach out to me on LinkedIn if you have any questions or comments.

Subscribe link: https://datascience2.medium.com/subscribe

Referral link: https://datascience2.medium.com/membership

(I will receive a commission if you sign up for a membership on Medium)

References

[1] Photo by Joseph Yip on Unsplash, (2021)

[2] Photo by Maxim Hopman on Unsplash, (2021)

[3] Photo by Edward Howell on Unsplash, (2020)
