Summary

This article suggests using machine learning to find weaknesses in the bookie’s odds, rather than trying to beat the odds by predicting game outcomes directly.

Abstract

The article titled "How to beat the odds in betting using machine learning" discusses a different approach to using machine learning for predicting sports outcomes. Instead of trying to predict the winner, the author suggests focusing on identifying situations where the bookie's odds are weak. By analyzing the cross-entropy loss, one can find where the odds are least accurate and train a model to improve predictions in those situations. The author emphasizes that this approach does not require more data or computing power but rather leverages the fact that the bookie's predictions are available for inspection. The article concludes by advising the reader to use this to their advantage and wishing them good luck.

Bullet points

The traditional approach to using machine learning for betting involves predicting the winner, but this is unlikely to beat the bookie's odds.
Instead, the author suggests identifying situations where the bookie's odds are weak.
Analyzing the cross-entropy loss can help find where the odds are least accurate.
This approach does not require more data or computing power but rather leverages the fact that the bookie's predictions are available for inspection.
The author advises using this to the reader's advantage and concludes by wishing them good luck.

How to beat the odds in betting using machine learning

I’ve read many articles in which people describe their attempts to build a machine learning model that predicts the winning team, the score, or something similar. Usually, the process is as follows:

Basic information is scraped or downloaded from an existing dataset
Features are extracted
A model is trained

The problem

The betting sites have more computing power than you, they have more data, and they have better data. Do you expect your model using basic features to win? It’s unlikely.

Alright… then should we give up? Not necessarily… I believe there is a better way of approaching the problem:

Don’t just try to predict who will win or score; predict when the odds are good or bad.
Don’t use all the data. You won’t beat the odds across every game, and you don’t have to. All you have to do is find one situation where you are better than the bookie.

An example

Imagine you are training a machine learning model to classify animal images for a competition. You discover that your model struggles to tell bats from rats, so its predictions for these two classes are more uncertain. You deploy your model as an API and wait for the final evaluation.

Now imagine your challenger wants to cheat. She finds your model’s API, sends it a batch of images of different animals, and analyzes where your predictions are wrong. She finds the bats-and-rats weakness. She then trains her own model only on bats vs. rats and, finally, restricts the evaluation data to bats and rats.

She will likely win. Why? Because she only has to train a model on a small subset of animals, while you have to train a model for all animals, which is a much harder task.

Betting follows the same scenario: the model owner is the bookie, and the challenger is the bettor trying to beat the odds. The odds are predictions, just expressed in odds format. You can therefore analyze how well bookies predict the games (animals) using their odds (predictions) and, if you find a weakness, train a model that only bets on that subset of games (bats and rats). Additionally, with a narrower subset you can look for features that only apply to that subset, obtaining better data and evening the playing field.
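To treat odds as predictions, they first have to be turned into probabilities. Below is a minimal sketch, assuming decimal odds for every outcome of a single game (the odds themselves are made up). The raw implied probability of each outcome is 1/odds. Because the bookie builds a margin into the odds, these sum to more than 1. Dividing by the sum is the simplest way to remove that margin; this proportional normalization is an assumption here, and other methods exist.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for all outcomes of one game into probabilities.

    Raw implied probabilities are 1 / odds. The bookie's margin (overround)
    makes them sum to more than 1, so we divide by that sum to get a
    distribution that sums to 1 (proportional normalization, an assumption).
    """
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)
    return [p / overround for p in raw]


# Hypothetical 1X2 odds (home win, draw, away win) for a single game.
odds = [2.10, 3.40, 3.60]
probs = implied_probabilities(odds)
print([round(p, 3) for p in probs])
```

These normalized probabilities are what the bookie is effectively predicting, and they can be scored like any other probabilistic model.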

How do you find weaknesses?

You could look at accuracy, but that doesn’t make sense, since the odds are probabilities rather than hard predictions. Instead, look at the cross-entropy loss. Find where the loss is large, then see whether you can develop a model that improves on the predictions (odds) in those situations. If you find data or features specific to a certain situation, it is not unimaginable that your model could actually beat the odds. I would expect that the more often a situation occurs, the more likely the bookie’s model is to predict it well.
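Finding weaknesses with cross-entropy can be sketched in a few lines. The loss for a single game is the negative log of the probability the bookie assigned to the outcome that actually happened. Averaging it within each group of games shows where the odds are least accurate. This is a sketch under assumptions: the game records, their field names (`situation`, `probs`, `outcome`) and the grouping by league are hypothetical, and `probs` is assumed to come from normalized odds. The group size is reported too, because an average over a handful of games is mostly noise.

```python
import math
from collections import defaultdict


def cross_entropy(probs, outcome_index):
    """Log loss of one game: -log of the probability given to what actually happened."""
    eps = 1e-15  # guard against log(0) if the odds imply a zero probability
    return -math.log(max(probs[outcome_index], eps))


def loss_by_situation(games):
    """Average the bookie's cross-entropy loss within each situation.

    Each game is a dict with hypothetical keys:
      'situation' - any grouping you want to test (league, weather, ...)
      'probs'     - implied probabilities derived from the odds
      'outcome'   - index of the outcome that actually occurred
    Returns (situation, mean_loss, n_games) tuples, worst (highest loss) first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g in games:
        totals[g["situation"]] += cross_entropy(g["probs"], g["outcome"])
        counts[g["situation"]] += 1
    return sorted(
        ((s, totals[s] / counts[s], counts[s]) for s in totals),
        key=lambda item: item[1],
        reverse=True,
    )


# Toy, made-up records: outcome 0 = home win, 1 = draw, 2 = away win.
games = [
    {"situation": "league_a", "probs": [0.60, 0.25, 0.15], "outcome": 0},
    {"situation": "league_a", "probs": [0.50, 0.30, 0.20], "outcome": 0},
    {"situation": "league_b", "probs": [0.60, 0.25, 0.15], "outcome": 2},
    {"situation": "league_b", "probs": [0.40, 0.30, 0.30], "outcome": 1},
]
for situation, mean_loss, n in loss_by_situation(games):
    print(f"{situation}: mean loss {mean_loss:.3f} over {n} games")
```

In this toy data `league_b` would be the place to look for niche features. Note that the loss also reflects how inherently unpredictable a situation is, so a high loss alone does not prove the odds are beatable; it only tells you where to look.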

Another way of looking at it

Another way of seeing the problem is from a search perspective. Instead of building a model to predict the odds or the results, look for scenarios where it is profitable to bet a certain way, then check whether that holds up on historical data. For example, bet on the home team to score in the final minutes when the away team is winning but has received a red card, and the home team very often scores late in games. Perhaps the bookie consistently offers odds that are too high here. This is of course a very rare event, but you get the idea. Be careful, though: just because something worked for a number of games doesn’t mean it will keep working. It is easy to overfit.
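The search perspective amounts to backtesting a betting rule. Below is a minimal sketch, assuming a hypothetical `Game` record with made-up fields for the red-card scenario above and decimal odds on a late home goal. The rule fires only when all conditions hold, and each one-unit bet either returns `odds - 1` or loses the stake. To guard against overfitting, the rule is designed on earlier games and checked on later ones that were not used to design it. The 0.5 threshold and the toy data are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Game:
    # Hypothetical fields for illustration; real data would have many more.
    away_leading: bool
    away_red_card: bool
    home_late_goal_rate: float  # share of past games where the home team scored late
    odds_home_late_goal: float  # decimal odds offered on a late home goal
    home_scored_late: bool      # what actually happened


def rule(g: Game) -> bool:
    """Hypothetical scenario from the text: bet only when every condition holds."""
    return g.away_leading and g.away_red_card and g.home_late_goal_rate > 0.5


def backtest(games: List[Game], should_bet: Callable[[Game], bool],
             stake: float = 1.0) -> Tuple[int, float]:
    """Return (number of bets, profit) from betting `stake` whenever the rule fires."""
    bets, profit = 0, 0.0
    for g in games:
        if not should_bet(g):
            continue
        bets += 1
        if g.home_scored_late:
            profit += stake * (g.odds_home_late_goal - 1)  # decimal odds include the stake
        else:
            profit -= stake
    return bets, profit


# Toy history, sorted by date (made-up numbers).
history = [
    Game(True, True, 0.60, 4.0, True),    # rule fires, bet wins
    Game(True, True, 0.70, 3.5, False),   # rule fires, bet loses
    Game(False, True, 0.80, 3.0, True),   # rule does not fire: away team not leading
    Game(True, True, 0.55, 4.5, False),   # rule fires, bet loses
    Game(True, True, 0.65, 4.0, True),    # rule fires, bet wins
]
split = 3  # chronological split: design on the past, evaluate on the future
print("design period:", backtest(history[:split], rule))
print("held-out period:", backtest(history[split:], rule))
```

With real data, the held-out profit, not the in-sample profit, is the number to trust, and even then a few dozen bets can show a profit by pure chance.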

Conclusion

My advice is to use what is to your advantage. It is not more data, and it is not more computing power. It is the fact that your challenger’s predictions are available for you to inspect for weaknesses, and that you can choose the subset of games on which the competition is evaluated. Good luck!

Written by Jacob Ferus