How to Pick Your Fantasy Football Team Using Data

Check out the website I built to display all the statistics:
Listen to the data, it’s usually right.
I crunched the numbers on Fantasy Premier League (FPL) this season so you don’t have to.
If you haven’t read my last post on this topic, then please go ahead. It shows how I used similar methods to get into the top 1% on the return to FPL after lockdown. If you’re just looking for a cheat sheet, then look here.
I will use very simple maths and a little bit of coding (Python) to help find the ultimate FPL starting team for this season.
The theory
The theory is the same as last time: Maximum ROI = Maximum points

Therefore, the approach is to create a team with the maximum ROI possible. Simple, right?
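As a rough sketch of what "ROI" means in practice, the snippet below assumes it is simply last season's total points divided by the player's price in £m. That definition, the `Player` class and the numbers are illustrative assumptions, not real FPL data.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    price: float       # price in millions
    total_points: int  # points scored last season


def roi(player: Player) -> float:
    """Points per million spent (assumed definition of ROI)."""
    return player.total_points / player.price


# Made-up numbers purely for illustration.
players = [
    Player("Player A", 10.0, 200),
    Player("Player B", 5.0, 140),
    Player("Player C", 4.0, 100),
]

# Rank the pool from highest to lowest ROI.
ranked = sorted(players, key=roi, reverse=True)
for p in ranked:
    print(f"{p.name}: {roi(p):.1f} points per million")
```

Note how the cheapest players can outrank the highest scorer once price is taken into account.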
Tactics
There are obviously a few more things to consider than just finding the team with the best ROI, but this is the main objective. Even with this approach, there are opportunities to improve. If you look at the best players from the previous season, they also tend to have the best ROIs. However, this does not mean that these players are the best players to have every week. There are some weeks where these players have tough fixtures and will underperform.
If no changes were allowed, these players would be optimal in the long run, but that doesn't mean we have to stick with them when they are not expected to have a good week. On average they will be the best players, but that does not mean they are the best in every individual week.
Thankfully we have 15 players when we include our bench that we can rotate.
Therefore, my approach is to keep the 15 highest-ROI players (within the constraints of the game) in the squad at all times and rotate who starts depending on fixtures, to take advantage of the peaks and troughs in the players' performance due to fixtures and injuries.
Realistically, this will look more like trimming the pool down to, say, 25 players who are options; the rest won't even be looked at. By comparing their ROI, trends in ROI, and upcoming fixtures, I will be rotating them in and out of the squad as well as in and out of the starting 11.
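To make the rotation idea concrete, here is a minimal sketch of picking a starting 11 from a 15-player squad. It assumes each player has an ROI and a hypothetical `fixture_factor` (above 1.0 for a kind fixture, below 1.0 for a tough one); that factor, the scoring rule and all the numbers are made up for illustration. The formation minimums (one goalkeeper, at least three defenders, two midfielders and one forward) are the standard FPL rules.

```python
from collections import defaultdict

# Minimum starters per position under FPL formation rules.
MIN_STARTERS = {"GK": 1, "DEF": 3, "MID": 2, "FWD": 1}
OUTFIELD = ("DEF", "MID", "FWD")


def projected(player):
    # Hypothetical score: season ROI scaled by a made-up fixture factor.
    return player["roi"] * player["fixture_factor"]


def pick_starting_xi(squad):
    """Greedily pick 11 starters from a 15-player squad."""
    by_pos = defaultdict(list)
    for player in sorted(squad, key=projected, reverse=True):
        by_pos[player["pos"]].append(player)

    # First satisfy the minimum number of starters in each position.
    starters = []
    for pos, minimum in MIN_STARTERS.items():
        starters.extend(by_pos[pos][:minimum])

    # Then fill the remaining slots with the best outfield players left.
    # The second goalkeeper is never considered, so only one keeper starts.
    leftovers = [p for pos in OUTFIELD for p in by_pos[pos][MIN_STARTERS[pos]:]]
    leftovers.sort(key=projected, reverse=True)
    starters.extend(leftovers[: 11 - len(starters)])
    return starters


# (name, position, ROI, fixture factor) -- all values are invented.
rows = [
    ("GK1", "GK", 30.0, 1.0), ("GK2", "GK", 28.0, 1.2),
    ("DEF1", "DEF", 28.0, 0.8), ("DEF2", "DEF", 26.0, 1.1),
    ("DEF3", "DEF", 25.0, 1.0), ("DEF4", "DEF", 24.0, 1.3),
    ("DEF5", "DEF", 22.0, 0.9),
    ("MID1", "MID", 24.0, 1.0), ("MID2", "MID", 23.0, 0.7),
    ("MID3", "MID", 21.0, 1.2), ("MID4", "MID", 20.0, 1.0),
    ("MID5", "MID", 19.0, 1.1),
    ("FWD1", "FWD", 22.0, 1.0), ("FWD2", "FWD", 21.0, 0.9),
    ("FWD3", "FWD", 18.0, 1.3),
]
squad = [dict(zip(("name", "pos", "roi", "fixture_factor"), row)) for row in rows]

xi = pick_starting_xi(squad)
print(sorted(p["name"] for p in xi))
```

In this made-up squad, GK2 starts ahead of the higher-ROI GK1 purely because of the kinder fixture, which is exactly the kind of swap the rotation is meant to catch.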
Therefore, we need to look at the fixtures...
Upcoming fixtures
To help guide me with upcoming fixtures and to avoid any personal bias, I scraped some expected goals and expected clean sheet data from the web. If you don’t know what expected goals are, they are really useful, and a fantastic tutorial on how to calculate them in Python is here. I plan on making an EG model soon but I don’t have the time or the data currently.
The website I used does not state the exact equation they use to calculate EG and ECS, but I’ve personally noticed it to be reasonably accurate in the past. Understat may be a better option for this in the future but they have not updated their fixtures for next season yet.
I then transformed the data into two easy-to-read plots: one for expected goals (EG) and one for expected clean sheets (ECS), both with 1-week, 3-week and 6-week cumulative totals. The first six weeks can be seen below, a gift from me to you. These will be performance-adjusted and will change each week, so I plan on referring to them each week before making transfers or substitutions.


As can be seen, on the whole, usual service is resumed with the big six congregating near the left on both plots. There are, however, a few rather large surprises. Firstly, Chelsea sit 13th in terms of expected clean sheets and Wolves sit first. Wolves do not just sit first, they are miles ahead of everyone else in the first week, the 3-week total and the 6-week total. They seem like a solid bet defensively for the first six weeks at least, especially at the price of their defenders.
It can also be seen that, despite Chelsea’s woeful expected clean sheets score, they have a promising expected goals column. They are second behind only Liverpool. However, they are closely followed by Manchester United and Manchester City, who are likely to overtake them after the first week (in which they do not play).
I’ve written this script such that it will download the up-to-date data and plot the same graphs as soon as the data is refreshed on the source site. Just reach out if you would like access to it.
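The 1-, 3- and 6-week cumulative totals behind those plots could be computed along these lines. This is only a sketch: the team names and per-gameweek expected goals are invented, and the scraping and plotting steps are left out.

```python
from itertools import accumulate

# Hypothetical expected goals per gameweek (gameweek 1 first).
expected_goals = {
    "Team A": [1.8, 2.1, 1.5, 2.4, 1.9, 2.0],
    "Team B": [0.0, 2.5, 2.2, 1.7, 2.6, 2.1],  # 0.0 = no fixture in gameweek 1
}
HORIZONS = (1, 3, 6)


def cumulative_totals(per_week, horizons=HORIZONS):
    """Return the running total after each horizon (in gameweeks)."""
    running = list(accumulate(per_week))
    return {h: round(running[h - 1], 2) for h in horizons}


totals = {team: cumulative_totals(eg) for team, eg in expected_goals.items()}

# Order teams by their six-week total, best first, ready for plotting.
ranking = sorted(totals, key=lambda team: totals[team][6], reverse=True)
for team in ranking:
    print(team, totals[team])
```

A team with a blank first gameweek starts at zero but can still catch up over the longer horizons, which is why it helps to look at all three totals together.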
Goalkeepers
The main bulk of my exploratory analysis was done by one simple plotting function that I wrote. I could specify within the function a few characteristics about a player I was looking for and it would feed me the best players.
For example, I could specify minimum minutes played last season, the position, target ROI and a price ceiling amongst other things. Again, it’s easily written but the code is available on request. Therefore, when looking at keepers it would feed out something exactly like this.

I would first run it with no target ROI to see the whole field and then set a cut-off ROI such that I would be left with the top 20% or so. As can be seen, ROI is along the x-axis and price along the y-axis. The size of the circle represents total points and the colour represents whether the player is over or under the specified price ceiling (here it is set to 5).
Therefore, straightforwardly we want to have players who are as far right as possible. However, as we have seen ROI tends to be correlated with price. Therefore, just picking all the players as far right as possible will most likely exceed the budget swiftly. Consequently, we also want to pick some players with lower prices who also still have high ROI. Basically, we want to get as many players from the right-hand side as possible without breaking the budget and players who are really low down and also far right can help us do that.
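A filter of the kind described above might look something like the sketch below. It is not the original function: the name `shortlist`, its parameters and the sample players are assumptions, ROI is assumed to be points divided by price, and the plotting is omitted. As in the plot, the price ceiling only flags players rather than removing them.

```python
def shortlist(players, position, min_minutes=0, target_roi=0.0, price_ceiling=None):
    """Return players in a position above the ROI cut-off, best ROI first."""
    rows = []
    for p in players:
        if p["position"] != position or p["minutes"] < min_minutes:
            continue
        roi = p["points"] / p["price"]  # assumed definition of ROI
        if roi < target_roi:
            continue
        # The ceiling drives the colour in the plot, so flag it, don't filter.
        under = price_ceiling is None or p["price"] <= price_ceiling
        rows.append({**p, "roi": roi, "under_ceiling": under})
    return sorted(rows, key=lambda r: r["roi"], reverse=True)


# Invented players purely for illustration.
players = [
    {"name": "Keeper A", "position": "GK", "price": 5.5, "points": 170, "minutes": 3420},
    {"name": "Keeper B", "position": "GK", "price": 4.5, "points": 135, "minutes": 3420},
    {"name": "Keeper C", "position": "GK", "price": 6.0, "points": 150, "minutes": 3000},
    {"name": "Keeper D", "position": "GK", "price": 4.0, "points": 20, "minutes": 270},
    {"name": "Defender A", "position": "DEF", "price": 5.0, "points": 150, "minutes": 3000},
]

keepers = shortlist(players, "GK", min_minutes=1000, target_roi=26, price_ceiling=5.0)
for k in keepers:
    print(k["name"], round(k["roi"], 1), "under ceiling" if k["under_ceiling"] else "over ceiling")
```

Running it once with `target_roi=0.0` shows the whole field; raising the cut-off then trims it to the top slice.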
We have to pick two goalkeepers for the squad and, as can be seen from the above logic, the choice is pretty simple. Nick Pope was by far the best goalkeeper of last season. He has the most points and the highest ROI, so he is an easy choice for starting goalkeeper. The backup goalkeeper is an easy choice too. Matt Ryan has the second-highest ROI, helped of course by his low price. Therefore, he is a great choice as backup and can be substituted in when Pope plays against the high-scoring teams.
Captains and Vice Captains
If you read my last post, you’ll remember that one of the drawbacks of the ROI model is that it ignores the captaincy: the captain receives double points, and the vice-captain takes over if the captain does not play. Therefore, it is important to have some big hitters in the team, the players who are going to score a lot of points. I wanted to add these players but I also didn't want to sacrifice my model based on high ROI, so I investigated whether there was any crossover between the top scorers and the top players in terms of ROI.
With a simple list comprehension, I found there were four players who fitted the criteria of being in the top ten points scorers of last season as well as the top ten for ROI. Those players were Kevin De Bruyne, Anthony Martial, Danny Ings and Raul Jimenez. Their total points also descended in that same order. KDB, in terms of total points and ROI, was significantly ahead of the other three. He was so far ahead in terms of ROI that it would almost be insane not to pick him. I would suggest choosing him and at least one other as your rotating captain choices for the first few weeks.
The only issue is that he doesn’t play the first week and neither does Anthony Martial.
I’ll come back to this later.
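For anyone curious, the crossover check can be written as a list comprehension along these lines. This is a sketch with invented players and a top two instead of a top ten to keep it short; ROI is again assumed to be points divided by price.

```python
def top_n_names(players, key, n):
    """Names of the n players with the highest value of key."""
    return {p["name"] for p in sorted(players, key=key, reverse=True)[:n]}


def big_hitters(players, n=10):
    """Players in both the top n for points and the top n for ROI."""
    top_points = top_n_names(players, lambda p: p["points"], n)
    top_roi = top_n_names(players, lambda p: p["points"] / p["price"], n)
    by_points = sorted(players, key=lambda p: p["points"], reverse=True)
    # The list comprehension: keep only names that appear in both sets.
    return [p["name"] for p in by_points
            if p["name"] in top_points and p["name"] in top_roi]


# Made-up numbers purely for illustration.
players = [
    {"name": "Player A", "price": 10.0, "points": 250},
    {"name": "Player B", "price": 6.0, "points": 200},
    {"name": "Player C", "price": 4.5, "points": 140},
    {"name": "Player D", "price": 9.0, "points": 180},
]

print(big_hitters(players, n=2))
```

Here only Player B makes both lists: Player A scores the most but is too expensive, and Player C is great value but does not score enough to be a captain option.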
Defenders
Now that we have the goalkeepers and big hitters out of the way, it’s time to look at the real backbone of the team: the defence. Overall, defenders have significantly higher ROI than both midfielders and attackers. I’ll jump straight into it and show you the output graph.

Again, I feel that anyone on this graph is a valid choice. Trent Alexander-Arnold is actually the best player in the game in terms of ROI but is obviously expensive at 7.5m. Matt Doherty is also a great choice but has moved teams so you don’t know how he will perform.
Overall, the tactic is the same. Choose players from this graph who are as far right as possible within budget. The fixtures must not be forgotten either. Conor Coady may not on paper be as good as Basham, Dunk or Egan, but with that Wolves expected clean sheet bar being so high he is probably a good choice.
Midfielders
Again, the midfielders follow the exact same procedure. But be careful with this one, there is an outlier…

John ‘the lord’ Lundstram looks like a clear winner, however, he probably isn’t.
He was positioned as a defender last season and therefore picked up loads of clean sheet points which he will not receive this year. We can also see that KDB is the best of the rest in terms of ROI and total points. There are also some great high-ROI, cheap options such as Westwood, Moutinho and Rice. Again, be careful of players who are not playing in the first week (I’ll separate these players later and make it easy for you).
Forwards
And finally, the forwards. The forward options are slightly lacking this year. Some of last year's favourites have been moved to midfield and there weren’t that many good forwards before.
But anyway, here’s what you’re after…

As can be seen, three of the players we discussed earlier are leading the pack: Ings, Jimenez and Martial. So these are the obvious choices; they are robust captain choices and have the highest ROI. Jordan Ayew is also a solid choice at that price. If you can afford it, he would be a great complement to two of the three previously mentioned.
Hidden Gems
The ROI model does have one main caveat: it doesn’t pick up players who have not played for a significant part of the season. This could be either through injury or joining in the January transfer window. Therefore, I adapted the same plotting function to plot ROM (return on minutes played) instead.

As can be seen, a lot of this graph is filled with players who have played under five games and have barely gathered any points. However, there are two players who have gathered significant points: Bruno Fernandes and Riyad Mahrez. Bruno, of course, joined in January and Mahrez is usually the first victim of Pep’s wildly spontaneous rotation.
Bruno has played almost every minute since joining in January and is likely to continue doing so. Bruno accumulated 117 points in 14 games. Scaling this up to a 38-game season, he would have hit around 317 points and an ROI that would have blown everyone else out of the water. This is obviously very idealistic, but even if he played 30 games he would likely be on a par with KDB at his current rate. If Mahrez were also to receive more game time, he could be a fantastic choice.
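The extrapolation is just points per game scaled up to a longer season. A quick sketch of the arithmetic, where the function name `project_season` is mine and the only inputs are the 117 points and 14 games quoted above:

```python
def project_season(points, games_played, season_games=38):
    """Scale a points-per-game rate up to a given number of games."""
    return points / games_played * season_games


full_season = project_season(117, 14)         # all 38 games
thirty_games = project_season(117, 14, 30)    # a more cautious 30 games

print(f"38 games: {full_season:.1f} points")
print(f"30 games: {thirty_games:.1f} points")
```

This assumes he keeps scoring at exactly the same rate, which is the idealistic part.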
Caveats
There are obviously new teams and new players in the Premier League this year, which makes it difficult as there is no past data on them. My thinking on this was to leave them out to begin with and attempt to pick them up quickly if certain teams or players are seen to be performing well.
I have a similar approach to new players coming into the big teams such as Werner and Havertz. They may not start at the beginning of the season and simply may not perform. I feel that it’s better to let the dust settle before bringing them in. It will be clear soon enough if they are going to be worth it.
I plan on adding the new data each week to the old data and calculating fresh ROI each week while slowly sifting out past data with a weighted average function that diminishes the importance of previous gameweeks as time goes on. I feel that this is the best way to balance form and underlying ability.
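One way such a weighted average could work is exponential decay, where each older gameweek counts for a fixed fraction of the one after it. This is a sketch of that idea rather than the exact function I will end up using; the `decay` value and the sample ROIs are arbitrary assumptions.

```python
def weighted_roi(gameweek_rois, decay=0.9):
    """Weighted average of per-gameweek ROI, ordered oldest to newest.

    The newest gameweek has weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    """
    if not gameweek_rois:
        raise ValueError("need at least one gameweek of data")
    n = len(gameweek_rois)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * r for w, r in zip(weights, gameweek_rois))
    return weighted_sum / sum(weights)


# Invented example: two quiet gameweeks followed by a big haul.
recent_form = [2.0, 2.0, 8.0]
print(round(weighted_roi(recent_form, decay=0.5), 2))  # leans towards the 8.0
print(round(sum(recent_form) / len(recent_form), 2))   # plain average for comparison
```

A decay close to 1 behaves like a plain average and favours underlying ability, while a smaller decay reacts faster to form.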
Summary
Therefore, my suggestions are as follows:
(Bold and italic players are not playing the first week):
Goalkeepers: Pope and Ryan are a great combination. This is the best combination, all day long. Just pick them. There are no other options.
Defenders: Trent AA, Doherty, Virgil VD, Egan, Tarkowski, Stevens, Baldock, Robertson, Dunk, Basham, Coady, Evans, and Saiss.
Midfielders: KDB, Westwood, Grealish, Jordan Henderson, Rice, Noble, Moutinho, McNeil, Bruno Fernandes, and Jorginho
Forwards: Ings, Jimenez, Martial, Ayew, Vardy, and Wood
These numbers should be used as a helping hand and not as a definitive rule. Despite him being the best player in the game, I am considering not even picking Trent AA until around gameweek 3, as Liverpool’s expected clean sheets are actually lower than those of teams like Southampton. The best way to go about it is to look at these shortlists of players and make choices by comparing these players to their teams' expected goals and clean sheets.
This tactic got me into the top 1% post lockdown and more points than anyone in all my leagues, so keep the faith and good luck!
If you enjoyed this then please check out my other two articles on the subject.
- Fantasy Football — Part 1 — How to Win With Data
- Fantasy Football — Part 3 — the Ultimate Cheat Sheet.
Cheers,
James






