Irfan Alghani Khalid

Summary

The web content provides a comprehensive tutorial on visualizing football data using R, specifically demonstrating how to create shots, passes, and heat maps using StatsBomb's open data and R libraries such as StatsBombR, tidyverse, and ggsoccer.

Abstract

The article titled "How to Visualize Football Data Using R" is a guide aimed at teaching readers how to analyze and visualize football event data. It begins with an introduction to the growing field of football analytics, emphasizing the insights data can provide into the game. The author then outlines the steps to access and preprocess StatsBomb's open data, which includes event data from various leagues such as the UEFA Champions League. The tutorial covers the installation and use of specific R libraries necessary for data retrieval and visualization, including StatsBombR for data acquisition and ggsoccer for creating football pitch visualizations. The implementation section details the process of creating a pass map for Thomas Müller, a shots map for the match between Bayern Munich and Borussia Dortmund, and a pressure heat map for Bayern Munich's defensive actions. The article concludes with final remarks, encouraging readers to apply the techniques learned to their own analyses and to follow the author's Medium profile for more tutorials on football analytics and data science.

Opinions

  • The author expresses that football analytics has grown rapidly and that data allows for a deeper understanding of the game.
  • The use of StatsBomb's open data is praised for its educational value in learning football data analysis.
  • R is presented as a powerful tool for visualizing complex football data, with libraries like ggsoccer extending its capabilities for sports analytics.
  • The author demonstrates a preference for customizing visualizations to enhance aesthetic appeal and readability, as seen in the pass map and shots map examples.
  • There is an implicit endorsement of StatsBombR and ggsoccer libraries for their utility in handling and visualizing football event data.
  • The author encourages readers to explore further into football analytics by suggesting their Medium profile as a resource for additional tutorials and content.

How to Visualize Football Data Using R

Tutorials on creating shots, passes, and heat maps

Photo by Janosch Diggelmann on Unsplash

Introduction

Football analytics has grown rapidly in recent years. With data, we can understand the game from a different perspective.

In this article, I will show you how to visualize football data using R. At the end of this article, you will be able to create visualizations like this:

All images are created by the author.

Without further ado, let’s get started!

Implementation

Data source

We will use StatsBomb’s open data, which I have permission to use as an example here. StatsBomb is a football analytics company that provides event data and analytics services for football clubs.

Since event data differs from regular tabular data, StatsBomb provides free data to help us learn how to analyze football with it.

The open data covers competitions such as the UEFA Champions League, the World Cup, the Indian Super League, and many more.

For this article, we’ll create visualizations based on the 2012/13 Champions League final between Bayern Munich and Borussia Dortmund.

The StatsBomb data is stored as JSON files, so analyzing it directly is challenging. Thankfully, we can retrieve the data easily with a package called StatsBombR.
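To make the difference between nested event data and a flat table concrete, here is a minimal base R sketch. The event below is hypothetical and heavily trimmed, and the flattening is done by hand purely for illustration; it is not how StatsBombR works internally.

```r
# A hypothetical, heavily trimmed event as it might look after parsing JSON.
# Real StatsBomb events carry many more fields.
raw_event <- list(
  type     = list(name = "Pass"),
  player   = list(name = "Player A"),
  location = c(60.5, 40.2),
  pass     = list(end_location = c(75.0, 30.0))
)

# Flatten the nested fields into one row with dot-separated column names,
# the same naming style the rest of this article filters and plots on.
flat_event <- data.frame(
  type.name           = raw_event$type$name,
  player.name         = raw_event$player$name,
  location.x          = raw_event$location[1],
  location.y          = raw_event$location[2],
  pass.end_location.x = raw_event$pass$end_location[1],
  pass.end_location.y = raw_event$pass$end_location[2],
  stringsAsFactors    = FALSE
)

print(flat_event)
```

Each event becomes one row, which is the shape that filter() and ggplot() expect.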

Install and load the libraries

We need several libraries to help us create the visualization. Those libraries are:

  • StatsBombR => Retrieving the StatsBomb data
  • tidyverse => A collection of libraries for preprocessing and visualizing the data
  • ggsoccer => A library to draw the football pitch on our visualization

Let’s install the libraries. Here is the code for doing that:

# Install the necessary libraries
install.packages('devtools')
devtools::install_github("statsbomb/SDMTools")
devtools::install_github("statsbomb/StatsBombR")
install.packages('tidyverse')
install.packages('ggsoccer')

After that, you can load the libraries by running these lines of code:

# Load the libraries
library(tidyverse)
library(StatsBombR)
library(ggsoccer)

Now we’re ready to get our hands dirty.

Preprocess the data

Since StatsBomb provides open data from different leagues, we need to specify the competition ID and the season that we want to analyze.

First, we need to look at all competitions that StatsBomb provides. Here is the code for doing that:

# Retrieve all available competitions
Comp <- FreeCompetitions()

That code returns a data frame listing every competition and season they cover.

As I’ve mentioned before, we’ll analyze the match between Bayern and Dortmund in the Champions League final. The corresponding competition ID and season are 16 and 2012/2013, respectively.

Let’s filter the data by running the code below:

# Filter the competition
ucl_german <- Comp %>%
  filter(competition_id==16 & season_name=="2012/2013")
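As a small sketch of how this kind of filter behaves, the example below uses a toy table in place of the real competitions data frame. Only the ID and season pair from the article is taken from the source; the other rows and the name toy_comp are illustrative assumptions.

```r
library(dplyr)

# Illustrative stand-in for the competitions table; the extra rows are
# placeholders, only the 16 / "2012/2013" pair comes from the article.
toy_comp <- data.frame(
  competition_id   = c(16, 16, 43),
  season_name      = c("2012/2013", "2013/2014", "2018"),
  stringsAsFactors = FALSE
)

# Comma-separated conditions in filter() are combined with AND,
# so this is equivalent to joining them with &.
selected <- toy_comp %>%
  filter(competition_id == 16, season_name == "2012/2013")

# Guard against a typo in the ID or season silently returning nothing.
stopifnot(nrow(selected) == 1)
print(selected)
```

A check like the stopifnot() call is optional, but it catches an empty result before the later retrieval steps run on it.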

Next, we retrieve all matches that correspond to that league and season. Here is the code for doing that:

# Retrieve all available matches
matches <- FreeMatches(ucl_german)

After we get all matches, we can retrieve the event data by running this line of code:

# Retrieve the event data
events_df <- get.matchFree(matches)

And lastly, we clean the data. Here is the code for doing that:

# Preprocess the data
clean_df <- allclean(events_df)
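Before plotting, it can help to confirm that the cleaned data contains the columns the later code relies on. The sketch below is one possible check in base R; missing_cols is a hypothetical helper written for this example, not part of StatsBombR, and toy_df stands in for clean_df.

```r
# Columns the later plots rely on (taken from the code in this article).
required_cols <- c("player.name", "team.name", "type.name",
                   "location.x", "location.y")

# Hypothetical helper: returns the names of any required columns that
# are absent from the data frame.
missing_cols <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

# Toy data frame standing in for clean_df, deliberately lacking location.y.
toy_df <- data.frame(
  player.name      = "Player A",
  team.name        = "Team A",
  type.name        = "Pass",
  location.x       = 60,
  stringsAsFactors = FALSE
)

print(missing_cols(toy_df))
```

On the real data, calling missing_cols(clean_df) should return an empty character vector if preprocessing worked as expected.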

Now we have the data. Let’s create the visualizations!

Pass map

The first one is the pass map. A pass map shows all passes made by a player or a team. In this example, we’ll create a pass map for Thomas Müller.

Let’s filter the data by taking all passes made by Müller. Here is the code for doing that:

# Passing Map
muller_pass <- clean_df %>%
  filter(player.name == 'Thomas Müller') %>%
  filter(type.name == 'Pass')

Now here comes the fun part. To create the viz, we will use the ggplot2 and ggsoccer libraries. Here is the code for creating the basic viz:

ggplot(muller_pass) +
  annotate_pitch(dimensions = pitch_statsbomb) +
  geom_segment(aes(x=location.x, y=location.y, xend=pass.end_location.x, yend=pass.end_location.y),
               colour = "coral",
               arrow = arrow(length = unit(0.15, "cm"),
                             type = "closed")) +
  labs(title="Thomas Muller's Passing Map",
       subtitle="UEFA Champions League Final 12/13",
       caption="Data Source: StatsBomb")

If you’re an R user, you are probably already familiar with ggplot2.

ggsoccer extends ggplot2 so that we can draw event data, which comprises start and end coordinates, on a football pitch.

ggsoccer provides annotate_pitch to draw the football pitch, while ggplot2’s geom_segment draws a line for each pass, as you can see in the code above. The rest is standard ggplot2.

And here is the result:

The image is created by the author.

Well, it doesn’t look very appealing yet. Let’s add the theme function to customize the look of the viz. Here’s the complete code, along with the theme function:

ggplot(muller_pass) +
  annotate_pitch(dimensions = pitch_statsbomb, fill='#021e3f', colour='#DDDDDD') +
  geom_segment(aes(x=location.x, y=location.y, xend=pass.end_location.x, yend=pass.end_location.y),
               colour = "coral",
               arrow = arrow(length = unit(0.15, "cm"),
                             type = "closed")) + 
  labs(title="Thomas Muller's Passing Map",
       subtitle="UEFA Champions League Final 12/13",
       caption="Data Source: StatsBomb") + 
  theme(
    plot.background = element_rect(fill='#021e3f', color='#021e3f'),
    panel.background = element_rect(fill='#021e3f', color='#021e3f'),
    plot.title = element_text(hjust=0.5, vjust=0, size=14),
    plot.subtitle = element_text(hjust=0.5, vjust=0, size=8),
    plot.caption = element_text(hjust=0.5),
    text = element_text(family="Geneva", color='white'),
    panel.grid = element_blank(),
    axis.title = element_blank(),
    axis.text = element_blank()
  )

And here is the result:

The image is created by the author.

Now it looks great!
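To isolate how the arrow layer works, here is a minimal sketch that uses only ggplot2 and a toy data frame. The pass coordinates are made up, and a plain rectangle stands in for the pitch that annotate_pitch would normally draw.

```r
library(ggplot2)

# Toy passes using the same column names as the cleaned StatsBomb data.
toy_passes <- data.frame(
  location.x          = c(30, 55, 80),
  location.y          = c(40, 20, 60),
  pass.end_location.x = c(50, 75, 100),
  pass.end_location.y = c(35, 45, 50)
)

p <- ggplot(toy_passes) +
  # Plain rectangle as a stand-in for the pitch outline (120 x 80 units).
  annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
           fill = NA, colour = "grey40") +
  # One arrow per pass, from its start location to its end location.
  geom_segment(aes(x = location.x, y = location.y,
                   xend = pass.end_location.x, yend = pass.end_location.y),
               colour = "coral",
               arrow = arrow(length = unit(0.15, "cm"), type = "closed")) +
  # Keep one unit on x the same length as one unit on y.
  coord_fixed()

print(p)
```

Each row of the data frame becomes one arrow, so filtering the data before plotting is what decides which passes appear.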

Shots map

In this section, we’ll create the shot map for both teams. We’ll plot each club’s shots at opposite ends of the pitch.

So we’ll create two data frames, one for Dortmund’s shots and one for Bayern’s. Let’s run these lines of code:

dortmund_shot <- clean_df %>%
  filter(type.name == 'Shot') %>%
  filter(team.name == 'Borussia Dortmund') %>%
  select(player.name, location.x, location.y, shot.end_location.x, shot.end_location.y, shot.statsbomb_xg)
bayern_shot <- clean_df %>%
  filter(type.name == 'Shot') %>%
  filter(team.name == 'Bayern Munich') %>%
  select(player.name, location.x, location.y, shot.end_location.x, shot.end_location.y, shot.statsbomb_xg)

Creating the shot map is similar to creating the pass map. We replace the geom_segment function with the geom_point function, which draws scatter plots.

We apply the function to each data frame. For Bayern’s shot data, we mirror the x coordinates by subtracting them from 120, which is what the code below does.
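The mirroring step can be sketched on its own with a few hypothetical shot locations. This assumes the StatsBomb pitch runs from 0 to 120 along x, as the dimensions used in this article suggest.

```r
# Length of the pitch in StatsBomb coordinates (x runs from 0 to 120).
pitch_length <- 120

# Hypothetical shot locations for one team, all near the goal at x = 120.
toy_shots <- data.frame(
  location.x = c(100, 110, 95),
  location.y = c(40, 35, 50)
)

# Mirror the x coordinate so these shots appear at the opposite end.
# The y coordinate is left unchanged, as in the article's code.
toy_shots$mirrored.x <- pitch_length - toy_shots$location.x

print(toy_shots)
```

A shot taken 10 units from one goal line ends up 10 units from the other, so the distance to goal is preserved.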

Have a look at this code:

ggplot() +
  annotate_pitch(dimensions = pitch_statsbomb, colour='white', fill='#021e3f') +
  geom_point(data=dortmund_shot, aes(x=location.x, y=location.y, size=shot.statsbomb_xg), color="red") +
  geom_point(data=bayern_shot, aes(x=120-location.x, y=location.y, size=shot.statsbomb_xg), color="yellow") +
  labs(
    title="Borussia Dortmund vs Bayern Munich",
    subtitle = "Shots Map | UEFA Champions League Final 2012/2013",
    caption="Data Source: StatsBomb"
  ) + 
  theme(
    plot.background = element_rect(fill='#021e3f', color='#021e3f'),
    panel.background = element_rect(fill='#021e3f', color='#021e3f'),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    text = element_text(family="Geneva", color='white'),
    plot.title = element_text(hjust=0.5, vjust=0, size=14),
    plot.subtitle = element_text(hjust=0.5, vjust=0, size=8),
    plot.caption = element_text(hjust=0.5),
    plot.margin = margin(2, 2, 2, 2),
    legend.position = "none"
  )

Let’s see the result from the code:

The image is created by the author.

Pressures heat map

Lastly, we’ll create a heat map of the pressures applied by Bayern Munich. Let’s filter the data by using these lines of code:

# Pressure Heat Map
bayern_pressure <- clean_df %>%
  filter(team.name == 'Bayern Munich') %>%
  filter(type.name == 'Pressure')

Generating the viz is similar to the previous one. Please have a look at this code:

ggplot(bayern_pressure) +
  annotate_pitch(dimensions = pitch_statsbomb, fill='#021e3f', colour='#DDDDDD') +
  geom_density2d_filled(aes(location.x, location.y, fill=after_stat(level)), alpha=0.4, contour_var='ndensity') +
  scale_x_continuous(limits=c(0, 120)) +
  scale_y_continuous(limits=c(0, 80)) +
  labs(title="Bayern Munich's Pressure Heat Map",
       subtitle="UEFA Champions League Final 12/13",
       caption="Data Source: StatsBomb") + 
  theme_minimal() +
  theme(
    plot.background = element_rect(fill='#021e3f', color='#021e3f'),
    panel.background = element_rect(fill='#021e3f', color='#021e3f'),
    plot.title = element_text(hjust=0.5, vjust=0, size=14),
    plot.subtitle = element_text(hjust=0.5, vjust=0, size=8),
    plot.caption = element_text(hjust=0.5),
    text = element_text(family="Geneva", color='white'),
    panel.grid = element_blank(),
    axis.title = element_blank(),
    axis.text = element_blank(),
    legend.position = "none"
  )

We replace the geom_point function with geom_density2d_filled to generate the heat map. We also add the scale functions to specify the heat map range.

Here’s the result of the code:

The image is created by the author.
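The heat map layer is built on a 2D kernel density estimate, which ggplot2 computes with MASS::kde2d. The sketch below runs that estimate directly on randomly generated points, which stand in for real pressure locations, to show how the range determines the grid the density is evaluated on.

```r
# Hypothetical pressure locations; real ones come from the event data.
set.seed(42)
x <- runif(200, min = 0, max = 120)
y <- runif(200, min = 0, max = 80)

# Estimate the density on a 50 x 50 grid covering the whole pitch.
# lims = c(xmin, xmax, ymin, ymax) plays the role of the plot limits.
dens <- MASS::kde2d(x, y, n = 50, lims = c(0, 120, 0, 80))

# Rescale so the peak equals 1, similar in spirit to contour_var = 'ndensity'.
dens$z <- dens$z / max(dens$z)

print(dim(dens$z))
print(range(dens$x))
```

With the range fixed to the pitch dimensions, the estimate covers the full pitch rather than only the area where events happened to occur.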

Final Remarks

Well done! You have learned how to visualize football data using R. We have created a pass map, a shot map, and a pressure heat map.

I hope you have learned a lot here. Please apply what you’ve learned to your favorite teams.

If you enjoyed this article, please look at my Medium profile for more tutorials on football analytics and data science.

Thank you for reading my article!
