Interactive basketball data visualizations with Plotly
Analyze sports data with hexbin shot charts and bubble charts with Plotly and Plotly Express (source code & my own data for all 30 teams included in my GitLab repo)

This post is mostly about visualisation. It is only going to include very cursory basketball info — basically, if you know what it is, you’re going to be fine.
I have been tinkering with basketball data analysis & visualisation recently, using matplotlib for plotting. It is powerful, but Plotly is my usual visualisation package of choice.
In my opinion, Plotly strikes the right balance of power and customisability, with sensible, intuitive syntax, great documentation and an active pace of development.
So I recently migrated my basketball visualisation scripts to Plotly, with great results. In this article, I would like to share some of that, including examples using both Plotly and Plotly Express.
I included the code for this in my GitLab repo here (basketball_plots directory), so please feel free to download it and play with it / improve upon it.
Before we get started
Data
As this article is mainly about visualization, I will include my pre-processed data outputs in my repo for all 30 teams & league average.
Packages
I assume you’re familiar with python. Even if you’re relatively new, this tutorial shouldn’t be too tricky, though.
You’ll need plotly and pandas. Install them (in your virtual environment) with a simple pip install plotly pandas.
Bubble charts in a blink — with Plotly Express
Professional basketball players in the NBA take shots from right at the rim, to past the three-point line, which is about 24 feet away from it. I wanted to understand how the distance affects accuracy of shots, and how often players shoot from each distance, and see if there is a pattern.
This is a relatively straightforward exercise, so let’s use Plotly Express. Plotly Express is a fairly new package, and is all about producing charts more quickly and efficiently, so you can focus on the data exploration. (You can read more about it here)
I have a database of all shot locations for an entire season (2018–2019), which is about 220,000 shots. The database includes the location of each shot, and whether it was successful (made) or not.
Using this data, I produced a summary (league_shots_by_dist.csv), which includes shots grouped by distance in 1-foot bins (up to 32 feet), and columns for shots_made, shots_counts, shots_accuracy and avg_dist.
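If you would like to build a similar summary from your own shot log, the sketch below shows one way to do it with pandas. It is not the script used to produce the file in the repo: the raw column names (shot_distance, shot_made) and the random data are assumptions made purely for illustration, and the accuracy column is named shots_accuracy to match the plotting calls in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical raw shot log, one row per shot (column names are assumptions)
rng = np.random.default_rng(0)
shots_df = pd.DataFrame({
    "shot_distance": rng.uniform(0, 32, size=1000),  # feet from the rim
    "shot_made": rng.integers(0, 2, size=1000),      # 1 = made, 0 = missed
})

# Assign each shot to a 1-foot bin: [0, 1), [1, 2), ..., [31, 32)
bin_edges = np.arange(0, 33)
shots_df["dist_bin"] = pd.cut(
    shots_df["shot_distance"], bins=bin_edges, right=False
)

# Aggregate each bin into the summary columns
grouped = shots_df.groupby("dist_bin", observed=True).agg(
    shots_made=("shot_made", "sum"),
    shots_counts=("shot_made", "count"),
    avg_dist=("shot_distance", "mean"),
).reset_index(drop=True)
grouped["shots_accuracy"] = grouped["shots_made"] / grouped["shots_counts"]
```

Each row of grouped then describes one distance bin, which is the shape that the scatter charts in this post expect.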
So let’s explore this data — simply load the data with:
import pandas as pd
grouped_shots_df = pd.read_csv('srcdata/league_shots_by_dist.csv')
And then after importing the package, running just the two lines of code below will magically open an interactive scatter chart in your browser.
import plotly.express as px
fig = px.scatter(grouped_shots_df, x="avg_dist", y="shots_accuracy")
fig.show()
To see how often players shoot from each distance, let’s add the frequency data: simply pass the shots_counts column to the size parameter, and specify a maximum bubble size with size_max.
fig = px.scatter(grouped_shots_df, x="avg_dist", y="shots_accuracy", size="shots_counts", size_max=25)
fig.show()
That’s interesting. The frequency (bubble size) decreases, and then picks back up again. Why is that? Well, we know that as we get farther from the basket, some of these are two-point shots, some are three-pointers, and some are a mix of the two. So let’s try colouring the points by shot type.
fig = px.scatter(grouped_shots_df, x="avg_dist", y="shots_accuracy", size="shots_counts", color='shot_type', size_max=25)
fig.show()
Ah, there it is. It looks like the shot frequency increases as players try to take advantage of the three point line.
Edit: here’s a live demo
Try moving your mouse over each point — you will be pleasantly rewarded with a text tooltip! You didn’t even have to set anything up.

Moving your cursor over the individual points, the data tells us that shot accuracy doesn’t change greatly past 5 to 10 feet from the basket. By shooting threes, players trade off about a 15% decrease in shot accuracy for the 50% larger reward of a three-pointer. It makes sense that three-pointers are more popular than these ‘mid-range’ two-pointers.
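To put rough numbers on that trade-off, we can compare expected points per shot (accuracy multiplied by point value). The sketch below assumes the 15% figure is a relative drop in accuracy, and the 40% mid-range accuracy is a hypothetical value chosen only to make the arithmetic concrete.

```python
def expected_points(accuracy, points):
    """Expected points per attempt for a shot worth `points`."""
    return accuracy * points

mid_range_acc = 0.40                 # hypothetical mid-range accuracy
three_pt_acc = mid_range_acc * 0.85  # 15% relative decrease (assumption)

ev_two = expected_points(mid_range_acc, 2)
ev_three = expected_points(three_pt_acc, 3)
ratio = ev_three / ev_two            # equals 1.5 * 0.85

print(f"mid-range: {ev_two:.2f} pts/shot, three: {ev_three:.2f} pts/shot")
print(f"ratio: {ratio:.3f}")
```

Under these assumptions the ratio is 1.5 × 0.85 = 1.275 whatever starting accuracy you pick, so the three-pointer returns roughly a quarter more points per attempt.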
That’s not exactly a groundbreaking conclusion, but it’s still neat to be able to see it for ourselves.
But more importantly, wasn’t that insanely simple? We created the last chart with just two lines of code!
As long as you have a ‘tidy’ dataframe that has been pre-processed, Plotly Express allows fast visualisations like this, which you can work from. It’s a fantastic tool for data exploration.
Hexbin plots, with Plotly
Let’s move on to another type of chart, the hexbin chart. I’ve discussed it elsewhere, but hexbin charts allow area-based visualisation of data, by dividing an entire area into a grid of hexagons and displaying data by their colour (and also sometimes size) like so.

While Plotly does not natively provide functions to compile hexbin data from coordinate-based data points, it does not matter for us because matplotlib does.
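For the curious, here is a rough sketch of how hexbin data can be pulled out of matplotlib. It is not the pre-processing script behind the data in the repo: the shot coordinates are random, and the gridsize is an arbitrary choice. The idea is that hexbin returns a collection whose get_offsets() gives the hexagon centres and whose get_array() gives the count in each hexagon.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Random stand-in shot coordinates (not real shot data)
rng = np.random.default_rng(1)
x = rng.normal(0, 80, size=5000)
y = rng.normal(100, 80, size=5000)

fig, ax = plt.subplots()
# mincnt=1 keeps only hexagons that contain at least one shot
hexbins = ax.hexbin(x, y, gridsize=30, mincnt=1)
plt.close(fig)  # we only need the binned numbers, not the matplotlib figure

centres = hexbins.get_offsets()        # array of (x, y) hexagon centres
counts = hexbins.get_array()           # number of shots in each hexagon
xlocs, ylocs = centres[:, 0], centres[:, 1]
freq_by_hex = counts / counts.sum()    # share of all binned shots per hexagon
```

Arrays like xlocs, ylocs and freq_by_hex can then be handed to any plotting library, Plotly included.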
Okay, so let’s move straight onto visualisation of the hexbin data:
Our first shot chart
I have saved the data in a dictionary format. Simply load the data with:
import pickle
with open('srcdata/league_hexbin_stats.pickle', 'rb') as f:
    league_hexbin_stats = pickle.load(f)
The dictionary contains these keys (see for yourself with print(league_hexbin_stats.keys())):
['xlocs', 'ylocs', 'shots_by_hex', 'freq_by_hex', 'accs_by_hex', 'shot_ev_by_hex', 'gridsize', 'n_shots']
The important ones are: the x & y location data (xlocs, ylocs), the frequency data (freq_by_hex) and the accuracy data (accs_by_hex). Each of these includes data for each hexagon, except for the accuracy data, which I have averaged into “zones” to smooth out local variations. You’ll know what I mean when you see the plots.
Note: The X/Y data are as captured originally, according to the standard coordinate system. I base everything else from it. Basically, the centre of the rim is at (0, 0) and 1 in X & Y coordinates appears to be 1/10th of a foot.
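If you prefer to think in feet, the conversion is a simple division. The sketch below assumes the 1/10th-of-a-foot scale described in the note; the helper name to_feet and the sample points are hypothetical.

```python
import numpy as np

UNITS_PER_FOOT = 10  # per the note above: 1 unit appears to be 1/10th of a foot

def to_feet(xlocs, ylocs):
    """Convert raw court coordinates to feet, plus distance from the rim at (0, 0)."""
    x_ft = np.asarray(xlocs, dtype=float) / UNITS_PER_FOOT
    y_ft = np.asarray(ylocs, dtype=float) / UNITS_PER_FOOT
    dist_ft = np.hypot(x_ft, y_ft)
    return x_ft, y_ft, dist_ft

# Three sample points: at the rim, a short shot, and a shot from the side
x_ft, y_ft, dist_ft = to_feet([0, 30, -220], [0, 40, 0])
```

Here dist_ft comes out as 0, 5 and 22 feet for the three sample points.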
Let’s plot those. This time we will use plotly.graph_objects, for the extra flexibility it gives us. It leads to writing slightly longer code, but don’t worry — it’s still not very long, and it will be totally worth it.














