Summary

This article provides a beginner's guide to analyzing Formula 1 data using Python, including setting up a Python environment, using virtual environments, installing necessary packages like fastf1 and notebook, and getting started with data analysis in Jupyter Notebooks.

Abstract

The web content presents a comprehensive tutorial for beginners interested in analyzing Formula 1 data with Python. It starts by guiding readers through setting up a Python environment, emphasizing the importance of using a virtual environment to avoid conflicts between projects. The article then walks through the installation of essential Python packages, such as fastf1 for data collection and notebook for interactive data analysis in Jupyter Notebooks. The author explains how to cache data for faster loading times, load session data, and analyze lap times, providing code examples and encouraging readers to explore the data. The tutorial aims to demystify the process for absolute beginners and provides additional resources for further learning, including the author's other tutorials and the fastf1 library documentation. The article concludes by promoting a culture of openness and collaboration in learning and data analysis, inviting readers to share their progress and engage with the author on social media.

Opinions

The author believes in the importance of using virtual environments in Python to maintain project isolation and prevent conflicts.
They advocate for the use of Jupyter Notebooks for data analysis due to their convenience in running code and exploring outputs interactively.
The author suggests that even imperfect work should be shared to foster learning and improvement, encouraging a collaborative environment among readers and data analysts.
They recommend their own tutorials and Twitter account as resources for further learning and engagement in the Formula 1 data analysis community.
The article endorses the fastf1 library as a tool for collecting Formula 1 data, highlighting its caching functionality to enhance data loading efficiency.

How to Analyze Formula 1 Data with Python: A Beginner’s Tutorial

You want to analyze Formula 1 data, but you really don’t know how to get started? Then this guide is made exactly for you.

This tutorial will get you started with everything you need to analyze Formula 1 data yourself. It’ll walk you through the basics of setting up your Python environment and help you lay the foundation for your analysis. This tutorial will also provide you with resources, explanations and tips throughout, so stay tuned!

Setting everything up

Assuming you already have Python installed, we start by creating the directory in which we will work. You can do this in any preferred location on your computer (personally, I’m on macOS and I’ll put it in a Documents subfolder). Mine will be called formula1_python.

Virtual Environment

It’s good practice to always work in a virtual environment when working in Python. This isolates the entire Python environment you use for a specific project from other projects, so that no conflicts between scripts, libraries and projects arise. To create a virtual environment on macOS (if you’re on Windows, read this), we open Terminal and navigate to our project folder.

cd ~/path/to/formula1_python

We then install the virtualenv package with pip, as follows:

pip install virtualenv

Now that we are in the right folder and have installed the virtualenv package, we can actually create the virtual environment:

virtualenv venv

This creates a virtual environment called “venv”. If you look in the project folder, you’ll find a new folder called “venv”. The only thing remaining is to activate the virtual environment, which makes sure that everything we do from that moment onwards is isolated from the other Python environments on your machine.

source venv/bin/activate

Now, you’ll probably see something like (venv) at the beginning of your command line, meaning that you successfully activated the virtual environment!
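If the prompt does not show (venv), a quick check from Python itself can confirm whether the virtual environment is active. The following is a minimal sketch using only the standard library; the function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and the
    # built-in venv module) make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtual_environment())
print("Interpreter in use:", sys.executable)
```

When the environment is active, the interpreter path printed on the second line should point inside the venv folder.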

Installing requirements

We obviously want to install fastf1. This library allows us to collect all the Formula 1 data we need. So, we do the following:

pip install fastf1

In addition, Jupyter Notebooks are really convenient for playing around with data. You can run your code cell by cell, and directly view and explore the output of each cell. This makes doing analyses very convenient. We therefore run the following in the command line:

pip install notebook

If you need any other package, just run pip install [package-name] to install it into your virtual environment.
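To double-check that the packages really ended up in the active environment, you could ask Python for their installed versions. This is a small sketch based on the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is an assumption of this example, not part of fastf1 or pip.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages installed above.
for name in ("fastf1", "notebook"):
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
```

A "not installed" line usually means pip ran outside the virtual environment, so it is worth activating it and installing again.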

Creating a notebook

Now, we can get started with our actual analysis. First of all, let’s launch our Jupyter notebook:

jupyter-notebook

The Notebook server should now be running, and the file browser should have opened in your web browser. To create a new notebook, click “New” in the top-right corner and select “Python”.

Getting started with Fastf1

Everything has been set up, so let me now explain in detail how the fastf1 library works. Fastf1 also has its own documentation, but if you’re an absolute beginner, you still might end up confused.

Let’s begin with the absolute basics: importing the libraries we need. For now, it’s only the fastf1 library and pandas. You can create new cells by pressing “B” on your keyboard, and run cells by pressing Shift + Enter.
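The import cell could look like the sketch below. The aliases ff1 and pd are just short names chosen here, and the try/except wrapper is an addition of this example: it turns a missing package into a readable hint, which often means the notebook is running outside the virtual environment.

```python
try:
    import fastf1 as ff1  # Formula 1 timing and session data
    import pandas as pd   # tabular data handling
    libraries_ready = True
except ImportError as error:
    libraries_ready = False
    print("Missing package:", error.name)
    print("Install it with pip inside the activated virtual environment.")

print("Libraries ready:", libraries_ready)
```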

1. Caching

Every race weekend generates a huge amount of data, which takes time to load. Fastf1 therefore provides caching functionality that stores the data from a race weekend in a folder of your choice, so that the next time the same data is loaded, it goes much faster. To use it, we create a folder called ‘cache’ and then enable caching.
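A sketch of these two steps follows. It uses fastf1.Cache.enable_cache, which expects the folder to exist already, so the folder is created first. The two helper names are made up for this example, and fastf1 is imported inside the second helper only so that the folder helper can be tried on its own.

```python
from pathlib import Path


def make_cache_dir(folder="cache"):
    """Create the cache folder if it does not exist yet and return its path."""
    cache_path = Path(folder)
    cache_path.mkdir(parents=True, exist_ok=True)
    return cache_path


def enable_fastf1_cache(folder="cache"):
    """Point fastf1 at the cache folder so downloaded data is reused."""
    import fastf1  # needs `pip install fastf1`

    fastf1.Cache.enable_cache(str(make_cache_dir(folder)))
```

Calling enable_fastf1_cache() once near the top of the notebook, before loading any session, should be enough.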

2. Loading session data

To load the data from a certain session, we need to specify three parameters: the year, the Grand Prix and the session. Let’s say we’re interested in the Qualifying of the 2021 Turkish Grand Prix. We would do the following:
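A sketch of that call is shown here. It relies on fastf1.get_session, which takes the year, the Grand Prix and the session in that order, with "Q" as the short identifier for Qualifying. The wrapper function and the constant names are made up for this example.

```python
YEAR = 2021
GRAND_PRIX = "Turkey"
SESSION = "Q"  # short identifier for Qualifying


def get_qualifying_session(year=YEAR, grand_prix=GRAND_PRIX, session=SESSION):
    """Return the fastf1 session object for the requested event."""
    import fastf1  # needs `pip install fastf1`; may fetch the event schedule online

    return fastf1.get_session(year, grand_prix, session)
```

In a notebook cell you would then run session = get_qualifying_session() and keep the result for the next step.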

The fastf1 library can correctly identify the Grand Prix based on different types of input. Instead of “Turkey”, we could for example also have said “Istanbul” or “Istanbul Park”, or passed the round number 16 (it was the 16th race of the season).

Now that we’ve defined the session, we can load the laps. This will run for a few seconds (next time it will be faster, since the data will come from the cache).
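How the laps are loaded depends on the fastf1 version you have installed. Newer releases load the data with session.load() and expose it as session.laps, while older releases used session.load_laps(). The helper below is a hedged sketch that tries the newer style first; its name, load_laps, is chosen for this example.

```python
def load_laps(session):
    """Load lap data from a fastf1 session, whichever API style it offers."""
    if hasattr(session, "load"):
        # Newer fastf1 releases: load everything, then read the laps attribute.
        session.load()
        return session.laps
    # Older fastf1 releases: a dedicated method returned the laps directly.
    return session.load_laps()
```

With the session from the previous step, laps = load_laps(session) gives you the table of laps.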

I highly recommend inspecting the laps variable to see what we’re dealing with. You can do so by creating a new cell, putting laps in it and pressing Shift + Enter.
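Beyond typing laps into a cell, a few standard pandas attributes give a quick overview of a large table. The sketch below assumes laps behaves like a pandas DataFrame, which fastf1's lap data does; the function name preview is made up for this example.

```python
def preview(laps, rows=5):
    """Print the size and column names, then return the first few rows."""
    print("Rows, columns:", laps.shape)
    print("Column names:", list(laps.columns))
    return laps.head(rows)
```

Running preview(laps) as the last line of a cell prints the overview and then displays the first five laps as a table.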

3. Analyzing the data

Here, this tutorial ends. Every analysis is different and requires a different approach. The purpose of this (short) tutorial was to help you get started if you really don’t know where to start. Now that you know how to get the lap data in front of you, it’s time to go out there and analyze it yourself!
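As one possible first step, the sketch below finds each driver's quickest lap with pandas. It assumes the laps table has Driver and LapTime columns, as fastf1's lap data does. The three rows and the driver codes AAA and BBB are made up purely so the example runs without downloading anything; they are not real results.

```python
import pandas as pd


def fastest_lap_per_driver(laps):
    """Return each driver's quickest lap time, sorted from fastest to slowest."""
    return laps.groupby("Driver")["LapTime"].min().sort_values()


# Made-up stand-in for real lap data (not actual results).
toy_laps = pd.DataFrame({
    "Driver": ["AAA", "AAA", "BBB"],
    "LapTime": pd.to_timedelta(["00:01:25.500", "00:01:24.900", "00:01:25.100"]),
})

print(fastest_lap_per_driver(toy_laps))
```

With real data, pass the laps variable you loaded earlier instead of toy_laps.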

I have created multiple tutorials that show you exactly how to do some analyses, so make sure to check out my profile! These can give you inspiration and provide you with examples. Here are a few of my tutorials:

The power of openness and collaboration

I am learning. All the time. That’s what drives me towards sharing what I’ve learned so far: helping others, and learning more. I truly believe that putting your work out there, even if it’s not perfect (mine is not perfect either), will increase your learning rate. That’s why I highly encourage everyone who reads this article and tries to do something with it to share their progress, small victories, new examples, points of discussion, criticism, et cetera. This will make everyone better.

You can do so by replying to this article. You can also check out my Twitter and approach me on there.

If you like this article, please show it some love on Medium by clapping and following me. Thanks for reading!