Use itertools.groupby() to Count Consecutive Occurrences in Python
For example, how many days did the longest heat wave last?
Context is everything. Why is this useful?
As an analyst, you may want to explore nuances such as consecutiveness to understand root causes or to remove outliers in data.
The topic that inspired this article is weather. In 2022, temperatures in Portland, Oregon were mild compared with the record-setting heat waves in other parts of the world, or even with Portland's own previous year.
However, even though 2022 might be mild compared to the record heat in 2021, the year seemed hot. Perhaps consecutive days of hot weather had impacted my perception? Let’s count the hot day streaks (also known as heat waves) with Python to analyze this assumption.
Weather data for the US
In a previous article, I wrote about the publicly available data source that I’m about to use from the National Centers for Environmental Information (NCEI). Please read that article for details on how to access the data, and visit the NCEI website to learn more about its mission and that of the National Environmental Satellite, Data, and Information Service (NESDIS).
In this article, the focus will be on TMAX, the maximum observed daily temperature at particular weather stations. The data for Portland, Oregon and my exploratory data analysis are available on GitHub.
The data frame created for this analysis contains True/False Boolean fields that indicate whether the temperature was at least 80 degrees, at least 90 degrees, or at least 100 degrees Fahrenheit.
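As a rough sketch of how such flag columns could be built with pandas: the column names (DATE, TMAX, TMAX_GE_80, and so on) and the sample values below are assumptions for illustration, not the actual NCEI observations or the exact layout used in the analysis.

```python
import pandas as pd

# Illustrative values only, not real NCEI observations.
# The column names are hypothetical and chosen for this sketch.
df = pd.DataFrame(
    {
        "DATE": pd.date_range("2022-07-01", periods=4, freq="D"),
        "TMAX": [79, 85, 92, 101],
    }
)

# One Boolean column per threshold: True when TMAX is at least that value.
for threshold in (80, 90, 100):
    df[f"TMAX_GE_{threshold}"] = df["TMAX"] >= threshold

print(df)
```

Keeping one Boolean column per threshold means the same streak-counting logic can later be applied to each column without changes.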

Consecutive days with temperatures at or above 90 degrees Fahrenheit
Calculating consecutive occurrences is not as simple as you might think. Several calculations are required to arrive at a solution, and developing an algorithm that works on years of data is not trivial.
However, Python's standard library can reduce the amount of code you have to write for this algorithm. The groupby() function from the itertools module was the easiest method for me to implement for this analysis.
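To illustrate why groupby() fits this problem, here is a small sketch on made-up flags (not real observations). groupby() starts a new group each time the value changes, so every run of consecutive identical values comes out as one group.

```python
from itertools import groupby

# Illustrative flags only (not real observations): True means a "hot" day.
hot_days = [True, True, False, True, True, True, False]

# Each (key, group) pair is one run of consecutive identical values;
# counting the items in the group gives the length of that run.
runs = [(key, len(list(group))) for key, group in groupby(hot_days)]
print(runs)
# [(True, 2), (False, 1), (True, 3), (False, 1)]
```

Note that groupby() only merges adjacent items, which is exactly what a streak count needs. The input must therefore already be in date order.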
First, import the groupby() function from the itertools module.
from itertools import groupby
Next, define a function that takes a list of data points for a year as its parameter and returns a list.
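A minimal sketch of what such a function could look like follows. It assumes the input is a list of True/False flags in date order and that the returned list holds the length of each hot-day streak. The name count_streaks and the sample flags are hypothetical, and this is not necessarily the exact function used in the analysis.

```python
from itertools import groupby


def count_streaks(flags):
    """Return the length of every run of consecutive True values.

    `flags` is assumed to be a list of Booleans in date order,
    one per day (a hypothetical input shape for this sketch).
    """
    # Keep only the groups whose key is True, and count their members.
    return [sum(1 for _ in group) for key, group in groupby(flags) if key]


# Illustrative flags only, not real Portland observations.
sample_year = [False, True, True, True, False, False, True, False, True, True]
streaks = count_streaks(sample_year)
print(streaks)                  # [3, 1, 2]
print(max(streaks, default=0))  # 3
```

Returning every streak length, rather than only the maximum, leaves room to answer both questions from the same list: how long the longest heat wave lasted, and how many streaks there were in the year.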










