Follow Up on Using Percentiles to Compare Race Performances Across Age Groups

Taking a second look and refining the model

What’s the best way to compare race results for runners in different age groups?

That’s the problem I’ve been looking at in an ongoing series of articles.

As athletes age, they inevitably get slower. Finishing in a given time, say three and a half hours, is much more challenging for a man in his 60s than for a man in his 20s. With a diverse field of athletes, how can we make an apples-to-apples comparison between different race efforts?

The current system, age grading, relies on modeling the best possible performance for a given age and gender and then comparing an athlete’s individual performance against that standard.

In a previous article, I offered up a possible alternative: grading efforts by percentiles.

In short, if you take a large enough data set you can look at the distribution of race results within each age group and gender. Then, you can calculate a percentile for each effort.

In other words, what percentage of runners in this age group and gender did this time actually beat?

This then forms the basis of the comparison. An effort that is better than 90% of results in one group should be roughly equivalent to an effort that is better than 90% of results in another group.
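
To make that concrete, here is a minimal Python sketch of the percentile calculation. It is an illustration rather than the exact code behind these tables: the function name `percentile_score` and the sample times are made up, and it assumes each group's results are available as a list of finish times in seconds.

```python
from bisect import bisect_right

def percentile_score(finish_seconds, group_times):
    """Percent of results in the group that are slower than finish_seconds."""
    ordered = sorted(group_times)
    # Everything at or below finish_seconds is as fast or faster.
    as_fast_or_faster = bisect_right(ordered, finish_seconds)
    slower = len(ordered) - as_fast_or_faster
    return 100.0 * slower / len(ordered)

# Made-up finish times (in seconds) for a single age group and gender.
sample_group = [9000, 10800, 12600, 13500, 14400,
                15300, 16200, 17100, 18000, 19800]

score = percentile_score(12600, sample_group)  # a 3:30:00 finish
print(score)
```

With these ten made-up results, a 3:30:00 beats seven of the ten, so it scores 70.0. A real table would be built from a far larger set of results per group.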

Now that I’ve had some time to reflect on this and digest some of the feedback I received, I want to take a second look, both to address some possible critiques and to see if we can improve the model a little bit.

Is the Shape of the Distribution Similar Across Gender and Age Brackets?

One assumption underlying this approach to comparing race results is that there is a similar distribution of times across different populations.

If that distribution is similar, then a result at the same percentile across different age groups should be comparable.

But if the distributions are different, those results wouldn’t be comparable. If the results are tilted more heavily towards faster or slower runners, it would skew the percentile of a given result.

The visual below is a histogram that shows the percentage of runners, in each age group, that finished in each 5-minute increment. For example, the bar at 3:30:00 represents the percentage of runners that finished from 3:30:00 to 3:34:59.
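
As a rough sketch of how a histogram like this can be built, the snippet below groups finish times into 5-minute bins and converts the counts to percentages. The names `bin_percentages` and `as_clock` and the sample times are hypothetical, and it assumes finish times are stored as whole seconds.

```python
from collections import Counter

BIN_SECONDS = 5 * 60  # 5-minute increments

def bin_percentages(finish_seconds):
    """Map each bin's start time (in seconds) to the percent of runners in it."""
    counts = Counter((t // BIN_SECONDS) * BIN_SECONDS for t in finish_seconds)
    total = len(finish_seconds)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}

def as_clock(seconds):
    """Format whole seconds as H:MM:SS for labeling the bars."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# Made-up times: 3:30:00, 3:34:59, 3:35:00, and 3:36:40.
sample = [12600, 12899, 12900, 13000]
histogram = bin_percentages(sample)

for start, pct in histogram.items():
    print(as_clock(start), pct)
```

Here the first two times land in the 3:30:00 bar and the last two in the 3:35:00 bar. Running this once per age group and gender gives one series per group, which is what makes the shapes comparable even when the group sizes differ.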

You can click on the age groups in the legend to show or hide specific age groups, and you can use the drop-down box to filter between men and women.

If you look at the three youngest age groups — under 35, 35–39, and 40–44 — the distributions look almost identical. The older groups shift slightly to the right, but otherwise, the shape looks remarkably similar.

As you look at the older age groups of men, they still look similar — although there are some minor differences. The general shape — a short tail on the left, a peak left of center, and then a longer tail on the right — is the same. But the peaks are a little lower, and the left tail is a little shorter.

In the 60s, the peak more clearly shifts to the right. The graph is also more jagged. That jaggedness is likely a result of the smaller sample size. If these age groups had as many runners as the younger ones, the graphs would likely be similarly smooth.

If you look at the women, the distributions look similar. Each graph is shifted significantly to the right. For example, the peak for younger men is 4:00, while the peak for younger women is 4:30. But otherwise, the shape is pretty similar.

In the oldest age group, the shape of the women’s graph is much less distinct. This group may be too small to produce reliable percentiles. The men 75–79 and the women 70–74 look like they’re on the cusp. But it’s worth keeping in mind that results for these older age groups should be taken with a grain of salt.

Ultimately, though, it looks to me like all of these distributions are pretty similar. There may be some minor differences, but there’s no indication that older age groups (at least through the 60s) have massively different distributions.

How Consistently Do Times Vary by Percentile?

Another way to think about whether these percentiles are good comparisons across age groups is to graph the time at each percentile across the different age groups.

If the order of the graphs is stable, and if the shape of each line is similar, then it’s an indication that it’s possible to make meaningful comparisons across age groups in this way.

The visual below has one line for each gender and age group. The x-axis goes from 0 (the slowest time) to 99.99 (the top 0.01% of finishes), and the y-axis is the finish time in seconds.
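
For anyone who wants to reproduce a line like this, here is a minimal sketch of looking up the finish time at a given percentile. The function names and the sample data are hypothetical, and the simple nearest-rank lookup is an assumption; interpolating between neighboring results would be a reasonable alternative.

```python
def time_at_percentile(group_times, pct):
    """Finish time that beats roughly pct percent of the group's results."""
    slowest_first = sorted(group_times, reverse=True)
    # Percentile 0 is the slowest result; higher percentiles are faster.
    index = min(int(pct / 100 * len(slowest_first)), len(slowest_first) - 1)
    return slowest_first[index]

def percentile_curve(group_times, percentiles):
    """Pairs of (percentile, finish time in seconds) for one line on the chart."""
    return [(p, time_at_percentile(group_times, p)) for p in percentiles]

# Made-up group: 100 results from 10,000 to 19,900 seconds.
sample_group = list(range(10000, 20000, 100))
curve = percentile_curve(sample_group, [0, 50, 99.99])
print(curve)
```

Each age group and gender gets its own curve, so plotting all of them against the same percentile axis shows whether the lines move in tandem or cross.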

Again, you can click on each age group in the legend to filter what is or isn’t shown.

With all 20 lines on the visual, it’s a bit crowded. But we can make a few observations.

First, the shape is pretty consistent.

At the lowest percentiles (0 to 10), the times drop very fast as the percentile rises. There’s a huge difference between the bottom 10% and the bottom 1%.

From 10% to 90%, the times decrease slowly but steadily, and there’s a pretty similar slope to each line.

Somewhere close to 100% — it’s hard to see on this graph — there’s a steep decline in finish times among the top 1% or 2% of runners.

The top line — women 75–79 — is rather jagged. The next two lines — women 70–74 and men 75–79 — are slightly jagged. Again, these are the smallest groups in the sample and the jaggedness in the line is a reflection of that. The remaining lines are all pretty smooth.

Finally, zoomed out like this, it looks like there’s a pretty consistent ordering to the age groups. The lines all go down in tandem, and they’re not crisscrossing.

Focusing on the Top 1%

But what if we zoom in on the top 1% of runners?

The visual below is the same as the visual above — but the x-axis has been restricted to 99 to 99.99%. The y-axis has also been restricted to zoom in. A few of the age groups are missing or only partially visible, but there’s enough here to get an idea of what’s going on.

Towards the left — from say 99 to 99.5 — there’s still a pretty clear definition between each group. They go down more or less in tandem.

However, the slopes of the lines are slightly different. The men’s open division declines more quickly over this span than any other group.

There are also some shifts. For example, the women’s open division declines more quickly than the older men. At the 99th percentile, the women’s open division is behind all of the men up to 50–54. But at around the 99.6th percentile, they move faster than the men 50–54. Around 99.9, they pass the men 45–49. And at 99.99, they’re almost faster than the men 40–44.

Something similar happens with women 35–39 and 40–44 — although the shift is less drastic. Although the younger men don’t change order, you can see that the younger age groups of men decline more rapidly than the older age groups.

I think this is an indication that the younger age groups (up to 40–44) have more competition at the quicker end. That would be a logical result of elite and sub-elite runners being concentrated in those age groups. With more runners at this end of the spectrum, there’s a greater difference across that top 1%.

The other thing you should notice here is that the older age groups have more jagged lines. It’s more obvious when you zoom in.

Smoothing Out the End of the Graph?

So what can we do about this?

One thing I tried was to model an actual function for each line. That would reduce the jaggedness and ensure more consistency. However, it was difficult to find a function that would accurately model the drastic shift above 99.9%.

Instead, I took a look at the percentage difference between each age group and the men’s open group for each percentile. These differences were pretty consistent, up until the final few tenths of a percent.

I picked one point to use as a reference, which turned out to be the 98th percentile. For the remainder of the results (98 to 99.99), I calculated the time for each age group based on its percentage difference from the measured time in the men’s open age group.

For example, the difference between the men 45–49 age group and the men’s open age group was 9.36% at the 98th percentile. For every percentile from 98 to 99.99, I calculated the time that was 9.36% slower than the men’s open time at that percentile.
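
The adjustment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact code used for these tables: `smooth_top_end` is a hypothetical name, each table is assumed to be a dict mapping a percentile to a time in seconds, and every time in the example is made up. Only the 9.36% gap at the 98th percentile comes from the real data.

```python
def smooth_top_end(open_times, group_times, reference=98.0):
    """Rebuild a group's times from the reference percentile upward by
    scaling the open group's times by the gap measured at the reference."""
    ratio = group_times[reference] / open_times[reference]
    smoothed = dict(group_times)  # below the reference, keep measured times
    for pct, open_time in open_times.items():
        if pct >= reference:
            smoothed[pct] = open_time * ratio
    return smoothed

# Made-up tables: percentile -> finish time in seconds.
open_men = {97.0: 10500, 98.0: 10000, 99.0: 9500, 99.9: 8500}
men_45_49 = {97.0: 11500, 98.0: 10936, 99.0: 10500, 99.9: 9900}  # 9.36% gap at 98

adjusted = smooth_top_end(open_men, men_45_49)
print(adjusted)
```

Below the reference point the measured times are left alone. From the reference point up, every time is the open time plus 9.36%, so the adjusted line inherits the shape of the open line.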

I took this modified chart of percentiles and graphed them below.

This eliminates the jaggedness of the lines and maintains the consistency between age groups.

It also takes the steeper slope of the men's open division and applies it to each of the other age brackets. This has the effect of making the standards a little tougher at the top end for age groups in the 50s, 60s, and 70s.

I think these modified charts are better than the original — but there’s one thing that I’m still a bit unsure about. I think it overvalues the efforts of young women (especially the open division), with the mark for the top few percentiles being a bit too soft.

What does the difference look like in practice?

The chart below lists the top ten runners at the 2019 Columbus Marathon, ranked by the first draft of the percentile tables that I generated (the ‘Old Table’ column). The ‘New Table’ column shows each runner’s score under the modified tables.

Gender   Age       Time       Old Table   New Table
---------------------------------------------------
   M     44      02:23:00       99.98       99.94
   F     39      02:35:50       99.97       99.98
   F     46      02:58:00       99.94       99.79
   F     28      02:34:29       99.94       99.97
   M     27      02:15:05       99.92       99.92
   M     30      02:15:05       99.92       99.92
   F     23      02:37:51       99.92       99.94
   M     35      02:24:54       99.91       99.87
   F     54      03:05:47       99.90       99.74
   M     55      02:52:04       99.90       99.65

For some runners, there’s little to no change.

The table didn’t change at all for the open men — so the two men who ran 2:15 still have the same percentile. The difference is minor for runners in their 30s and early 40s.

For the 46-year-old woman, the 54-year-old woman, and the 55-year-old man, the percentile dropped. And I think this is a move in the right direction. A 2:58 is an impressive time for a 46-year-old woman — but scoring it 99.94% seems overly generous.

The younger women, on the other hand, had their scores improve. The 28-year-old woman who ran 2:34:29 went from 99.94% to 99.97%. Compared to the 99.92% for the open men who ran 2:15, I don’t think that’s quite right.

I’ll take a look at this in the next iteration of things.

So What’s Next?

At this point, I’m happy enough with these charts. I think they represent a good first attempt at grading results based on percentile.

Is it perfect?

No. The tables may need some further adjustment at the very top end. I think it’s also worth conceding that this may not be the best method for measuring the very best performances, as it’s tough to really distinguish between the 99.8th percentile and the 99.9th.

But for someone in the 50th to 90th percentiles, I think this approach is very effective and offers a better understanding of just how good their performance is.

The current age grade system offers little differentiation for results that are above average but not amazing. For example, a 40-year-old man who runs 4 hours gets an age grade score of 52.58. Another man in that age group who runs 3:30 gets an age grade score of 59.75. Based on the age grade score, that 30 minutes doesn’t seem to be that much of a difference.

But based on percentiles, that 4-hour marathoner is in the 59.3rd percentile — slightly above average. Meanwhile, the 3:30 marathoner is in the 83.29th percentile — far above average. A time of 3:15 would nudge the age grade up to 64.34, but that runner is finishing faster than 91.7% of his peers.

I also want to update these tables with 2023 data and compare them to the newer 2023 age factors for age grading. But I’ll save that for another day, after I’ve had a chance to gather that data.

For now, the next thing I want to do is look at a different approach: using z-scores to rank individual efforts. Similar to percentiles, this scores an effort based on where it fits into the overall distribution. But instead of simply ordering the results and assigning a rank, z-scores are based on the standard deviation of the distribution and how far above or below the mean a given effort is.
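
As a preview of the idea, here is a minimal sketch of a z-score for a single finish time. It is a generic illustration rather than the method the next article will settle on: the function name and sample times are made up, and flipping the sign so that faster times score higher is an assumption made for readability.

```python
from statistics import mean, pstdev

def z_score(finish_seconds, group_times):
    """Standard deviations faster than the group mean (positive = faster)."""
    mu = mean(group_times)
    sigma = pstdev(group_times)  # population standard deviation
    return (mu - finish_seconds) / sigma

# Made-up finish times (in seconds) for one age group and gender.
sample_group = [12000, 14000, 16000]

print(z_score(12000, sample_group))  # faster than the mean: positive
print(z_score(14000, sample_group))  # exactly the mean: zero
print(z_score(16000, sample_group))  # slower than the mean: negative
```

Unlike a percentile, this score keeps growing as a time pulls further away from the pack, which may help separate results at the very top end.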

After that, I plan on creating a calculator to allow you, the reader, to test these methods out yourself with your own race results.

Once that’s done, I’ll focus on updating things with the newer 2023 data — and I’ll introduce these new versions into the calculator as well.

So if any of that interests you, make sure you subscribe for email updates. I’ll be sure to let you know when the next article is published.

And if you have any feedback or ideas that will help with this analysis — please leave a response. It always helps to have a second (or third or fourth) opinion!

I’m an avid runner and a data nerd. I turn 40 this week, so comparing results across age groups is of particular interest to me.