Age Grading, Percentiles, and Z-Scores: Three Ways to Compare Race Results

How do you effectively and fairly compare race results between two runners from different age groups?
That’s the question I’ve been exploring in an ongoing series of articles.
The current system, age grading, is useful, but it has some flaws. After collecting and analyzing a truckload of data, I’ve offered two alternatives: percentiles and z-scores.
Today, I want to take a step back and compare these three methods. In particular, one of my problems with age grading is that it seems to favor some groups over others. Do the alternatives have the same problem?
I’ve also set up a calculator that you can use to input a runner’s age, gender, and time — and get all three scores so that you can compare them yourself.
But first, I wanted to take a minute to recap where we’ve been. If this is the first article you’ve read on the topic, this is for you. If you’ve been following along from the beginning, feel free to skip ahead to the next section.
A Recap of the Previous Articles on Age Grading
This series started with a question — how do you effectively compare race results between different age groups and genders?
In the first article, I offered a look into some of the history behind age grading and a basic critique of the system.
The question is important, and age grading is much better than having nothing. However, I find it a bit problematic to compare everyone’s results to a hypothetical best possible result. Determining that standard is difficult, and the particular standard can be a bit arbitrary.
It’s also not all that helpful for average runners trying to understand how their results fit into the bigger picture, because no matter how good they get, they’ll always be a long way off that best possible time.
Age grading was developed based on stats and studies of the best runners — but I wanted to take a look at this problem from a different vantage point. How do we compare to all of the other runners?
Statistics offer several ways to make these comparisons, but first I needed a large and representative data set to work with. In the second article in the series, I laid out the sample that I’d put together.
In short, I narrowed things down to American marathons that took place in September, October, or November from 2010 to 2019.
I chose these three months to try to capture an entire ‘season’ of running so that I didn’t have to worry about individually weighing which races to include. Some are fast races, some are slow. Collectively, they’re pretty representative.
I also narrowed things down to races with 500 or more runners. This didn’t shrink the sample size much, but it did make the scraping process easier.
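The selection criteria above can be written as a simple filter. This is only a sketch: the `Race` record and its field names are hypothetical, not the structure of the actual scraped data, and it assumes every record is already a marathon.

```python
from dataclasses import dataclass

@dataclass
class Race:
    # Hypothetical record layout, for illustration only.
    name: str
    country: str
    year: int
    month: int      # 1-12
    finishers: int  # number of runners in the results

def in_sample(race):
    """True if a race meets the criteria described in the text."""
    return (
        race.country == "USA"
        and 2010 <= race.year <= 2019
        and race.month in (9, 10, 11)   # September through November
        and race.finishers >= 500       # 500 or more runners
    )

# Made-up examples.
print(in_sample(Race("Fall City Marathon", "USA", 2015, 10, 1200)))
print(in_sample(Race("Spring City Marathon", "USA", 2015, 4, 1200)))
```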
Once I collected all of the data, I explored what was in the dataset.
The dataset includes over 2 million individual race results. While the biggest group of runners is the under-35 age group, each age group is pretty well represented, at least up until the 70s. There were a decent number of men 70–74, but there was a smaller group of men 75–79 and women 70–74. The group of women 75–79 is smaller yet, and there’s just not enough data to support any useful analysis for runners over 80.
With the data in hand, I was able to offer up the first alternative: percentiles.
Essentially, this is a method of looking at a distribution and figuring out where a particular result fits into that distribution. By looking at all results, I can say what percentage of runners in a given age group finished slower than a given time — and what percentage beat it.
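A minimal sketch of that lookup, assuming the finishing times for one age group are available as a plain list of minutes. The function name and the sample numbers are made up for illustration; this is not the code behind the published tables.

```python
from bisect import bisect_right

def percent_slower(group_times, finish_time):
    """Percentage (0-100) of the group that finished slower than finish_time."""
    ordered = sorted(group_times)
    # bisect_right counts results at or below finish_time; the rest are slower.
    slower = len(ordered) - bisect_right(ordered, finish_time)
    return 100.0 * slower / len(ordered)

# Made-up finishing times in minutes for one age group.
sample_group = [185, 200, 215, 230, 240, 255, 270, 285, 300, 330]
print(percent_slower(sample_group, 215))  # 70.0
```

Ties are counted as "not slower" here. That is a design choice, and with millions of results it makes little practical difference.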
Assuming the distributions of finishing times have a similar shape across age groups, a runner in the top 5% of their age group should be more or less equivalent to another runner in the top 5% of their own age group.
In a follow-up article, I took a closer look at the individual distributions to ensure that they were similarly constructed. For the most part, they were. Although things got a little less clean among the smaller, older age groups, the general shape of the distributions was the same.
However, upon further analysis, it does seem that the current tables I’ve developed may favor some groups over others — especially at the 99.9th percentile. I think this can be worked out in a future version, but that’s a problem for another day.
Finally, in the most recent article, I took a look at whether z-scores would offer a better way to compare race results.
A z-score measures how far above or below the mean a given data point is, in units of standard deviations. In short, you calculate the mean and standard deviation for each distribution, and then every result can be assigned a standardized number to represent how fast or slow it is.
In a broad sense, it works. Results with z-scores of -2 or lower sit in the fast tail of the distribution and are clearly impressive. But this method is unbalanced, and it definitely favors younger women over other age groups.
While I think percentiles can be tweaked and calibrated to offer better comparisons, I don’t think that’s possible with z-scores. Nonetheless, I’m going to keep them in the mix for now to help offer context.
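As a rough sketch of the calculation, assuming one age group’s finishing times are in a list of minutes. The sample numbers are made up, and using the population standard deviation is an assumption, not something taken from the earlier article.

```python
from statistics import mean, pstdev

def z_score(group_times, finish_time):
    """Standard deviations between finish_time and the group mean.

    Negative values are faster than the mean, positive values slower.
    """
    mu = mean(group_times)
    # Assumption: population standard deviation. The sample version
    # (statistics.stdev) differs only slightly on a large group.
    sigma = pstdev(group_times)
    return (finish_time - mu) / sigma

# Made-up finishing times in minutes for one age group.
sample_group = [180, 210, 240, 270, 300]
print(round(z_score(sample_group, 180), 2))
```

Because a lower time is better, the fastest results get the most negative scores.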
Are These Systems Fair to Different Age Groups?
At this point, we have three different systems for comparing race results between age groups — the existing age grading system, tables with percentiles for each age group, and the means to calculate z-scores for a given result.
How fair are these three different approaches? Do they favor one group over another?
One way to look at this question is to compare each age group’s share of the overall sample with its share of the top runners under each method.
In a general sense, you’d expect those distributions to be fairly similar. Perhaps younger age groups will be overrepresented because there are more elite athletes. But otherwise — if the system is fair — there shouldn’t be any huge disparities.
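Here is a sketch of that comparison, assuming each result is an (age group, score) pair where a higher score is better. The function names and the data are hypothetical.

```python
from collections import Counter

def group_shares(age_groups):
    """Percentage of entries that fall in each age group."""
    counts = Counter(age_groups)
    total = len(age_groups)
    return {group: 100.0 * n / total for group, n in counts.items()}

def top_n_shares(results, n):
    """Age-group shares among the n best results (higher score = better)."""
    top = sorted(results, key=lambda r: r[1], reverse=True)[:n]
    return group_shares([group for group, _ in top])

# Made-up (age group, score) pairs, for illustration only.
results = [
    ("under 35", 90), ("under 35", 70), ("under 35", 60), ("under 35", 50),
    ("40-44", 85), ("40-44", 55),
    ("50-54", 80), ("50-54", 40),
]
overall = group_shares([group for group, _ in results])
top_four = top_n_shares(results, 4)
for group in overall:
    print(group, overall[group], top_four.get(group, 0.0))
```

For z-scores, where lower is better, the scores would need to be negated before ranking. A fair method should produce top-group shares that roughly track the overall shares.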
The visual below shows the top 100 finishers, by each grading methodology, from 2019. It also shows the percentage of all runners who are in each age group.





