How Do You Compare Race Results Between Age Groups?

The promise of and problems with the age grading system

At some point, age catches up with all of us. We might keep running, but we slow down.

When exactly this happens, and how sharp the decline is, will vary from person to person. But once you get into your 40s and 50s, you’re going to start running slower.

If you look at any major race, the winners are going to be in their 20s or 30s — maybe early 40s. They’re in the prime of their careers.

Older runners just can’t compete — at least not in large, competitive fields.

Over the years, race directors and athletic organizations have come up with various ways to level the playing field and keep things interesting. First, they created a separate Masters division for runners over 40. Then, they started to break the field down by age groups of 5 or 10 years.

Eventually, they came up with a system called age grading.

And that’s what I want to talk about today. Age grading.

What is it? What does it promise to achieve? What are its limitations?

And most importantly — is there an alternative?

The Age Grading System

With a simple age group system, it’s easy to compare performances within an age group. If a dozen people in the Women’s 60–64 age group participate in a race, they go head to head and you know who won.

But how do you compare the performances of athletes of different genders and age groups to determine who had the “best” performance?

That’s the promise of age grading.

The idea is simple. You start with a specific standard for a given age group. Typically, this would be the best time at that event for that age group.

Then, you compare the athlete’s results to that standard.

Let’s say, for example, that the standard for the marathon for a particular age group is 3 hours — or 10,800 seconds. An athlete finishes in 3 hours and 15 minutes — or 11,700 seconds. To find their age grade, you divide the standard by the result and get a percentage — 92.3%.

This score can be compared to the score of an athlete in a younger age group or a different gender. The other athlete might have a faster time, but the higher age graded score indicates a better performance.

You can also reverse the process to convert this score into a standardized time. To do this, you take the standard for the open division, divide it by the score, and get a new time.

So in the previous example, if the open standard was two hours and thirty minutes, you divide that (9,000 seconds) by the score (92.3%) and get a standardized time — 9,751 seconds (2:42:31).
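The arithmetic in this example can be sketched in a few lines of Python. The function names are my own for illustration, not part of any official age grading tool. Note that carrying the unrounded score through gives 9,750 seconds (2:42:30); dividing by the rounded 92.3% is what produces 9,751.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: float) -> str:
    """Format seconds as 'H:MM:SS', rounded to the nearest second."""
    total = round(total_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def age_grade(standard_s: float, result_s: float) -> float:
    """Age grade as a fraction: the age-group standard divided by the result."""
    return standard_s / result_s


def standardized_time(open_standard_s: float, score: float) -> float:
    """Open-division equivalent: the open standard divided by the score."""
    return open_standard_s / score


# Numbers from the example: 3:00:00 standard, 3:15:00 result, 2:30:00 open standard.
score = age_grade(to_seconds("3:00:00"), to_seconds("3:15:00"))
equivalent = standardized_time(to_seconds("2:30:00"), score)

print(f"{score:.1%}")      # 92.3%
print(to_hms(equivalent))  # 2:42:30
```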

Now you have both a score and a time that can be used to directly compare the performances of all of the athletes.

What Are the Limitations Of This System?

One of the key questions — and limitations — of this system is how the individual standards are developed.

In its most basic form, age grading can use the existing world record in a given event for a given age as the standard. This has the advantage of being simple.

But it has the disadvantage of being based on a single performance. In the open division, with many athletes in competition, there’s a good chance that a true ‘limit’ emerges in the world record.

But at higher ages, with more limited pools of athletes, that’s not necessarily the case. If one particularly good athlete is an outlier, that single performance can push the standard up. And if the group is small, there may not be enough competitors to guarantee that anyone actually hits (or gets close to) that limit. As a result, the record could be soft.

If you could simulate millions of performances by athletes in each age group, using the record would be a good way to gauge what is possible. But given the smaller populations of older athletes, this by itself isn’t necessarily a reliable gauge.

Another option is to try and calculate a percentage of decline as a person ages. If you can determine what percentage an athlete’s performance will decline at a given age, you can calculate the standard for each age. This would eliminate the reliance on actual records and it would reduce the influence of outliers.
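To make the idea concrete, here is a rough sketch of a decline-based standard. Everything specific in it is an assumption for illustration: the peak age of 35, the 1% slowdown per year, and the function name are made up, and real age factors are not this simple.

```python
def standard_for_age(open_standard_s: float, age: int,
                     peak_age: int = 35,
                     decline_per_year: float = 0.01) -> float:
    """Estimate an age standard by slowing the open standard each year.

    peak_age and decline_per_year are hypothetical values chosen for
    illustration, not published age factors.
    """
    years_past_peak = max(0, age - peak_age)
    # Compound the slowdown: each year past the peak multiplies the
    # standard time by (1 + decline_per_year).
    return open_standard_s * (1 + decline_per_year) ** years_past_peak


open_standard = 9000  # 2:30:00, the open standard from the earlier example
for age in (30, 45, 60):
    print(age, round(standard_for_age(open_standard, age)))
```

With a formula like this, no record at any single age is needed. The whole table follows from the open standard and the assumed rate of decline.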

The problem, though, is that this relies on the assumption that there is a predictable decline as someone ages and that this can be captured with a simple formula.

We can observe a clear negative relationship between age and speed — but there’s no reason to think that there’s an exact relationship that lets you predict how age impacts each individual or that the passing of a single year has a specific impact on times.

It’s more likely that different people experience age-related slowing differently, and the particular magnitude and pace of that slowing will vary with the individual.

As a result, it’s possible that a specific athlete — an outlier — could outperform expectations and score a time that’s ‘better’ than the standard.

But if the standard for an age group is supposed to be the best possible performance — the system breaks down a bit when people can possibly exceed that standard.

On the other hand, crafting the standards so that no one can break them may make them overly strict. It’s just as problematic if the best athletes in an age group can’t get close to their standard, because they then can’t compete with athletes in other age groups whose standards are within reach.

The Tyranny of Predicting the Best Possible Time

Ultimately, the major drawback of this system is that it relies on predicting — in one way or another — the best possible time for a given age group.

The downside of predicting a best possible time is that it’s possible for an athlete to beat that time. And if your standard is supposed to represent the best time possible for a given athlete in a given age group — it’s problematic for them to run faster than that time.

It calls into question the whole scheme.

The group that worked on the original age grading tables left behind a decent amount of correspondence and historical documentation. You can find much of it published here.

One exchange, between Al Sheahen and Chuck Phillips, focused on this specific issue. Phillips advocates for a mathematical formula that predicts potential performance, while Sheahen points out that this formula allows for multiple performances at a level exceeding 100% of the age grade.

The approach the group followed — which continues today — is to combine empirical observations with mathematical predictions. Math provides the general formula, but actual performances by athletes can shift the curve one way or another.

Currently, World Masters Athletics maintains a list of age factors. The most recent list was released in 2023. This combines empirical observations of actual athletes as they age with mathematical predictions of what times should be possible at a given age.

The visual above shows the most recent (2023) age factors for the marathon by gender.

But, focusing on the marathon, there are still examples of performances with a greater than 100% age grade.

The men’s 45–49 record is 2:09:12, set by Mark Kiptoo at Zurich in April 2023. Based on the then world record (2:01:09, Kipchoge, Berlin 2022), Kiptoo’s performance was 100.14%. If you calculate based off Kelvin Kiptum’s new record, Kiptoo’s performance drops to 99.67%.

The women’s 45–49 record is 2:21:34, set by Sinead Diver at Valencia in 2022. Based on the then women’s world record (2:14:04, Kosgei, Chicago 2019), that’s an age graded performance of 101.86%. Even taking into account Tigst Assefa’s new record (2:11:53, Berlin 2023), that’s still a 100.2% performance.
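The recalculated figures for Kiptoo and Diver can be reproduced with a short sketch. It assumes the age factor stays fixed and the standard moves in proportion to the open world record, which is my reading of the calculation rather than an official method. Kelvin Kiptum's 2:00:35 (Chicago 2023) is not quoted in the text and is supplied here.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rescale_age_grade(old_grade_pct: float, old_record: str,
                      new_record: str) -> float:
    """Rescale an age grade after the open world record changes.

    Assumes the age factor is unchanged, so the standard (and therefore
    the grade) scales with the ratio of new record to old record.
    """
    return old_grade_pct * to_seconds(new_record) / to_seconds(old_record)


# Kiptoo: 100.14% against Kipchoge's 2:01:09, rescaled to Kiptum's 2:00:35.
kiptoo = rescale_age_grade(100.14, "2:01:09", "2:00:35")
# Diver: 101.86% against Kosgei's 2:14:04, rescaled to Assefa's 2:11:53.
diver = rescale_age_grade(101.86, "2:14:04", "2:11:53")

print(f"{kiptoo:.2f}%")  # 99.67%
print(f"{diver:.2f}%")   # 100.20%
```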

Although this system may not be perfect, it’s the one that’s been adopted by many races. You can often find your own age graded score or time in results, and some races give out awards based on these scores.

What if we compared performances in a different way — that didn’t rely on trying to predict what the best possible performance would be?

An Alternative — A Standardized Distribution

Instead, you can compare performances based on how likely they are and where they fit into the distribution of all possible performances.

One way to do that is with the concept of percentiles.

When I was a kid, I remember taking standardized tests at school. We’d get a score, and it would say you scored in the 80th or 90th percentile — or some other number.

Instead of a numerical score, the test compared your results to the results of another group of students — and determined that you performed better than X percent of them.

Similarly, if you were to gather a large, representative sample of results, you could determine where specific finish times fit in the big scheme of things. Instead of saying your time scored 90% against the best possible time, you could say that you finished ahead of 90% of similar runners.

An advantage to this system is you don’t need to predict a max. Once you get to a certain point — say 99.9% or 99.99% — it’s all the same. That does limit your ability to compare truly exceptional performances, but for the vast majority (say 99.9%) of performances, you can accurately rank how well a runner did.
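Here is a minimal sketch of the percentile idea. The sample is synthetic (random times centered on 4:30:00) and stands in for a real set of age-group results, and `percentile_beaten` is a name I made up for illustration.

```python
import random
from bisect import bisect_right


def percentile_beaten(finish_s: float, sample_s: list) -> float:
    """Percent of the sample that finished slower than finish_s."""
    ordered = sorted(sample_s)
    # Everything after the insertion point has a strictly slower time.
    slower = len(ordered) - bisect_right(ordered, finish_s)
    return 100 * slower / len(ordered)


random.seed(42)
# Synthetic stand-in for real results: 5,000 finish times in seconds,
# centered on 4:30:00 with a 40-minute spread. Not real race data.
sample = [random.gauss(16200, 2400) for _ in range(5000)]

# Where does a 3:30:00 finish (12,600 seconds) fall in this sample?
print(f"{percentile_beaten(12600, sample):.1f}")
```

Nothing here requires knowing the fastest possible time. The score depends only on how many runners in the sample were slower.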

A slightly different approach would be to standardize the distribution of scores.

This is similar to the way an IQ test is scored. You start with the ‘typical’ result, or the mean. Then, you determine how far above or below that result a particular result is.

Mathematically, this is accomplished by determining the standard deviation of a set of results and then computing the z-score. This number tells you, in a standardized way, how good or bad a result is.

There’s no limit here — and a number can continue to get lower and lower compared to the mean. So you don’t have to worry about a hypothetical best performance — and a ‘better than possible’ performance. As times get faster, the z-scores just keep getting (slightly) lower.
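A z-score takes only a few lines with Python's standard library. The seven finish times below are made up for illustration, and a real analysis would use a much larger sample.

```python
from statistics import mean, stdev


def z_score(finish_s: float, sample_s: list) -> float:
    """Number of standard deviations a time sits above or below the mean."""
    return (finish_s - mean(sample_s)) / stdev(sample_s)


# Made-up finish times in seconds (3:30:00 through 5:00:00), not real data.
sample = [12600, 13500, 14400, 15300, 16200, 17100, 18000]

# Faster times are below the mean, so they get negative z-scores.
for finish in (12600, 15300, 18000):
    print(finish, round(z_score(finish, sample), 2))
```

Because finish times are "lower is better," a more negative z-score means a stronger performance, and there is no floor that a fast time can break through.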

Where Do We Go From Here?

I’m going to spend the next few weeks exploring this topic and documenting the process as I go.

The first step is to identify a sample to use to compute the percentiles and/or z-scores. In the next article, I’ll share how I determined that sample and what it consists of.

The next step is to actually compute the percentiles and z-scores for a few age groups, look at some examples, and see how things shake out. I’m interested to see what the distributions are, and how the change in those distributions compares against the age factors calculated by World Masters Athletics.

Finally, once I’ve worked out the calculations to compute the percentiles and z-scores, I can apply them to a few races. Take the system for a test drive and see how it works. Does it accurately reflect the best masters performance?

Is it better than the current age grading system … or is it six of one, half dozen of the other?

Who knows. But I hope you’ll join me in the process of finding out.

The list below will contain the entire series as it’s published.

Analysis of Marathon Results and Age Grading

Age grading makes it possible to compare race results between different age groups. But is there a better alternative…

I’m an avid runner and a data nerd. I’m also entering ‘Masters’ territory — I turn 40 next month. Follow me here on Medium to see how this series turns out, and to see other data informed stories related to running. You can also read about my running story on my blog, Running with Rock, or follow me on Strava.