Exploring the Data to Understand Age Grading and Marathons

Let’s take a look at who’s in the sample and what the data looks like

How do you compare race results between different runners?

That’s the question I’m currently exploring in a series of articles.

Analysis of Marathon Results and Age Grading

Age grading makes it possible to compare race results between different age groups. But is there a better alternative…

medium.com

The current method relies on what are called age graded tables or age factors. This method, endorsed by World Masters Athletics, essentially combines empirical results and mathematical modeling to try and predict what an athlete would have run if they were still young.

For more context, read the first article in the series here.

But I have some reservations about the system, and I’m looking at two alternative ways to create equivalent comparisons between age groups.

To do so, I collected a large sample of race results. You can read about how I identified the races to sample here.

Now that I’ve collected the results, I wanted to take some time to explore the data that’s in the sample in preparation for conducting a full analysis.

What’s In the Sample Set?

The sample that I’ll be working with includes individual finish times from all marathons with 500 or more runners, run in the United States, in the fall (Sept to Nov), from 2010 to 2019.

This includes 2,017,493 individual race finishes across 10 years.
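The inclusion rule described above can be written down as a small filter. This is only a sketch of the stated criteria: the `Race` record, its field names, and the example race names are hypothetical, and it assumes "runners" means finishers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Race:
    # Hypothetical record layout; the real dataset's fields are not described.
    name: str
    country: str
    race_date: date
    finishers: int


def in_sample(race: Race) -> bool:
    """True if a race meets the sampling criteria stated in the article."""
    return (
        race.country == "US"                    # run in the United States
        and 2010 <= race.race_date.year <= 2019  # 2010 to 2019
        and 9 <= race.race_date.month <= 11      # fall: Sept to Nov
        and race.finishers >= 500                # 500 or more runners
    )


# Made-up races, used only to exercise the filter.
fall_race = Race("Example Fall Marathon", "US", date(2015, 10, 11), 500)
spring_race = Race("Example Spring Marathon", "US", date(2015, 4, 20), 30000)
small_race = Race("Example Small Marathon", "US", date(2015, 10, 11), 499)
```

Here `in_sample(fall_race)` is true, while the spring race and the 499-finisher race are both excluded.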

It’s my hope (and again, you can read more about the design of the sample here) that this sample is representative of the population of marathon runners in the United States. One caveat here is that the sample may or may not be representative of athletes in other countries, and the results may not be directly transferrable to races outside of the United States.

Another caveat is that times may have increased since 2019 — and ultimately, I’ll revisit this topic with updated data from 2023.

Moving forward, we’re going to be using this sample to analyze results based on gender and age groups. So who is actually in this sample, and how large is each group?

The visual above shows the total number of finishes, by age group. You can use the dropdown menu in the top left to toggle between women and men.

I chose these particular age groups because they align with the age groups used by BAA for Boston qualifying purposes. I also chose to organize the results into five-year age groups, instead of analyzing them by individual age, to increase the sample size at higher ages.
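For readers who want to reproduce the grouping, here is a minimal sketch of the bucketing. It assumes the groups are under 35, then five-year bands from 35–39 through 75–79, then 80+. The function names and label strings are illustrative, not the code used for this analysis.

```python
from collections import Counter


def age_group(age):
    """Map a runner's age to a five-year age group label."""
    if age < 35:
        return "Under 35"
    if age >= 80:
        return "80+"
    low = (age // 5) * 5  # round down to the start of the five-year band
    return f"{low}-{low + 4}"


def count_by_group(ages):
    """Count finishes per age group."""
    return Counter(age_group(age) for age in ages)


# Made-up ages, used only to show the bucketing.
example_counts = count_by_group([24, 34, 35, 39, 40, 67, 80, 92])
```

With these example ages, the under 35, 35-39, and 80+ groups each get two finishes, and 40-44 and 65-69 get one each.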

Starting with the women, the largest group here is runners under 35. By a lot. That group is well over 2 times the size of the next group.

The next two groups also include 100,000+ results, and every group through 60–64 includes 10,000+ results. The 65–69 group includes just under 5,000 results.

Once you get to 70 and above, the size of the groups shrinks considerably. I’m not sure, at least for the women, that these will produce reliable results.

The 1,280 women 70–74 isn’t too bad, but with 244 women aged 75–79 and 72 women aged 80+, those groups are extremely small.

On the men’s side, the general pattern is the same. But the difference between the under 35 age group and the rest is significantly smaller.

Still, that group is almost twice the size of the next group. Every group through 50–54 has 100,000+ results and the other groups through 65–69 have 10,000+ results.

Here, the 70–74 group is fairly large — 5,749 results. I’d be more confident in the reliability of those results. The 75–79 group is beginning to get small (1,511) and the 80+ group is still far too small to think it’s representative (479).

All in all, we’ve got a good sample that should effectively represent both men and women through their 60s — and possibly into their 70s.

How Are Finish Times Distributed In This Sample?

A good first step to understanding this dataset is to look at the median and the interquartile range. This will give us a sense of how the finish times are distributed among different age groups.

The visual above displays 5 data points for each age group. Again, you can use the drop-down to toggle between women and men.

The middle point is the median: the time at which half of the runners finished ahead and half finished behind.

The points above and below it mark the interquartile range. If there were 100 runners in the group, these would be the 25th runner and the 75th runner.

The next two points take this range out a little further to the 90th and the 10th percentile. The bottom point is essentially where the cut-off would be for the top 10%. The top point represents the bottom 10%.
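The five points can be computed with a simple nearest-rank rule, which matches the "25th runner out of 100" description. This is a sketch under that assumption; the article does not say which percentile method was actually used, and other methods interpolate between neighboring runners. The function names and the synthetic times are illustrative.

```python
def percentile_mark(times, pct):
    """Return the finish time at a whole-number percentile (nearest rank)."""
    if not times:
        raise ValueError("times must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(times)
    # Ceiling of pct * n / 100, done in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def five_point_summary(times):
    """Finish times at the 10th, 25th, 50th, 75th and 90th percentiles."""
    return {pct: percentile_mark(times, pct) for pct in (10, 25, 50, 75, 90)}


# Synthetic example: 100 runners finishing one minute apart, 181 to 280 minutes.
example_times = list(range(181, 281))
summary = five_point_summary(example_times)
```

With 100 runners, the 25th percentile mark is simply the 25th fastest time (205 minutes here). Note that the nearest-rank median is the 50th runner, while the conventional median would average the 50th and 51st.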

In both cases, you can see that times increase as runners get older — which is to be expected.

The increase is fairly small in the 30s and 40s, and it starts to increase more rapidly in the 50s and 60s.

For example, for women the median time goes from 4:37 (under 35) to 4:48 (45–49). That’s an increase of 11 minutes. But by 65–69, the median has increased to 5:43, almost another hour.

Generally speaking, the curve is the same whether you look at the 10th percentile or the median. But the times slow a little more rapidly at the front of the pack.

For the men, the difference from under 35 to 45–49 is 11 minutes at the 10th percentile (3:11 to 3:22) and only 6 minutes (4:10 to 4:16) at the median.

The one place where the curve is obviously out of whack is at 80+. For the men, it’s hardly slower than the 75–79 group. For the women, it’s actually faster than the younger runners.

This is quite likely a selection effect whereby people who run marathons after age 80 are probably experienced marathoners. On the other hand, even the small group of women 75–79 seem to fit the general pattern — although it’s possible they’re a little slower than would otherwise be expected.

How Do the Fastest Runners Compare to the Top 1%?

What if you look at a similar visual, but you focus exclusively on the best runners?

The visual below shows four data points per age group — the best time (in the sample), the top 0.1% (the first out of 1,000), the top 1% (best of 100), and the top 5%.

The general curve is the same — they still get slower as you move up in age.

But take a look at the y-axis, and notice that the times themselves have gotten much faster.

For women, the top 10% mark is 3:40. The best overall mark is 2:14 (Brigid Kosgei’s world record mark at Chicago 2019), and the best of 1,000 mark is 2:40.

For men, the top 10% mark is 3:11. The best overall mark is 2:03, and the best of 1,000 mark is 2:16. Note that the focus on American races misses several successive world records set at Berlin — but the best time isn’t far off from Kipchoge’s 2018 effort (2:01:39).

Now, take a look at what happens when you scan to the right. The gap between the best time and the best of 1,000 mark gets smaller.

For men 35–39, there’s a 20-minute gap (2:05 to 2:25), and at 65–69 it’s only 15 minutes (2:54 to 3:09). At 70–74, it’s only 13 minutes (3:04 to 3:17).

For women 35–39, there’s a 24-minute gap (2:22 to 2:46). At 60–64, it’s down to 11 minutes (3:10 to 3:21), and at 65–69 it’s only 10 minutes (3:21 to 3:31).
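As a quick check on the arithmetic, the gaps can be recomputed from the times quoted above. The helper names are illustrative. The times are the rounded h:mm values from the text, so each gap is only accurate to the minute.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' finish time to whole minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def gap_minutes(best, mark):
    """Minutes between the best time and a slower reference mark."""
    return to_minutes(mark) - to_minutes(best)


# (best time, best-of-1,000 mark) as quoted in the article.
MARKS = {
    ("men", "35-39"): ("2:05", "2:25"),
    ("men", "65-69"): ("2:54", "3:09"),
    ("men", "70-74"): ("3:04", "3:17"),
    ("women", "35-39"): ("2:22", "2:46"),
    ("women", "60-64"): ("3:10", "3:21"),
    ("women", "65-69"): ("3:21", "3:31"),
}

GAPS = {group: gap_minutes(best, mark) for group, (best, mark) in MARKS.items()}
```

`GAPS` reproduces the figures in the text: 20, 15, and 13 minutes for the men, and 24, 11, and 10 minutes for the women.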

In other words, as runners get older, the best runner is closer and closer to the rest of the other really good runners.

And that’s part of the problem with the current age grading tables.

They’re based, in part, on the best recorded time in a given age group. Younger age groups, such as 35–39 and 40–44, are more likely to have professional athletes (or recently professional athletes) competing. As you move up, you’re more likely to get serious, but not professional, athletes notching a best time.

And at the end of the day, there’s a big difference between someone who commits (close to) 100% of their time to running marathons and someone who just commits a lot of time (10–15 hours per week).

How Does the Distribution of Times Relate to the Age Factors?

One final thing I want to look at today is how the decline in times — based on a given percentile — compares to the official age grading factors.

To create the visual below, I first calculated the finish times at five points: the top 1% mark, the top 5%, the top 10%, the top 25%, and the median.

Then, I took the result at each of these marks for each age group, and divided it into the result for the under 35 age group (that is, the under 35 time divided by the older group’s time).

This essentially creates a ratio between a given age group — say 65–69 — and the open age group — under 35. It also happens to yield a number that’s comparable to the official age factors used by World Masters Athletics.
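Here is a small worked example of that ratio, using the women’s median times quoted earlier (4:37 for under 35, 4:48 for 45–49, 5:43 for 65–69). The function names are illustrative, and because the inputs are rounded to the minute, the resulting factors are approximate.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' finish time to whole minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def empirical_factor(open_time, group_time):
    """Under-35 time divided by the age group's time at the same percentile."""
    return to_minutes(open_time) / to_minutes(group_time)


# Women's median finish times as quoted in the article.
WOMEN_MEDIAN = {"Under 35": "4:37", "45-49": "4:48", "65-69": "5:43"}

FACTORS = {
    group: round(empirical_factor(WOMEN_MEDIAN["Under 35"], time), 3)
    for group, time in WOMEN_MEDIAN.items()
}
```

This gives 1.0 for under 35, about 0.962 for 45–49 (277 / 288 minutes), and about 0.808 for 65–69 (277 / 343 minutes). A lower factor means a larger slowdown relative to the open group.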

The visual above graphs those factors for each of the five percentiles (1%, 5%, 10%, 25%, and median) as well as two official sets of age factors: the 2020 age factors and the 2023 age factors.

The 2020 age factors came from this repository and the 2023 age factors are available on the World Masters Athletics page. Since the age factors are per individual year, I averaged the individual age factors for each age group.
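The averaging step might look like the sketch below. The per-year values are made-up placeholders chosen for easy arithmetic, not actual World Masters Athletics factors, and the function name is illustrative. How the open-ended 80+ group was averaged is not stated, so this only covers groups with a fixed upper age.

```python
from statistics import mean


def group_factor(factor_by_age, low, high):
    """Average the per-year age factors for ages low..high inclusive."""
    ages = range(low, high + 1)
    missing = [age for age in ages if age not in factor_by_age]
    if missing:
        raise KeyError(f"no factor for ages {missing}")
    return mean(factor_by_age[age] for age in ages)


# Placeholder values only; NOT real age factors.
placeholder_factors = {65: 0.80, 66: 0.79, 67: 0.78, 68: 0.77, 69: 0.76}
example_group_factor = group_factor(placeholder_factors, 65, 69)
```

With these placeholder values, the 65–69 group factor comes out to roughly 0.78.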

Take a look at the graph of the men’s age factors. You can click on individual lines in the legend to hide them.

Hide the top 10%, top 25%, and the median. Notice how from 35–39 to 45–59, the drop-off for the top 1% is so much lower than everyone else? Seems odd.

Later on, the drop-off at the top 1% lines up fairly well with the older 2020 age factors. Meanwhile, the top 5% lines up fairly well with the newer 2023 age factors.

And, of course, they all diverge at 80+. So ignore that for now.

On the women’s side, the differences are even more stark. The age factors — both 2020 and 2023 — drop off far more quickly after 55 than either the top 1% or the top 5%.

What does it mean? Good question.

At this point, it’s just things that make me go, “Hmm…” But it does suggest that there may be reasons to question the current age factors and age grading tables.

What Makes You Go Hmm?

Now that we’ve had a chance to explore this data a little bit, is there anything in there that makes you think? Any questions you think I ought to focus on in the next few articles?

If so, please leave a response. I don’t want to conduct this analysis in a vacuum, and your input is valued.

For now, my plan is to use this data to produce two alternative sets of age grading tables — one based on percentiles and one based on z-scores.

Then, I’ll compare them to some actual race results and see how well they seem to do at comparing results.

If you’re interested in this topic, be sure to subscribe for email updates. I’ll make sure you get the next article in your inbox.

The latest article just went live on Runners Life here.


I’m an avid runner and a data nerd. I turn 40 next week, so comparing results across age groups is of particular interest to me.