The dust has yet to settle over Monday’s events. I have read all kinds of reviews and analyses; meanwhile the BAA, the race’s governing body, is petitioning the IAAF to have Mutai’s time recognized as a world record: 2 hours 3 minutes 2 seconds. I still can’t believe it.

Discussions have focused on the divine tailwind and the net-downhill course, which together created the perfect storm for an amazing race.
The best analysis I’ve read so far is from The Science of Sport, where they examine the effect of a strong tailwind from both a physics and a historical perspective. Long story short, the strong tailwind helped reduce the effective power output required to maintain a given speed. The feeling of no wind that many runners have described is perfectly in line with the analysis: on a sailboat one often barely feels the wind, yet it is certainly the wind that is to be “blamed” for the boat sailing.

As a researcher by training and trade, I prefer a more statistical approach, and I tried to determine how fair it is not to consider Mutai’s time a world record. (That’s aside from IAAF regulations and certifications.)
The question is how exceptional the time of 2:03:02 is in general and in Boston in particular, and whether it would have been rejected had the race been a scientific experiment. Textbooks on error analysis, e.g., Taylor [1], usually dedicate one chapter to the subtle art of rejecting data, and I will be following those criteria.

Haile Gebrselassie at the 2008 Berlin Marathon: WR 2:03:59 - photo credit: Tobias Schwarz, Reuters

Rejection may sound a bit harsh; what I mean by it is understanding the “exceptionality” of Monday’s record in context, as numbers are meaningless out of it.

Rejecting data is a tricky, sticky business, but there are occasions when it is better to throw away some weird data points, as long as it is done not by gut feeling but through a scientific and rigorous criterion.
When analyzing data, researchers see a pattern: usually the data cluster around a single value, the average. The bulk of all measurements accumulates around said average following a certain distribution: most of them, approximately 70% of all points, will be within a certain distance, called the standard deviation, from the average. A point that falls further away from the average is less likely to appear and, by using methods of statistical analysis, we can quantify how rare such an event is: a point farther than 3 standard deviations from the average has less than a one in a hundred chance of appearing. This is a good, scientific way to evaluate how atypical an outlier is.
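The percentages above can be checked with a few lines of Python. This is only a minimal sketch, assuming the data follow a normal (Gaussian) distribution; the helper names `prob_within` and `prob_beyond` are made up for this illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the average (both sides counted)."""
    return math.erf(k / math.sqrt(2))

def prob_beyond(k: float) -> float:
    """Probability of landing farther than k standard deviations
    from the average, on either side."""
    return 1.0 - prob_within(k)

for k in (1, 2, 3):
    print(f"within {k} st. dev.: {prob_within(k):.1%}   beyond: {prob_beyond(k):.2%}")
```

For one standard deviation this gives roughly 68%, the "approximately 70%" mentioned above, and for three standard deviations the chance of falling outside is well under one in a hundred.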

Talking about an anomalous outlier becomes murky when the outlier is a record: all records are outliers, since a record has to be far away from the average to be a record. Still, Mutai’s time feels largely atypical. For one, it is one minute faster than Gebrselassie’s record in Berlin, and on a course which is known to be much slower than Berlin (or London, or Chicago). Second, it wasn’t just Mutai: Mosop, who finished 4 seconds behind him, also technically broke Gebrselassie’s record by nearly a minute. For context, when Gebrselassie set the world record, the runner-up was a minute and a half behind him.
This is when all the red flags start to appear. (Once again, this is not to dismiss either Mutai’s or Mosop’s performance: both are outstanding.)

Not only do our guts feel that something is off, but from a statistical viewpoint, too, this week’s world-record-that-wasn’t is more of an outlier than all previous records.

In the following statistical analysis, I will be using the winners’ times from the five Marathon Majors, i.e., Berlin, Boston, Chicago, London, and New York City, from 1990 to 2011 (obviously 2011 only for Boston and London). Using those five is a bit arbitrary: for instance, I could have used Rotterdam as well, but I’m pretty confident the result wouldn’t have changed much.

Let me start with Chicago and NYC first, then Berlin and London, and I’ll finish up with Boston.

[analysis] What follows are the plots of the winners’ times in Chicago and NYC; the shaded area represents one standard deviation on either side of the average of the last 20 winning times for Chicago and New York respectively and, as one can see, almost all winning times fall well within this range.

Chicago Marathon: plot of winner's times in the last 20 years. The arrow shows the current course record.

NYC Marathon: plot of winner's times in the last 20 years. The arrow shows the current course record.

To put some numbers to those words, the average winning time in Chicago is 2h 08m 43s with a standard deviation of 3m 07s; in NYC, the average is 2h 09m 33s and the standard deviation is 1m 10s. Were these data following a normal distribution, the bulk (~70%) of all points would fall between 2h 05m 36s and 2h 11m 50s for Chicago and between 2h 08m 23s and 2h 10m 43s in NYC, and this is precisely what we see in the plots.
The red arrow shows the course record for both races. In the case of Chicago, the record is just within one standard deviation (more precisely, 0.97 standard deviations away), while in New York it’s a bit further off: 1.57 standard deviations lower than the average. If the times were randomly cast, Chicago’s course record would have a 15% probability of appearing, while New York’s would be lower, at 5%. (These are half of the values usually quoted, because a record only counts as one when it’s lower than the times previously recorded, so only one side of the distribution matters.) The course records are “unlikely” but not extremely so; a physicist would not shout discovery from data in that range.
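As a rough check of the numbers above, here is a minimal Python sketch that rebuilds the one-standard-deviation bands from the quoted averages and computes the one-sided probability for each record. It assumes a normal distribution, which may not be exactly the calculation behind the percentages in the text, and the helper names (`to_seconds`, `to_hms`, `one_sigma_band`, `faster_tail`) are invented for this example.

```python
import math

def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def to_hms(seconds: float) -> str:
    """Convert seconds back to an 'h:mm:ss' string."""
    h, rem = divmod(round(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"

def one_sigma_band(mean: str, sd: str) -> tuple[str, str]:
    """Range spanning one standard deviation on either side of the average."""
    mu, sigma = to_seconds(mean), to_seconds(sd)
    return to_hms(mu - sigma), to_hms(mu + sigma)

def faster_tail(z: float) -> float:
    """One-sided probability of a time at least z standard deviations
    BELOW the average (only the fast side counts for a record)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# (average, standard deviation, record distance) as quoted in the text
races = {
    "Chicago": ("2:08:43", "0:03:07", 0.97),
    "New York City": ("2:09:33", "0:01:10", 1.57),
}

for name, (mean, sd, z) in races.items():
    low, high = one_sigma_band(mean, sd)
    print(f"{name}: band {low} to {high}, one-sided tail {faster_tail(z):.1%}")
```

Under the normal assumption the one-sided tails come out at about 17% for Chicago and 6% for New York, in the same range as the rounded figures quoted above.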

Here are the two European marathons of the Majors: Berlin (average 2h 07m 13s, standard deviation 1m 53s) and London (average 2h 07m 45s, standard deviation 1m 42s) respectively.

Berlin Marathon: plot of winner's times since 1990. The arrow shows the current course record, the inset the trend since 1980.

London Marathon: plot of winner's times since 1990. The arrow shows the current course record, the inset the trend since 1980.

Once again we note how most of the points fall within the shaded region, that is, within one standard deviation from the respective averages. The course records (red arrows) are more “exceptional” than those of Chicago and New York: the course (world) record in Berlin is 1.72 standard deviations away and London’s is 1.81, so their “unlikelihood” falls to roughly 3%, compared with the 15% of Chicago and 5% of New York.
This is not entirely surprising; after all, London and Berlin are two of the fastest courses in the world, and the course record in Berlin is the world record set by Gebrselassie in 2008. Furthermore, they are not entirely outliers: as seen from the insets showing data from 1981 to 2011, there is an evident trend towards faster and faster times. That is, Berlin and London have attracted fast runners hoping to set new records, giving rise to stronger competition and faster times. (It is also important to mention that both Berlin and London are paced.)
Therefore those results are “unlikely” but expected.

Finally, Boston. (Average 2h 09m 02s, standard deviation 2m 14s)

Boston Marathon: plot of winner's times in the last 20 years. The arrow shows the current course record.

The outlier, Mutai’s course record, is almost invisible in the grand scheme of things: it is 2.70 standard deviations away or, equivalently, it has an “unlikelihood” of 0.1%, that is, 1 in a thousand!
In comparison, if we had made the same analysis last week (that is, without Mutai’s result), Cheruiyot’s record, which left many completely flabbergasted, would be a more reasonable 1.90 standard deviations away, completely on par with the results in Berlin (1.72) and London (1.81). Cheruiyot’s time in Boston is jaw-dropping, but it doesn’t feel out of this world, which is exactly how Mutai’s time feels.
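The distances quoted above can be approximately reproduced from the averages and standard deviations given in the text. The sketch below also applies Chauvenet's criterion, the rejection rule presented in Taylor [1]: a point is suspect when fewer than half a point that extreme would be expected in a sample of that size. Whether this exact rule was used for the analysis here is not stated, so treat it as an illustration only; the sample size `N_RACES = 22` (one race per year, 1990 to 2011) and the function names are assumptions.

```python
import math

def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def sigma_distance(record: str, mean: str, sd: str) -> float:
    """How many standard deviations the record lies below the average."""
    return (to_seconds(mean) - to_seconds(record)) / to_seconds(sd)

def expected_as_extreme(z: float, n: int) -> float:
    """Chauvenet's criterion: expected number of points, out of n,
    lying at least z standard deviations from the average (either side),
    assuming a normal distribution."""
    return n * math.erfc(z / math.sqrt(2))

N_RACES = 22  # assumption: one winning time per year, 1990 to 2011

records = {
    # record time, average, standard deviation (all taken from the text)
    "Boston (Mutai)": ("2:03:02", "2:09:02", "0:02:14"),
    "Berlin (Gebrselassie)": ("2:03:59", "2:07:13", "0:01:53"),
}

for name, (record, mean, sd) in records.items():
    z = sigma_distance(record, mean, sd)
    expected = expected_as_extreme(z, N_RACES)
    verdict = "suspect" if expected < 0.5 else "keep"
    print(f"{name}: {z:.2f} st. dev., expected count {expected:.2f} -> {verdict}")
```

With these inputs Boston comes out at about 2.69 standard deviations (the small difference from 2.70 is presumably rounding in the quoted averages) and falls below Chauvenet's threshold, while Berlin's 1.72 does not.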

[conclusions] It is obvious to me that the wind had an important role in this year’s Boston Marathon: not only does the physics suggest that the strong tailwind may be responsible for 2 to 3% of Monday’s performance, but the statistics also point out that this world’s best is truly an outlier. Breaking a record always moves away from the average, but historically no record has moved too far, too quickly; it is probably mostly psychological: “why should I try to hit a 2:03 when the world record is 2:15?” Statistically speaking, Boston’s record was like Haile Gebrselassie running 2:03:59 in the ’60s, when the record was 2:15: it is a “discovery,” it is ground-breaking but, most importantly, it feels like too much, too soon.
Now the real question is what is to come. Psychologically, the existence of not one, but two low 2:03’s could lead to less conservative races, with runners going out with a 61 to 61.5 minute half, and probably Sammy Wanjiru, (the other) Mutai and many other top-tier runners will join Mutai’s ranks. This is how records are made.

One thing is sure: it’s going to be an exciting Fall.

[update] I performed the same analysis for the women’s field, even though it is not contentious, since Kilel’s time was well short of Paula Radcliffe’s world record; it’s interesting to notice that even Radcliffe’s record is still well within 2 standard deviations from the average time in London for the past twenty years. [Boston 2011, the Women’s Field]


Marathon         Average       St. Dev.    Record distance (st. dev.)
Chicago          2h 08m 43s    3m 07s      0.97
New York City    2h 09m 33s    1m 10s      1.57
Berlin           2h 07m 13s    1m 53s      1.72
London           2h 07m 45s    1m 42s      1.81
Boston           2h 09m 02s    2m 14s      2.70

[1] John R. Taylor, An Introduction to Error Analysis, University Science Books, 2nd edition, 1996