This week both the Daily Telegraph and the Daily Mail ran stories claiming that switching off street lights could significantly increase the number of road deaths. The stories were based on a paper published in the Cochrane Library, which considered three studies into the connection between road accidents and street lighting. However, it seems that the headlines are a typical example of misinterpretation of statistics.
As David Spiegelhalter, Professor for the Public Understanding of Risk at the University of Cambridge, writes on his website Understanding Uncertainty, the studies suffer from three major flaws: poor data, publication bias, and what's known as regression to the mean. Spiegelhalter points out that the three studies underlying the paper were poor and conducted decades ago, with one dating from as far back as 1948, which is not a very good basis for drawing conclusions about today's traffic. The term publication bias refers to the fact that studies which show dramatic results are more likely to be published than those that don't. It's quite possible that there were other studies, which found no connection between street lights and accidents, but that no-one bothered to publish such boring results. Regression to the mean is a commonly observed effect, which results from random fluctuations. If street lights were installed on a certain road, then this is most likely because that road recently experienced a spate of accidents. Such a freak period can be purely down to chance, in which case one would expect the accident rate to return to normal after a while. Thus the improved accident rate after the installation of lights may simply reflect this return to normal, rather than the improved lighting.
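Regression to the mean is easy to see in a small simulation. The sketch below is an illustration only and uses no data from the studies: it assumes 10,000 hypothetical roads that all share the same underlying accident rate (5 per year, an invented figure), picks the 5% with the most accidents in one year as the roads that "get street lights", and then looks at the same roads the following year. The lights do nothing at all in this model, yet the selected roads still appear to improve.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson-distributed count using Knuth's algorithm."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate(n_roads=10_000, mean_rate=5.0, top_fraction=0.05, seed=1):
    """Return (mean accidents before, mean accidents after) for the roads
    that had the worst record in the first year.

    All numbers are hypothetical. Every road has the same true accident
    rate, so any change between the two years is pure chance."""
    rng = random.Random(seed)
    before = [poisson(rng, mean_rate) for _ in range(n_roads)]
    after = [poisson(rng, mean_rate) for _ in range(n_roads)]

    # "Install lights" on the roads with the most accidents in year one.
    n_selected = int(n_roads * top_fraction)
    worst = sorted(range(n_roads), key=lambda i: before[i], reverse=True)[:n_selected]

    mean_before = sum(before[i] for i in worst) / n_selected
    mean_after = sum(after[i] for i in worst) / n_selected
    return mean_before, mean_after

if __name__ == "__main__":
    mean_before, mean_after = simulate()
    print(f"Selected roads, year before lights: {mean_before:.2f} accidents on average")
    print(f"Selected roads, year after lights:  {mean_after:.2f} accidents on average")
```

Because the roads were chosen for their bad year, their second year looks better even though nothing about them has changed. A study that only compares before and after on treated roads cannot tell this effect apart from a genuine benefit.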
All this doesn't of course mean that street lights are useless. It simply means that the evidence is nowhere near as sound as the newspaper headlines claim. The Daily Mail, to its credit, did consult an expert, namely Spiegelhalter, but it's probably the headline, rather than his warning, that will stick in readers' minds.
When it comes to describing natural phenomena, mathematics is amazingly — even unreasonably — effective. In this article Mario Livio looks at an example of strings and knots, taking us from the mysteries of physical matter to the most esoteric outpost of pure mathematics, and back again.
Leading European scientists have said that mathematical modelling is key to future breakthroughs in the treatment of diseases including cancer, schizophrenia and Parkinson's disease. In a science policy briefing published by the European Science Foundation, the scientists set out a detailed strategy for the application of an area called systems biology to medical research. The aim is to improve early diagnosis, develop new therapies and drugs, and to move to a more personalised style of medicine.
Last year a group of scientists came up with a surprising answer to a question that has occupied humanity since the dawn of time: how to influence the sex of your baby. In the paper You are what your mother eats, published in the journal Proceedings of the Royal Society B, the scientists claimed that it's all down to breakfast cereal. Eat more of it, and you increase your chances of giving birth to a boy. A highly unlikely claim, you might think, but there it was, the result of a sober statistical analysis of 740 women and their diet.
But now it seems that the team's sensational "evidence" was a result of pure chance and due to a basic methodological error. In a new paper, also published in the Proceedings of the Royal Society B, statisticians and medical experts show that the original authors most likely fell victim to a statistical pitfall that has been known to mathematicians since the nineteenth century. The problem arises when you perform too many tests on the same data set. To put it simply, the more questions you ask, the more likely it is that you get a strange answer to one of them.
As an example, imagine that your data set consists of the 740 women, information on their diet, and whether they give birth to a girl or a boy. You might then ask whether eating jellybeans influences the sex of the child. You count how many jellybean-eating mothers and how many non-jellybean-eating mothers give birth to boys and compute the percentage difference. If that difference appears large, it's tempting to conclude that jellybeans do influence the sex of the baby, but to be sure you ask yourself the following question: what is the probability that the large difference occurred purely by chance, and not because jellybeans influence gender? Using probabilistic methods, it's possible to calculate this probability, and if it is very low, you have good evidence that the result wasn't just pure chance and that jellybeans do indeed have an effect on gender.
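One common way to carry out the calculation described above is a two-proportion z-test, sketched below. This is not necessarily the method used in the original study, and the counts in the example are invented for illustration. Strictly speaking, the number it returns is the probability of seeing a difference at least this large if jellybeans had no effect at all, using a normal approximation.

```python
import math

def two_proportion_p_value(boys_a, n_a, boys_b, n_b):
    """Two-sided p-value for the difference between two proportions.

    boys_a / n_a: boys born and total births among jellybean eaters.
    boys_b / n_b: the same for non-eaters.
    Uses the pooled normal approximation, which is reasonable when all
    the counts involved are fairly large."""
    p_a = boys_a / n_a
    p_b = boys_b / n_b
    pooled = (boys_a + boys_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # P(|Z| >= |z|) for a standard normal variable Z.
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    # Hypothetical counts: 740 mothers split evenly into two groups.
    p = two_proportion_p_value(boys_a=207, n_a=370, boys_b=170, n_b=370)
    print(f"p-value: {p:.4f}")
```

A small value means a difference this large would rarely arise by chance in a single test. The important word is "single", as the next paragraphs explain.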
But now imagine that you're not just testing the effect of jellybeans, but of a whole range of different foodstuffs on the same data set. For each individual food, a large discrepancy in boy-births between women who eat the food and women who don't might indicate that the food influences gender, as it is highly unlikely that such a freak event would occur purely by chance. However, the more opportunity there is for a freak event to occur, the higher the chance that it will indeed occur. In other words, the more foods you test, the higher the chance that one of them will show a large discrepancy by chance when in reality there is no connection between that food and gender. It's a bit like playing dice: the more dice you throw, the higher the chance that one of them comes up with a six.
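The dice analogy can be made exact. If a single trial has probability p of producing the freak event, and the trials are independent, the chance of seeing it at least once in n trials is 1 - (1 - p)^n. The sketch below applies this to dice (p = 1/6) and to statistical tests, assuming each test has a 5% chance of a false alarm. Both the 5% figure and the independence of the tests are assumptions made for illustration.

```python
def prob_at_least_one(p_single, n_trials):
    """Chance that an event with probability p_single happens at least once
    in n_trials independent trials."""
    return 1 - (1 - p_single) ** n_trials

if __name__ == "__main__":
    # Dice: chance of at least one six among n dice.
    for n in (1, 2, 5, 10, 20):
        print(f"{n:2d} dice:  {prob_at_least_one(1 / 6, n):.3f}")

    # Tests: chance of at least one false alarm among m independent tests,
    # each with an assumed 5% false-alarm rate.
    for m in (1, 5, 20, 100):
        print(f"{m:3d} tests: {prob_at_least_one(0.05, m):.3f}")
```

The probability climbs quickly towards 1 as the number of trials grows, which is why a single "significant" result among many tests is weak evidence on its own.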
According to the new paper, written by Stanley Young, Heejung Bang and Kutluk Oktay, the authors of the original study failed to take account of the effects of multiple testing — indeed they tested a total of 132 foods in two different time periods. Young, Bang and Oktay re-examined the data and found that with such a large number of tests, one would expect some to falsely indicate a dependence of gender on the given foodstuff.
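To get a feel for the numbers, suppose that 132 foods in two time periods amounts to 264 separate tests, and that each was judged at the conventional 5% significance level. Both of these are assumptions, as the figures are not spelled out above. The sketch below computes how many false alarms one would then expect if no food had any effect, together with the stricter per-test threshold given by the Bonferroni correction, a standard adjustment that is not necessarily the one Young, Bang and Oktay used.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests that look significant purely by chance,
    if every null hypothesis is true."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the chance of any false
    alarm across all n_tests at or below alpha."""
    return alpha / n_tests

if __name__ == "__main__":
    n_tests = 132 * 2  # assumed: 132 foods, two time periods
    print(f"Expected false positives: {expected_false_positives(n_tests):.1f}")
    print(f"Bonferroni threshold:     {bonferroni_threshold(n_tests):.6f}")
```

Under these assumptions one would expect around 13 foods to look significant even if diet had no influence whatsoever, so a handful of striking results is exactly what chance alone would produce.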
"This paper comes across as well-intended, but it is hard to believe that women can increase the likelihood of having a baby-boy instead of a baby-girl by eating more bananas, cereal or salt," Young, Bang and Oktay say in the paper. "Nominal statistical significance, unadjusted for multiple testing, is often used to lend plausibility to a research finding; with an arguably implausible result, it is essential that multiple testing be taken into account with transparent methods for claims to have any level of credibility."
So I get this whole idea, and I think it's nifty and stuff, but I've been wondering: how does this play out with retrospective analysis of huge data sets?
Okay, let's pretend that I grab all the NIH data that I can, and before checking out the data, I decide that I want to see if there is a correlation between height and mortality from CHF, say. And what do you know! I discover a statistically significant correlation. I publish my paper-- then go on to look for further correlations 19 more individual times and find nothing.
Do you see the problem? By looking at the data 20 times individually, I was fooled once. And yet, because I looked at each question in turn, it wasn't appropriate to use multiple analysis. Hell, I didn't even know how many times I was going to go digging when I published my first paper.
And the NIH data set makes this even more confusing! Because it's not just a matter of how many hypotheses I'm evaluating-- what about all of the other people using the same data to evaluate their own hypotheses?
Good question! Some people have suggested that statisticians should retire once they've found a significant result!
David Spiegelhalter, Professor for the Public Understanding of Risk at Cambridge, says: "Correcting for multiplicity is controversial. You essentially need to identify how much you have had to search for your 'significant' result. So if these really were independent researchers each looking at an entirely different outcome measure, then there is no real need to correct. But once somebody puts these researchers together and makes some statement about the 'most significant' result, then a correction is needed."
So basically, if lots of different researchers test the same database for correlations in exactly the same way using the same non-Bayesian methods, or if one person does this repeatedly, then there should be a correction for multiple testing when making statements about significant results.
Here's something all mathematicians know instinctively: changing a parameter in a dynamical system, even if it's only by a small amount, can have all sorts of non-obvious consequences. Some conservationists, however, don't seem to have learnt that lesson yet: by removing 160 feral cats from Macquarie Island to protect burrowing birds, a team of conservationists caused the rabbit population to boom from 4000 in the year 2000 to 130,000 in 2006. The rabbits have now demolished up to 40% of the island's vegetation, which may never recover. Cleaning up the mess may cost up to $16 million.
According to experts, a simple risk assessment exercise could have prevented the disaster. "We need a culture change," Hugh Possingham of the University of Queensland told New Scientist. "It's a generalisation, but people who do environmental work are often adverse to mathematics, and so avoid quantitative risk assessments."