Thursday, January 15, 2009
You aren't what your mother eats
Last year a group of scientists came up with a surprising answer to a question that has occupied humanity since the dawn of time: how to influence the sex of your baby. In the paper You are what your mother eats, published in the journal Proceedings of the Royal Society B, the scientists claimed that it's all down to breakfast cereal. Eat more of it, and you increase your chances of giving birth to a boy. A highly unlikely claim, you might think, but there it was, the result of a sober statistical analysis of 740 women and their diet.
But now it seems that the team's sensational "evidence" was most likely the product of pure chance, let through by a basic methodological error. In a new paper, also published in the Proceedings of the Royal Society B, statisticians and medical experts show that the original authors most likely fell victim to a statistical pitfall that has been known to mathematicians since the nineteenth century. The problem arises when you perform too many tests on the same data set. To put it simply, the more questions you ask, the more likely it is that you get a strange answer to one of them.
As an example, imagine that your data set consists of the 740 women, information on their diet, and whether they give birth to a girl or a boy. You might then ask whether eating jellybeans influences the sex of the child. You count how many jellybean-eating mothers and how many non-jellybean-eating mothers give birth to boys and compute the percentage difference. If that difference appears large, it's tempting to conclude that jellybeans do influence the sex of the baby, but to be sure you ask yourself the following question: what is the probability that the large difference occurred purely by chance, and not because jellybeans influence gender? Using probabilistic methods, it's possible to calculate this probability, and if it is very low, you have good evidence that the result wasn't just pure chance and that jellybeans do indeed have an effect on gender.
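To make the "probability that the difference occurred purely by chance" concrete, here is a minimal Python sketch of one way to estimate it, a permutation test. This is only an illustration, not the method used in either paper. The function name and the jellybean counts in the usage lines are made up for the example.

```python
import random

def permutation_p_value(boys_eaters, n_eaters, boys_others, n_others,
                        trials=10_000, seed=0):
    """Estimate how often a difference in boy-birth rates at least as large
    as the observed one turns up when the food has no effect at all."""
    rng = random.Random(seed)
    observed = abs(boys_eaters / n_eaters - boys_others / n_others)

    # Pool all births (1 = boy, 0 = girl) and forget who ate what.
    total_boys = boys_eaters + boys_others
    births = [1] * total_boys + [0] * (n_eaters + n_others - total_boys)

    extreme = 0
    for _ in range(trials):
        # Hand out the "eater" label at random and recompute the difference.
        rng.shuffle(births)
        eaters = sum(births[:n_eaters])
        others = total_boys - eaters
        if abs(eaters / n_eaters - others / n_others) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical numbers: 370 jellybean eaters with 200 boys,
# 370 non-eaters with 170 boys.
p = permutation_p_value(200, 370, 170, 370)
print(f"estimated chance probability: {p:.3f}")
```

A small value suggests the difference is hard to explain by chance alone. A large one means the difference is nothing out of the ordinary.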
But now imagine that you're not just testing the effect of jellybeans, but of a whole range of different foodstuffs on the same data set. For each individual food, a large discrepancy in boy-births between women who eat the food and women who don't might indicate that the food influences gender, as it is highly unlikely that such a freak event would occur purely by chance. However, the more opportunity there is for a freak event to occur, the higher the chance that it will indeed occur. In other words, the more foods you test, the higher the chance that one of them will show a large discrepancy by chance when in reality there is no connection between that food and gender. It's a bit like playing dice: the more dice you throw, the higher the chance that one of them comes up with a six.
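The dice analogy can be put into numbers with a short Python sketch. It assumes that the throws (or tests) are independent of each other, which is a simplification for real food data. The function name is chosen here for illustration.

```python
def chance_of_at_least_one(p_single, n_tries):
    """Probability that an event with chance p_single happens at least once
    in n_tries independent attempts."""
    # It is easier to compute the chance that it never happens, then subtract.
    return 1 - (1 - p_single) ** n_tries

# Chance of at least one six when throwing 1, 5, 10 and 20 dice.
for n_dice in (1, 5, 10, 20):
    print(n_dice, round(chance_of_at_least_one(1 / 6, n_dice), 3))
```

With one die the chance of a six is about 17%, but with ten dice the chance of seeing at least one six is already about 84%. The same formula applies to statistical tests if you replace 1/6 by the chance that a single test is fooled.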
According to the new paper, written by Stanley Young, Heejung Bang and Kutluk Oktay, the authors of the original study failed to take account of the effects of multiple testing — indeed they tested a total of 132 foods in two different time periods. Young, Bang and Oktay re-examined the data and found that with such a large number of tests, one would expect some to falsely indicate a dependence of gender on the given foodstuff.
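A rough back-of-the-envelope calculation in Python shows why so many tests are a problem. It rests on two assumptions that are not taken from either paper: that each test uses the conventional 5% significance level, and that the tests are independent. The Bonferroni threshold at the end is one common and simple correction, shown only as an example; it is not necessarily the method Young, Bang and Oktay used.

```python
ALPHA = 0.05      # assumed significance level per test (not stated in the article)
N_FOODS = 132     # foods tested, from the article
N_PERIODS = 2     # time periods, from the article

n_tests = N_FOODS * N_PERIODS

# If no food has any real effect, each test is still fooled with chance ALPHA.
expected_false_positives = ALPHA * n_tests

# Chance that at least one test is fooled, assuming independent tests.
p_at_least_one = 1 - (1 - ALPHA) ** n_tests

# Bonferroni correction: demand a much smaller chance probability per test.
bonferroni_threshold = ALPHA / n_tests

print("number of tests:", n_tests)
print("expected false positives:", round(expected_false_positives, 1))
print("chance of at least one false positive:", p_at_least_one)
print("Bonferroni threshold per test:", bonferroni_threshold)
```

Under these assumptions there are 264 tests and about 13 of them would be expected to look "significant" by chance alone, so finding a handful of foods that seem to influence the sex of the baby is not surprising.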
"This paper comes across as well-intended, but it is hard to believe that women can increase the likelihood of having a baby-boy instead of a baby-girl by eating more bananas, cereal or salt," Young, Bang and Oktay say in the paper. "Nominal statistical significance, unadjusted for multiple testing, is often used to lend plausibility to a research finding; with an arguably implausible result, it is essential that multiple testing be taken into account with transparent methods for claims to have any level of credibility."
Labels: Health and medicine
posted by Plus @ 5:05 PM
- At 6:31 AM, said...
So I get this whole idea, and I think it's nifty and stuff, but I've been wondering: how does this play out with retrospective analysis of huge data sets?
Okay, let's pretend that I grab all the NIH data that I can, and before checking out the data, I decide that I want to see if there is a correlation between height and mortality from CHF, say. And what do you know! I discover a statistically significant correlation. I publish my paper-- then go on to look for further correlations 19 more individual times and find nothing.
Do you see the problem? By looking at the data 20 times individually, I was fooled once. And yet, because I looked at each question in turn, it wasn't appropriate to use multiple analysis. Hell, I didn't even know how many times I was going to go digging when I published my first paper.
And the NIH data set makes this even more confusing! Because it's not just a matter of how many hypotheses I'm evaluating-- what about all of the other people using the same data to evaluate their own hypotheses?
- At 9:57 AM, said...
Good question! Some people have suggested that statisticians should retire once they've found a significant result!
David Spiegelhalter, Professor for the Public Understanding of Risk at Cambridge says: "Correcting for multiplicity is controversial. You essentially need to identify how much you have had to search for your 'significant' result. So if these really were independent researchers each looking at an entirely different outcome measure, then there is no real need to correct. But once somebody puts these researchers together and makes some statement about the 'most significant' result, then a correction is needed."
So basically, if lots of different researchers test the same database for correlations in exactly the same way using the same non-Bayesian methods, or if one person does this repeatedly, then there should be a correction for multiple testing when making statements about significant results.