Understanding uncertainty: how psychic was Paul?

This article is adapted from material on the Understanding Uncertainty website.

England's performance in the World Cup last summer was thankfully overshadowed by the attention given to Paul the octopus, who was reported as making an unbroken series of correct predictions of match winners. Here we present a mathematical analysis of Paul's performance in an attempt to answer the question that (briefly) gripped the world: was Paul psychic?

Paul picking Spain over the Netherlands on the 9th of July 2010.

First let's look at the evidence. Paul, resident of Oberhausen Sea World in Germany but originally from Weymouth, likes eating mussels. Before each match his keepers lowered a pair of boxes into his tank, each containing a mussel and labelled with the flag of one of the two competing countries. Paul then squoozed his way into one of the boxes and grabbed a mussel: the country whose box was entered was declared as Paul's prediction. He, or possibly a look-alike, had previously made predictions in Germany's six matches in the Euro 2008 competition, but picked Germany's box each time, giving rise to suggestions that he was attracted to the German striped flag. Four out of six of these predictions were correct. But he excelled himself in the World Cup – he picked the winner of all Germany's seven matches, including their two defeats, and then got the World Cup final correct too. Paul's Wikipedia page contains more details than you need to know.

What's the evidence?

Let's start by taking the evidence – that Paul correctly predicted the eight results – at face value. We need to decide between two competing hypotheses: $H_{\rm psy}$ , that Paul is a psychic marvel, and $H_{\rm oct}$ , that he is just an ordinary rubbery cephalopod. We decide on the relative plausibility of two competing hypotheses through the likelihood ratio – this is the ratio of the probability of the evidence given that Paul is psychic to the probability of the evidence given that he is not. We write $p( \hbox{8 right} |H_{\rm psy})$ for the former and $p( \hbox{8 right} |H_{\rm oct})$ for the latter, where the vertical line "|" corresponds to "given". Now if we assume that being psychic means Paul has perfect predictive powers, then $p( \hbox{8 right} |H_{\rm psy}) =1$ , since he is certain to predict all the matches correctly. But if we assume that his choice is pure luck, then he has made eight independent correct predictions, each with probability 1/2, and so $p( \hbox{8 right} |H_{\rm oct}) = 1/2^8 = 1/256$ . So the "naïve" likelihood ratio in favour of being psychic is

$\frac{p(\hbox{8 right} |H_{\rm psy})}{p( \hbox{8 right} |H_{\rm oct})}=\frac{1}{1/256}=256,$

apparently rather powerful evidence. In fact, the evidence is even stronger than this, since in the first three matches there was a possibility of a draw, so that the chance of a correct prediction is closer to 3/8. But we shall ignore this nicety.
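The likelihood-ratio arithmetic is easy to reproduce. The Python sketch below assumes, as the text does, that a psychic Paul is right every time (so the numerator is 1) and that an ordinary octopus has a fixed chance of being right in each match. The function name `likelihood_ratio` is our own, chosen for illustration, and the 3/8 figure for the first three matches is just the rough value quoted above, not a fitted estimate.

```python
from fractions import Fraction

def likelihood_ratio(p_correct_per_match):
    """Ratio p(all right | psychic) / p(all right | ordinary octopus).

    A psychic Paul is assumed to be right every time, so the numerator is 1.
    The denominator multiplies the chance-level probability for each match.
    """
    p_all_right_by_chance = Fraction(1)
    for p in p_correct_per_match:
        p_all_right_by_chance *= p
    return Fraction(1) / p_all_right_by_chance

# Naive version: eight matches, each a 50:50 guess.
naive = likelihood_ratio([Fraction(1, 2)] * 8)

# Refinement (assumption from the text): the first three matches could end
# in a draw, so a guess is right with probability of roughly 3/8, not 1/2.
with_draws = likelihood_ratio([Fraction(3, 8)] * 3 + [Fraction(1, 2)] * 5)

print(naive)              # 256
print(float(with_draws))  # about 607
```

Exact fractions are used so that the naive ratio comes out as exactly 256 rather than a floating-point approximation.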

But is this likelihood ratio appropriate? It depends on two assumptions. The first is that we do not allow Paul to be just a little bit psychic – he's either right every time or guessing. The second assumption is that the chance of making a correct prediction is 1/2 for each match. This will only be the case if this is a fair and unbiased trial, so that a non-magical mollusc has an equal chance of selecting either team. But numerous web discussions have suggested this might not be the case: possible biases include a preference for certain flags, preference for the right-hand box, where he is in the tank when the boxes are lowered in, and so on. It was even suggested that he was filmed many times and just one film chosen. But of course, as the predictions were released before each match, being able to fiddle Paul's choice only helps if someone else feels they can predict the results: clearly this would be almost as remarkable as Paul making perfect predictions. If someone were influencing Paul's choices, then we might expect Paul to pick the favourite each time, but he did not always do this.

Just one of many?

Mani the parakeet with his owner M. Muniyappan. Image: Khalzuri.

Paul's main rival was Mani the Parakeet in Singapore, who was roundly beaten by Paul in the psychic showdown after picking the Netherlands to win the final. In fact many animals around the world were making predictions – porcupines, guinea pigs and so on. Maybe we are only hearing about the successful one, and any creature picking North Korea to win the Cup is doomed to obscurity. This is a very important factor in interpreting evidence – what are we not hearing? When we see someone hit a hole-in-one on YouTube we know that this piece of film was chosen from countless unsuccessful attempts. But as in Sherlock Holmes and the dog that did not bark in the night, such missing evidence is often difficult to identify but can be vital. This is a well-known problem in interpreting claims about medical treatments – if we only hear about the successes, the evidence only tells us that a treatment could possibly work, not how likely it is to work. That's why registers of clinical trials are being established, so that unsuccessful studies cannot just disappear.

Let's look at this problem in a little more detail. Paul rose to international prominence after four predictions, and we can assume that we would never have heard about him if he had got any wrong. So suppose there are $n$ animals making such predictions, each guessing independently of the others. Out of these $n$ sets of predictions made by utterly un-psychic creatures, the chance that at least one gets all four right is

$P(\hbox{at least one of the predictors gets 4 matches right})$

$= 1-P(\hbox{none of the predictors gets 4 matches right})$

$=1 - P(\hbox{all of the predictors get at least one wrong})$

$= 1 - P(\hbox{a predictor gets at least one wrong})^ n$

$= 1 - (1 - P(\hbox{a predictor gets 4 right}))^ n$

$= 1 - \left(1- \frac{1}{2^4}\right)^ n$

$= 1 - \left(\frac{15}{16}\right)^ n.$

If there were, say, 20 animals making predictions at random, the chance that at least one gets all four predictions right is therefore

$1 - (15/16)^{20} = 1 - 0.28 = 0.72.$
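The last line of the derivation can be turned into a small function to see how quickly a "lucky star" becomes likely as the number of animals grows. This is a sketch under the same assumptions as the derivation (independent guessers, each right with probability 1/2 per match); the function name `p_some_perfect_predictor` is hypothetical.

```python
def p_some_perfect_predictor(n_predictors, n_matches=4, p_correct=0.5):
    """Chance that at least one of n independent guessers gets every match right.

    Computes 1 - (1 - p_correct**n_matches)**n_predictors, which for
    p_correct = 1/2 and n_matches = 4 is 1 - (15/16)**n.
    """
    p_one_perfect = p_correct ** n_matches              # 1/16 for four matches
    p_none_perfect = (1 - p_one_perfect) ** n_predictors
    return 1 - p_none_perfect

# How the chance grows with the number of animals making predictions.
for n in (5, 10, 20, 50):
    print(n, round(p_some_perfect_predictor(n), 3))
```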

So there is at least a 2 in 3 chance of someone like Paul popping up by chance alone. This means that the first four matches provide almost no evidence supporting Paul's powers, and only the final four predictions count, giving a likelihood ratio of $2^4 = 16$.
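The same conclusion can be checked by brute force. The Monte Carlo sketch below simulates many "World Cups" in which 20 hypothetical animals each pick one of two boxes at random for four matches, and counts how often at least one of them gets everything right. The number of animals, the number of simulated tournaments and the seed are illustrative choices, not data about real animal oracles.

```python
import random

def simulate_lucky_star(n_animals=20, n_matches=4, n_trials=20_000, seed=1):
    """Estimate how often at least one random guesser gets every match right."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        # An animal is "perfect" if each of its 50:50 picks turns out correct.
        if any(all(rng.random() < 0.5 for _ in range(n_matches))
               for _ in range(n_animals)):
            hits += 1
    return hits / n_trials

estimate = simulate_lucky_star()
print(estimate)  # should be close to the exact 1 - (15/16)**20, about 0.72
```

With 20,000 simulated tournaments the estimate typically lies within about one percentage point of the exact value.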

Bring on Bayes

Let's ignore this for the moment and go back to taking the " $\hbox{8 right}$ " evidence at face value. So far we have looked at the probability of the evidence (the eight correct predictions) given that Paul is psychic or not psychic, but this does not get to the heart of the matter. The quantity that we are really interested in is the probability of Paul being psychic given the evidence, that's $p( H_{\rm psy} | \hbox{8 right} )$ .

It turns out that it is convenient to work in terms of odds rather than probabilities, where odds correspond to probability / (1 – probability). So a probability of, say, 0.8 corresponds to odds of 4, and odds of 1/3 correspond to a probability of 0.25.
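The two conversions are one-liners. In this sketch the inverse formula, probability = odds / (1 + odds), is obtained by rearranging the definition above; the function names are our own.

```python
from fractions import Fraction

def prob_to_odds(p):
    """Odds = probability / (1 - probability)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

print(prob_to_odds(Fraction(4, 5)))  # 4
print(odds_to_prob(Fraction(1, 3)))  # 1/4
```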

We're looking for the odds of Paul being psychic given the evidence. This is

$\frac{p(H_{\rm psy}| \hbox{8 right} )}{1- p(H_{\rm psy}| \hbox{8 right} ) } = \frac{p(H_{\rm psy}| \hbox{8 right} )}{p(H_{\rm oct}| \hbox{8 right} )}.$

Here $p(H_{\rm oct}| \hbox{8 right} ) = 1- p(H_{\rm psy}| \hbox{8 right} )$ is the probability of not being psychic given the evidence. To get this we use the odds form of Bayes theorem:

$\frac{ p( H_{\rm psy} | \hbox{8 right} )}{ p( H_{\rm oct} | \hbox{8 right} )} = \frac{ p( \hbox{8 right} |H_{\rm psy} )}{ p(\hbox{8 right} |H_{\rm oct} )} \times \frac{ p( H_{\rm psy})}{ p( H_{\rm oct})}.$

This expression relates the initial (also known as the prior) odds of being psychic before we see the evidence, that is

$\frac{p(H_{\rm psy})}{1 - p(H_{\rm psy})} =\frac{p(H_{\rm psy})}{p(H_{\rm oct})},$

to the final (also known as the posterior) odds, after seeing the evidence, namely

$\frac{ p( H_{\rm psy} | \hbox{8 right} )}{ p( H_{\rm oct} | \hbox{8 right} )}.$

It says that the posterior odds are simply the prior odds multiplied by the likelihood ratio. Bayes theorem describes how we change our beliefs in the light of experience. It's due to the Reverend Thomas Bayes and was published in 1763, two years after his death. The theorem is a basic consequence of the rules of probability, and provides the basis for theories of learning, spam filters, formal legal reasoning, and an entire school of statistical inference. (You can find out more about Bayes theorem on Plus.)

Probably just lucky: Paul in his tank with a football boot. Image: Tilla.

So according to Bayes theorem, to find out the odds of Paul being psychic given the evidence, we first need to provide the initial probability of Paul being psychic. What, before you heard about his exploits, would have been your belief that an octopus could predict football results? Quite low, I believe. Let's give Paul the benefit of the doubt and say that the initial probability of his being psychic is $p( H_{\rm psy}) = 1/100.$ The initial odds are therefore 1/99, and the final odds, taking the evidence from all eight matches at face value, are 256 × 1/99 = 256/99 ≈ 2.6, which translates to a final probability $p( H_{\rm psy}| \hbox{8 right}) = 256/(256+99) = 256/355 \approx 0.72$ , not that much more than 50:50.
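The whole calculation fits in a few lines. This sketch applies the odds form of Bayes theorem with the article's prior of 1/100 and the naive likelihood ratio of 256; the function name `bayes_update` is hypothetical, and a prior of exactly 1 is not handled because its odds are infinite.

```python
from fractions import Fraction

def bayes_update(prior, likelihood_ratio):
    """Odds form of Bayes theorem: posterior odds = likelihood ratio x prior odds.

    Returns the posterior odds and the corresponding posterior probability.
    Assumes 0 <= prior < 1.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds, posterior_odds / (1 + posterior_odds)

odds, prob = bayes_update(Fraction(1, 100), 256)
print(odds, float(odds))  # 256/99, about 2.6
print(prob, float(prob))  # 256/355, about 0.72
```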

Similar Bayesian processing of evidence can be used in legal reasoning, but has also been used to assess the probability that the Turin Shroud truly shows the face of Christ, that a recently discovered tomb was that of Christ, and even that God exists (answer: 67%), although the accuracy of these analyses is open to some dispute, to put it mildly.

But would we be happy with this analysis of Paul's supernatural skills? In fact, zero might be a more reasonable figure for the prior probability, if we simply consider it impossible that an octopus can predict football results. But if $p( H_{\rm psy} )$ =0, then the initial odds are 0, and the final odds are 0 whatever the size of the likelihood ratio. This is an important mathematical result: if you believe that a hypothesis is impossible, then no amount of evidence will change your mind, and you have to put the events down to just chance. Call me biased if you want, but that's certainly how I felt.
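To see how much the conclusion hangs on the prior, this sketch feeds the same likelihood ratio of 256 through a range of priors. The priors other than 1/100 are illustrative values of our own choosing; the final line shows that a prior of zero stays at zero even against an absurdly large likelihood ratio.

```python
from fractions import Fraction

def posterior_odds(prior, likelihood_ratio):
    """Posterior odds = likelihood ratio x prior odds (assumes 0 <= prior < 1)."""
    return likelihood_ratio * prior / (1 - prior)

# The same evidence applied to priors ranging from generous to zero.
for prior in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 10_000), Fraction(0)):
    print(prior, posterior_odds(prior, 256))

# No amount of evidence moves a prior of exactly zero.
assert posterior_odds(Fraction(0), 10**100) == 0
```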

About the author

David Spiegelhalter

David Spiegelhalter is Winton Professor of the Public Understanding of Risk at the University of Cambridge. David and his team run the Understanding uncertainty website, which informs the public about issues involving risk and uncertainty.