It may obscure the issue to suggest that p-values alone are not strongly suggestive of a genuine effect in an experiment like this. Unless the experiment is flawed, the only probability distribution of results that can arise without an interesting effect is that of the null hypothesis. Of course, a single p-value like 0.009 is not by itself indicative of anything but a fluke: run 100 experiments and a result this extreme somewhere among them is rather likely. This is why the particle physics community demands p-values of about 0.0000003 (the five-sigma threshold) before using the term "discovery". I don't believe they qualify this with a Bayesian criterion: the null hypothesis is simply the hypothesis that the proposed effect does not exist.
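To put rough numbers on the two figures above, here is a minimal Python sketch. It assumes the 100 experiments are independent and that every one of them is run under the null hypothesis; the function names are my own, chosen for illustration.

```python
import math

def prob_at_least_one_fluke(p_threshold, n_experiments):
    """Chance that at least one of n independent null experiments
    yields a p-value at or below p_threshold."""
    return 1.0 - (1.0 - p_threshold) ** n_experiments

def one_sided_tail(z):
    """Upper-tail probability of a standard normal beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Assumption: 100 independent experiments, all with no real effect.
fluke = prob_at_least_one_fluke(0.009, 100)

# The particle physics "five sigma" convention, taken as a one-sided tail.
five_sigma = one_sided_tail(5.0)

print(f"P(at least one p <= 0.009 in 100 null experiments) = {fluke:.3f}")
print(f"One-sided five-sigma p-value = {five_sigma:.2e}")
```

Under those assumptions the first figure comes out at a little under 0.6, and the second at roughly 0.0000003, in line with the threshold quoted above.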
It is perhaps inaccurate to say that no such effect was discovered years ago: for example, the book "Entangled Minds" by Dean Radin (2006) presents meta-analyses of a wide range of possible effects from experiments spanning several decades and finds qualitatively similar weak effects for most of them. In some cases the p-values are much lower, with much larger effective sample sizes. If I understand the data correctly, the effects cannot be adequately explained by supposing that some of the experiments are flawed or fraudulent, because the distribution of results across experiments looks much more like a weak effect whose variation is due to sampling. I would welcome the views of an expert like McConway on Radin's claims.
So it may be more accurate to say that (1) these effects have not been accepted as definitely genuine by mainstream psychology (or other sciences) and (2) there is little or no understanding of mechanisms that might explain the effects.
I feel it is this lack of any real scientific understanding of the nature of the purported phenomenon that blocks acceptance, rather than the statistics themselves. Weaker statistics have been used (rightly, in my opinion) to make major strategic decisions in other fields. Given a very small p-value (say 0.000001) in an experiment, two intelligent people can come to two different conclusions. The first might say "unless this experiment is faulty or fraudulent, there is very likely a real effect here". The second might say "this experiment is almost certainly faulty or fraudulent, since the conclusion is ridiculous". If ESP effects are weak in the way that Radin's analysis and this experiment (and many others) suggest, it would take a rather large experiment (or a genuine effect plus a bit of luck) to reach p-values of this sort at all, though more extreme ones can be found by combining data from many different experiments.
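As a rough illustration of that last point, the sketch below estimates how many trials a weak effect would need to reach p = 0.000001, and how quickly modest results combine. The numbers are assumptions of mine, not taken from Radin or from this experiment: a two-choice task with a 50% chance hit rate, a hypothetical true hit rate of 53%, a normal approximation to the binomial, and ten hypothetical independent experiments each reaching a one-sided p of 0.009. Stouffer's method is used for the combination purely as one standard choice.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_for_one_sided_p(p):
    """z-score whose upper-tail probability equals p."""
    return STD_NORMAL.inv_cdf(1.0 - p)

def trials_needed(chance_rate, true_rate, target_p):
    """Approximate trial count at which the *expected* z-score reaches the
    one-sided target (normal approximation to the binomial)."""
    z = z_for_one_sided_p(target_p)
    sd = math.sqrt(chance_rate * (1.0 - chance_rate))
    return math.ceil((z * sd / (true_rate - chance_rate)) ** 2)

def stouffer_combined_p(z_scores):
    """Combine independent z-scores with Stouffer's method and
    return the one-sided p-value of the combined score."""
    combined_z = sum(z_scores) / math.sqrt(len(z_scores))
    return 0.5 * math.erfc(combined_z / math.sqrt(2.0))

# Hypothetical weak effect: 53% hits where chance alone gives 50%.
n = trials_needed(chance_rate=0.5, true_rate=0.53, target_p=1e-6)

# Hypothetical: ten independent experiments, each with one-sided p = 0.009.
combined_p = stouffer_combined_p([z_for_one_sided_p(0.009)] * 10)

print(f"Trials needed for an expected p of 1e-6: {n}")
print(f"Combined p for ten experiments at p = 0.009: {combined_p:.1e}")
```

With these made-up inputs a single experiment needs several thousand trials, while ten modest experiments combine to a p-value far below 0.000001. Note that the combined figure says nothing about whether the individual experiments are sound, which is exactly where the two readers above part company.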