Understanding uncertainty: Breast screening, a statistical controversy

David Spiegelhalter
December 2009

One in nine women will get breast cancer in her lifetime, and it seems sensible to try to find and treat those cancers as early as possible by taking regular X-rays of the breasts — a process known as screening with mammography. But heated arguments are going on in the USA about how often women should be screened, and in the UK about what women should be told when invited for screening. Both controversies rest on what one believes are the benefits and harms of screening. For example, it's been claimed that if 2,000 women are screened for ten years, only one will avoid dying from breast cancer, but ten will unnecessarily become cancer patients, since their tumour would never have caused a problem if it had remained undetected.


Disentangling these arguments involves subtle issues about statistics, mathematical modelling, scientific uncertainty and risk communication. This is very topical: newspapers and medical journals repeatedly discuss the issue, and Sense about Science, a UK charity trying to improve public understanding of science, has recently released a guide to screening.

The idea of screening seems simple: if a cancer can be detected while it is small, then it can be treated appropriately, and that should reduce the chance of the woman eventually dying of breast cancer. But women may be harmed by screening in a variety of ways: first, a woman may be made anxious and suffer unnecessary further investigation after an X-ray that suggests a cancer but turns out to be a false alarm; second, she may genuinely have a form of cancer, but one which would never have caused a problem had it not been found and unnecessarily treated; third, she is exposed to radiation from the mammography that, paradoxically, may itself cause cancer (although this is a very small risk given the current low X-ray dose).

So first of all the health service has to decide how often to offer screening (which is the current concern in the US), and second each individual woman has to decide whether to accept the offer (the current concern in the UK). Each decision becomes one of weighing up the potential benefits and harms of screening, and this means doing research to help us put numbers on what those benefits and harms might be.

Benefits and harms: evidence from clinical trials

The best type of research study is a randomised controlled trial (RCT), in which a large group of women are randomly allocated to be offered screening or not, and then followed up for years to see what happens to them. Note, however, that some of the women offered screening will not accept it, while many of the women not offered screening will have it anyway. So we are comparing a group of women offered the screening to a group not offered it, rather than a group of women who have been screened to a group who haven't. Therefore, comparing the two groups allows us to estimate the benefits of a health service offering a screening programme, not the benefits for an individual woman going for screening.

The US researchers, for example, wanted to see whether screening benefited women under 50, and found eight relevant trials with the results shown in Table 1.

                  Without screening                  With screening
Trial     Women (A)  Deaths (B)  Rate (C)    Women (D)  Deaths (E)  Rate (F)   Difference (C-F)
1           106,956         251       2.3       53,884         105       1.9        0.4
2            25,216         108       4.3       25,214         105       4.2        0.1
3            13,740          82       6.0       13,740          64       4.7        1.3
4            14,271          59       4.1       11,724          34       2.9        1.2
5             8,021          13       1.6       14,303          34       2.4       -0.8
6            12,279          66       5.4       13,568          53       3.9        1.5
7             5,031          16       3.2        9,582          22       2.3        0.9
8            10,459          30       2.9       10,285          31       3.0       -0.1
Total       195,973         625      3.19      152,300         448      2.94

Rates C and F are deaths per 1,000 women: C = 1000B/A and F = 1000E/D; the final column is the death rate difference C - F.

Table 1: Evidence from eight randomised trials of screening women aged 39-49, showing the number of women randomised to be offered screening or not, and the subsequent number who died of breast cancer.

Table 1 shows the number of women who died per 1,000, and the difference in the death rates in each trial — for example, in the first trial the death rate was 2.3 per 1,000 in the group without screening, and 1.9 per 1,000 in the screened group, with a difference of 0.4.

There has been a lot of research into methods for summarising such a table. It would be tempting to just add up the columns to create the total number of women and deaths and then calculate the overall death rates and their difference. This suggests an overall saving of 3.19-2.94 = 0.25 of a life per 1,000 women offered screening.

However, this is not an accepted procedure. Instead we should "respect the randomisation" — in other words, calculate the effect of the screening in each study, and then combine those measures of effect. For example, we might take the average of the death rate differences (the average of the values in the right-most column) to produce an estimate of 0.57 lives saved per 1,000 women offered screening, representing a reduction in risk of 0.57/3.19, or 18%.
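As a quick sanity check, both calculations can be reproduced from the trial data in Table 1 (a sketch in Python; the small rounding differences from the figures quoted in the text are noted in the comments):

```python
# Trial data from Table 1: (women without screening, deaths without,
#                           women with screening, deaths with)
trials = [
    (106956, 251, 53884, 105),
    (25216, 108, 25214, 105),
    (13740, 82, 13740, 64),
    (14271, 59, 11724, 34),
    (8021, 13, 14303, 34),
    (12279, 66, 13568, 53),
    (5031, 16, 9582, 22),
    (10459, 30, 10285, 31),
]

# The "tempting" pooled approach: add up the columns, then compare rates.
women_without = sum(t[0] for t in trials)
deaths_without = sum(t[1] for t in trials)
women_with = sum(t[2] for t in trials)
deaths_with = sum(t[3] for t in trials)

rate_without = 1000 * deaths_without / women_without   # about 3.19 per 1,000
rate_with = 1000 * deaths_with / women_with            # about 2.94 per 1,000
pooled_difference = rate_without - rate_with           # about 0.25 per 1,000

# "Respect the randomisation": compute the rate difference within each
# trial, then average those differences across the eight trials.
differences = [1000 * b / a - 1000 * e / d for a, b, d, e in trials]
average_difference = sum(differences) / len(differences)
# about 0.56 per 1,000 (quoted as 0.57 in the text after rounding)
relative_reduction = average_difference / rate_without   # about 18%

print(f"Pooled difference:  {pooled_difference:.2f} per 1,000")
print(f"Average difference: {average_difference:.2f} per 1,000")
print(f"Relative reduction: {100 * relative_reduction:.0f}%")
```

The two approaches give quite different answers (0.25 versus roughly 0.57 lives saved per 1,000), which is precisely why the choice of summary method matters.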

In fact, the authors' methods were somewhat more complex and they arrived at an estimate of a 15% risk reduction. This estimate of the true underlying risk is imprecise in spite of the large numbers of women recruited into the studies: the authors also provided a rather wide 95% confidence interval, which ranges from a 4% risk reduction to a 25% risk reduction. This means that they are 95% confident that the true risk reduction lies within this range. They estimated similar benefits of a 14% reduction for women aged 50-59, and a greater risk reduction of 32% for women aged 60-69.
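To see why the interval is so wide despite the large numbers of women, it helps to look at a single trial. The sketch below (not the authors' actual meta-analysis method) applies the standard large-sample confidence interval for a relative risk, on the log scale, to the first trial in Table 1:

```python
import math

# Trial 1 from Table 1: an illustration only, not the authors' method.
deaths_without, women_without = 251, 106956
deaths_with, women_with = 105, 53884

# Relative risk of dying of breast cancer in the screened group.
rr = (deaths_with / women_with) / (deaths_without / women_without)

# Standard error of log(RR): the usual large-sample approximation,
# sqrt(1/d1 - 1/n1 + 1/d2 - 1/n2).
se = math.sqrt(1 / deaths_with - 1 / women_with
               + 1 / deaths_without - 1 / women_without)

lower = math.exp(math.log(rr) - 1.96 * se)
upper = math.exp(math.log(rr) + 1.96 * se)

print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
# Roughly RR 0.83 with an interval from about 0.66 to 1.04: even a
# trial involving 160,000 women cannot, on its own, rule out the
# possibility of no benefit at all.
```

The uncertainty is dominated by the number of deaths (a few hundred), not the number of women recruited, which is why such enormous trials still yield imprecise estimates.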

Benefits and harms: evidence from mathematical models

However, this analysis was not the main evidence used to come up with the US recommendations for screening. Randomised trials can only give part of the picture, and detailed mathematical modelling is necessary, using data from a variety of sources. The US Preventive Services Task Force (USPSTF) published its recommendations last month (November 2009), based on work by six different teams to estimate the effect of a range of different screening strategies. A "preferred" strategy, chosen on the basis of the average of the models, started screening only at 50, and then screened every two years.

This final strategy was estimated to have the following benefits and harms when applied to 1,000 women:

Benefits
  Breast cancer deaths avoided                                  7.5
  Extra years of life                                           121
Costs and harms
  Mammographies                                              11,109
  False-positive diagnoses requiring further investigation      940
  Unnecessary biopsies (tissue samples taken with a needle)      66
  Over-diagnoses (cancers detected and treated that would
    not have caused any harm if left alone)              not reported
Table 2: Summary of benefits and harms for 1,000 women under the screening strategy suggested by the US Preventive Services Task Force, screened every two years between 50 and 74.

The Task Force found that screening more often and starting at a younger age saved more lives, but at a large increase in false positives (erroneous diagnoses of cancer when the woman was healthy) and unnecessary biopsies.

Screening can detect cancers that would have been harmless, leading to unnecessary treatment.

These recommendations are more intensive than UK practice, which recommends screening only every three years, but they were extremely controversial in the US, as previous guidelines had recommended screening every year from age 40. There have been dramatic newspaper headlines and claims that this reduction in screening amounts to a denial of health care under the Obama administration, and the American Cancer Society has come out against the recommendations. It is notable, however, that the proposed strategy is supported by the major breast cancer charity, the National Breast Cancer Coalition. There was also no explicit attempt to cost the interventions; in the UK, the cost of all the medical treatment would have been included in the analysis.

Potential harm: over-diagnosis

Most remarkable is the lack of any estimate of over-diagnosis — the diagnosis of cancers that would not have caused harm if left alone. In fact, the authors estimated over-diagnosis in each of the six models, but did not publish the results. They felt that the "absolute estimates were unreliable" due to limitations in knowledge about the types of tumour that are detected in these screenings.

Other authors do try to estimate over-diagnosis. For example, a July 2009 editorial in the British Medical Journal provides the following assessment:

Benefits
  Breast cancer deaths avoided                                    1
Costs and harms
  False-positive diagnoses requiring further investigation  100-500
  Over-diagnoses (cancers detected and treated that would
    not have caused any harm if left alone)
Table 3: Summary of benefits and harms for 1,000 50-year-old women screened every two years for ten years, as estimated by Welch (2009).

Another paper in the British Medical Journal recently argued that women are not given full information when considering screening, and a new leaflet has been written which contains the following estimates of what would happen to 1,000 average women going for screening.

Benefits
  Breast cancer deaths avoided                                  0.5
Costs and harms
  False-positive diagnoses requiring further investigation      100
  Over-diagnoses (cancers detected and treated that would
    not have caused any harm if left alone)                       5
Table 4: Summary of benefits and harms for 1,000 women screened every two years for ten years, contained in a proposed information leaflet for women deciding whether to go for breast cancer screening.

The latter table suggests that 2,000 women need to be screened to save one life over ten years, while ten of those 2,000 will have been unnecessarily treated as cancer patients. These figures, however, have been strongly disputed. Other researchers point to evidence that six lives will be saved in 1,000 women screened over 20 years (similar to Table 2), while other studies "suggest that over-diagnosis in mammography screening is a minor phenomenon".
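The arithmetic behind the "2,000 women" claim is straightforward. A small sketch, using the leaflet figures quoted above (one life saved and ten over-diagnoses per 2,000 women screened over ten years, i.e. 0.5 and 5 per 1,000):

```python
# Leaflet figures, per 1,000 women screened every two years for ten
# years: 0.5 deaths avoided; 5 over-diagnoses (this is the "ten of
# those 2,000" quoted in the text, rescaled to 1,000 women).
deaths_avoided_per_1000 = 0.5
overdiagnosed_per_1000 = 5

# Number of women who need to be screened for ten years to save one life.
number_needed_to_screen = 1000 / deaths_avoided_per_1000
print(number_needed_to_screen)   # 2000.0

# Over-diagnosed (unnecessarily treated) women for each life saved.
harms_per_life_saved = overdiagnosed_per_1000 / deaths_avoided_per_1000
print(harms_per_life_saved)      # 10.0
```

This "number needed to screen" framing makes clear how sensitive the trade-off is to the disputed inputs: doubling the estimated benefit, as some researchers propose, halves the first figure and the harm-to-benefit ratio alike.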

There is also another problem with including the results from Table 4 in an information leaflet for women. The table claims to estimate what will happen to 1,000 different women who are offered screening, some of whom will accept and some not. But this seems inappropriate for an information leaflet: when a woman is asked whether she wants to attend screening, the relevant information is the benefit if she actually gets screened, not just offered it.

This is a real statistical issue — numerical evidence is being used to decide important policies that can affect the lives of all women, yet the evidence from different studies does not always agree and there is substantial scientific uncertainty. This is partly because trials (whose results also feed into the mathematical models) are done on different populations and in different ways, and also because the number of women benefiting from mass screening is small, so the benefit is hard to measure. Larger clinical trials would help, but it would now be very difficult, and probably unethical, to have a control group that was not offered screening. So careful statistical analysis and mathematical modelling are likely to remain vital.

Communicating the evidence

Of course, the final decision on whether to be screened depends on an individual woman's feelings about the trade-off between the possibility of benefit and the risks of harm. People find these decisions difficult, and tend to be influenced by the opinions and behaviour of people around them whom they trust. But women should still have an idea of the magnitudes of the numbers involved in order to compare options.

It is often difficult, however, to make comparisons between the options: not just because outcomes of studies don't always agree, but also because different ways of expressing benefits and harms are used. Even the recommendation by the USPSTF presents the evidence in an unclear way, focusing on the percentage reduction in risk rather than the actual chance of a woman benefiting from screening — for example, a 50% reduction in risk might seem like a huge improvement, but if the risk was very small to start with, then the chance for an individual woman to benefit from screening may remain small. Fortunately, the presentations of results are increasingly expressed in a common way — what is expected to happen to 1,000 typical women — a unified approach which is improving communication.
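The gap between relative and absolute framings is easy to make concrete. Assuming, roughly, the trials' baseline of about 3.2 breast cancer deaths per 1,000 women over the study period (from Table 1), a sketch:

```python
def absolute_benefit(baseline_per_1000, relative_reduction):
    """Convert a relative risk reduction into deaths avoided per 1,000 women."""
    return baseline_per_1000 * relative_reduction

# A quoted "15% risk reduction" on a baseline of roughly 3.2 deaths
# per 1,000 women corresponds to under half a death avoided per 1,000.
print(f"{absolute_benefit(3.2, 0.15):.2f} deaths avoided per 1,000")

# Even a dramatic-sounding 50% reduction on the same small baseline
# still leaves the vast majority of individual women unaffected.
print(f"{absolute_benefit(3.2, 0.50):.1f} deaths avoided per 1,000")
```

The same percentage reduction can therefore sound impressive or modest depending entirely on the baseline risk, which is why the "per 1,000 typical women" presentation is so much clearer.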

It may seem extraordinary that such disagreement and controversy exists over such an important and high-profile issue. As mentioned before, this is partly because the actual numbers of women who benefit from a mass screening programme are fairly small, and so measuring the benefit is difficult. But the controversy about information leaflets may also reflect a reluctance to be too up-front with patients about both the possibly small size of the benefit and the considerable uncertainty surrounding it.

The current buzz-phrase is shared decision-making, which means that patients should be making fully-informed choices in cooperation with their doctor. This inevitably requires communication of the magnitudes of possible benefits and harms, and so emphasises the role of robust and clear statistical reasoning. These are difficult and delicate issues, but ones where numerical and mathematical insights can be very valuable.

About the author

David Spiegelhalter is Winton Professor of the Public Understanding of Risk at the University of Cambridge.

David and his team run the Understanding uncertainty website, which informs the public about issues involving risk and uncertainty.