If you're a woman between 50 and 70 in the UK, you'll be familiar with the breast cancer screening programme. Women within that age range are invited for a mammogram every three years. If the result looks suspicious, they are invited back for further tests. The aim is to spot breast cancers as early as possible, as that gives the best prognosis.
For the NHS the screening programme presents a huge challenge. Around 2.2 million mammograms are performed every year as part of the programme. That's over 6,000 per day on average. Each mammogram is looked at by two separate radiologists, and around 4% of women are called back. That's a lot of work, especially at a time when the health service is already struggling.
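As a quick back-of-envelope check of that daily figure (assuming the workload is spread evenly over all 365 days of the year):

```python
# Annual mammograms in the UK screening programme (figure from the text)
annual_mammograms = 2_200_000

# Average per day, assuming screening is spread over the whole year
per_day = annual_mammograms / 365
print(round(per_day))  # just over 6000
```

In practice screening only runs on working days, so the true daily workload is higher still.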
In the UK women between 50 and 70 are invited for a mammogram every three years.
A solution to this problem is on the horizon, but it's one that might make you feel uneasy: some of the tasks radiologists currently do could be taken over by machines. This doesn't mean that women going for mammograms would have to talk to robots. But it does mean that computers would play a role in deciding which mammograms are deemed to be normal, and which women have to be called back.
The worry is, of course, that computers make mistakes, and when it comes to cancer there's no room for those. But the idea is not to leave the entire task of screening for breast cancer to computers. Rather, it's for algorithms to support humans who, after all, are fallible too.
Fiona Gilbert of the University of Cambridge is one of the radiologists who routinely read mammograms, and she has long-standing experience with the screening programme. She believes that artificial intelligence has an important role to play. "We radiologists are hugely excited at the prospect of using artificial intelligence in our day-to-day work," she said at the recent Medical Image Understanding and Analysis (MIUA) conference at the University of Cambridge.
Gilbert has worked with Google, Microsoft and other AI companies to explore the role of computers in breast cancer screening. One of Gilbert's graduate students, Sarah Hickman, co-organised the Women in MIUA workshop and is writing a PhD thesis on AI in breast cancer screening. In her conference talk Gilbert looked at how AI can be employed and how well it has been doing in tests.
The idea of using computers to help radiologists isn't new. Already back in 1998 the US Food and Drug Administration allowed clinicians to use computer-aided detection (CAD). In 2002 the Centers for Medicare and Medicaid Services even offered clinicians an extra $15 for every mammogram read with the help of a CAD tool. By 2012, 83% of mammograms in the US were read using CAD.
Initial studies (including one co-authored by Gilbert) had indicated that the tools were useful, but by 2015 it became clear they weren't a magic wand. A major study published that year showed that radiologists using CAD actually did slightly worse than those not using it when it came to picking up cancers. Indeed, the study found that CAD tools didn't improve diagnostic accuracy on any metric it looked at.
A potential explanation for this is that radiologists became over-reliant on CAD tools, paying less attention to what their own eyes told them. "It's a really important thing to remember when you're introducing AI into any system [that also involves humans] the AI may have an adverse effect on [human] performance," Gilbert said in her talk. "You have to have a way of measuring what's going on."
Fast forward a few years and we are dealing with tools that are far more advanced. They harness a form of artificial intelligence called deep learning and have spawned collaborations between mathematicians, who are developing the algorithms, and clinicians who are hoping to use them. Indeed, Gilbert has collaborated with mathematicians from the Cambridge Mathematics of Information in Healthcare Hub (CMIH) who organised the MIUA conference along with the Newton Gateway to Mathematics and the National Heart and Lung Institute.
Luckily we don't have to gamble with women's lives to see if AI tools work. Rather than testing them in real time on women coming forward for screening, we can test them on mammograms from past runs of screening, belonging to women whose history regarding cancer is known. That gives us a way of comparing the performance an AI tool would have delivered with the results that human radiologists produced.
Ruling in and out
One thing AI can potentially do to support humans is to triage mammograms: separate the normals from the suspicious. This is already happening in Denmark to help deal with the backlog caused by the COVID-19 pandemic. Women in the Copenhagen region have their mammograms analysed by an AI tool which gives each mammogram a risk score: the higher the score, the more likely the AI deems it that a cancer is present. Those mammograms with a low risk score are then only read by one human radiologist, freeing the other one up to read mammograms that were deemed suspicious.
Researchers are exploring the possibility of going a step further, doing away with human input altogether for some mammograms. An example published this year uses data from the Danish breast screening programme collected in 2014 and 2015. The authors used a commercially available AI system called Transpara to score each mammogram. They then checked what would have happened if all women with a score of less than 5 had been given the all clear without human input, and all women with a score higher than a certain threshold had been recalled for further tests without human input. The remaining mammograms, those deemed to have a medium risk, were assumed to have been read by two human radiologists, with the same results as occurred in real life.
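The triage rule the study simulated boils down to a pair of threshold checks. The lower cut-off of 5 comes from the study as described above; the upper cut-off used here is a hypothetical placeholder, since its value isn't given:

```python
def triage(risk_score, low=5, high=9):
    """Sketch of the threshold-based triage rule described above.

    low=5 is the study's cut-off for an automatic all-clear;
    high=9 is a hypothetical placeholder for the recall threshold.
    """
    if risk_score < low:
        return "all clear without human reading"
    if risk_score > high:
        return "recalled for further tests without human reading"
    return "read by two human radiologists"

print(triage(2))   # all clear without human reading
print(triage(7))   # read by two human radiologists
print(triage(10))  # recalled for further tests without human reading
```

Only the middle band, the medium-risk mammograms, generates work for human radiologists, which is where the workload savings come from.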
The study found that the AI-based screening could have reduced radiologists' workload by 62% and that the number of healthy women recalled unnecessarily (the number of false positives) could have been reduced by a quarter. It wasn't all good news though: the study also found that 1.5% of the cancers that had been identified by human radiologists would have been missed by the AI-based screening. A system exactly like the one examined in the study might therefore not be acceptable, but that doesn't mean that all is lost. Other studies suggest there are approaches that push the rate of missed cancers down to zero. To find out more, watch Gilbert's talk at WiMIUA.
Raising the red flag
Another thing AI could do to help humans is to raise red flags. "This is where radiologists have read the cases, have said they're normal, and actually the tool is going 'wait, this case has a high score'," said Gilbert. Clinicians could then look at the suspicious case again, perhaps also using another imaging technique such as MRI scanning.
The point of the screening programme is to pick up cancers early.
If the AI does well, then this would mean we'd pick up cancer cases that would otherwise only be spotted later on: either between screening appointments because a woman develops symptoms (interval cancers) or at the next screening appointment (next round cancers). "These have much worse prognoses because we haven't found them early enough," Gilbert said in her talk. "The whole point of a screening programme is to pick up a cancer when it's as small as possible because then the chances of cells spreading to other parts of the body are very, very low."
There are a number of studies that investigate this idea. For example, a study published in 2021 used mammograms from over 400 women who were diagnosed with interval cancers in Sweden between 2013 and 2017. All these women developed cancers despite having been given the all clear at their last screening appointment. It was again the Transpara tool that was used to score those mammograms.
The study suggests that if, during screening, the AI tool had flagged up those mammograms whose risk scores were in the top 10%, then over 19% of interval cancers might have been detected at the time the women were screened. That corresponds to around 80 women who in the actual screening had been told their mammogram was fine, only to develop cancer later.
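As a rough sanity check on those figures (the cohort size of 420 below is an assumption standing in for "over 400"):

```python
interval_cancers = 420  # assumed cohort size ("over 400" in the study)
detected_share = 0.19   # share of interval cancers the AI flagged

# Number of women whose cancer might have been caught at screening
print(round(interval_cancers * detected_share))  # roughly 80
```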
This doesn't mean that in a real screening programme 10% of women should be recalled. That would be far too much to handle and the 10% would contain many healthy women not about to develop cancer. But the study does suggest that AI can provide an effective second pair of eyes when it comes to spotting suspicious cases. Other research supports this, including work that is being done by Gilbert's PhD student Sarah Hickman.
There's yet another, more ambitious, application of AI in screening: see if an AI tool can pick out women who have a risk of developing cancer later on down the line. "It might be that the image is normal at the time of the screening, but the woman is at risk of developing cancer in two, three, four or five years' time," Gilbert said. "If you knew that, you would bring her back early, you wouldn't wait for the next screening round, or you might go for a different imaging test like MRI."
Again there's been research into how effective AI might be in spotting women at risk. A study published in 2019 used mammograms from women whose cancer history was known and then compared three different methods of predicting a woman's risk to a standard risk prediction model. The first method didn't involve any AI at all — it looked at risk factors such as whether a woman had a family history of cancer and where she was regarding the menopause. The second used AI, but only looked at mammograms. And the third was an AI approach using both mammograms and risk factors.
The result was that the AI tools, especially the hybrid one, did better than the non-AI models. Generally, the hope is that tools that can take into account all sorts of different kinds of information about a woman will be able to provide healthcare that is truly personalised.
Are all AI tools equal?
While various studies show promising results for AI in breast cancer screening, what's still missing is a coherent approach. As Gilbert pointed out in her talk, work by Hickman suggests there's lots of variation in the cancers different tools pick up. To compare different AI tools, and figure out which perform best, they all need to be tested on the same data set, a data set they haven't seen before.
Things are looking good for AI augmented screening, but more research needs to be done.
There also need to be clear benchmarks on what we want the AI tools to achieve. We want to be sure they reliably identify normal mammograms without missing cancers. We also want to keep down the number of women that are recalled, as unnecessary recalls are distressing and expensive. And while we want the AI tool to alert radiologists to suspicious cases, we don't want them to blurt out alerts too easily. Clear limits on what's acceptable for criteria like these need to be defined.
So while things are looking good for AI augmented screening, more work needs to be done to see to what extent it can be employed. This means that clinicians will continue to collaborate with the people who develop the algorithms. As Gilbert said to her mathematical audience at the MIUA conference, "We radiologists need you, mathematicians and computer scientists, to help us to really deliver healthcare."
About this article
This article is based on Fiona Gilbert's talk at the Medical Image Understanding and Analysis (MIUA) conference which took place at the University of Cambridge in July 2022. It was organised by the Cambridge Mathematics of Information in Healthcare Hub (CMIH), the Newton Gateway to Mathematics, and the National Heart and Lung Institute.
Marianne Freiberger is Editor of Plus.
The Isaac Newton Institute (INI) is an international research centre and our neighbour here on the University of Cambridge's maths campus. It attracts leading mathematical scientists from all over the world, and is open to all. Visit www.newton.ac.uk to find out more.