Abstract:
Artificial intelligence (AI) has the potential to accurately classify mammograms according to the presence or absence of radiological signs of breast cancer, replacing or supplementing human readers (radiologists). The UK National Screening Committee's assessments of the use of AI systems to examine screening mammograms continue to focus on maximising benefits and minimising harms to women screened when deciding whether to recommend the implementation of AI into the Breast Screening Programme in the UK. Maintaining or improving programme specificity is important to minimise anxiety from false positive results. When considering cancer detection, AI test sensitivity alone is not sufficiently informative; additional information on the spectrum of disease detected and on interval cancers is crucial to better understand the benefits and harms of screening. Although large retrospective studies might provide useful evidence by directly comparing test accuracy and spectrum of disease detected between different AI systems and by population subgroup, most retrospective studies are biased due to differential verification (ie, the use of different reference standards to verify the target condition among study participants). Enriched, multiple-reader, multiple-case, test set laboratory studies are also biased due to the laboratory effect (ie, radiologists' performance in retrospective, laboratory, observer studies is substantially different to their performance in a clinical environment). Therefore, prospective studies assessing the effect of incorporating any AI system into the breast screening pathway are required, because they will provide key evidence on how medical staff interact with AI and on the impact on women's outcomes.
Author Affiliation:
(a) Warwick Medical School, University of Warwick, Coventry, UK
(b) UK National Screening Committee, Office for Health Improvement and Disparities, Department of Health and Social Care, London, UK
(c) Centre for Medical Imaging, Division of Medicine, University College London, London, UK
(d) Exeter Test Group, College of Medicine and Health, University of Exeter, Exeter, UK
(e) St George's University Hospitals NHS Foundation Trust, London, UK
(f) Oxford Breast Imaging Centre, Churchill Hospital, Oxford, UK
(g) Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
(h) Department of Computing, Imperial College London, London, UK
(i) Department of Chemical Engineering and Analytical Science, University of Manchester, Manchester, UK
(j) Ninewells Hospital and Medical School, University of Dundee, Dundee, UK

* Correspondence to: Prof Sian Taylor-Phillips, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK

Byline: Prof Sian Taylor-Phillips, PhD [s.taylor-phillips@warwick.ac.uk] (a), Farah Seedat, PhD (b), Goda Kijauskaite, MSc (b), John Marshall, MA (b), Prof Steve Halligan, FMedSci (c), Prof Chris Hyde, MD (d), Rosalind Given-Wilson, FRCR (e), Louise Wilkinson, FRCR (f), Prof Alastair K Denniston, PhD (g), Ben Glocker, PhD (h), Peter Garrett, PhD (i), Prof Anne Mackie, PhD (b), Prof Robert J Steele, MD (j)