This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Commercial Artificial Intelligence Versus Radiologists: NPV and Recall Rate in Large Population-Based Digital Mammography and Tomosynthesis Screening Mammography Cohorts
Citations: 6
Authors: 11
Year: 2025
Abstract
<b>BACKGROUND</b>. By reliably classifying screening mammograms as negative, artificial intelligence (AI) could minimize radiologists' time spent reviewing high volumes of normal examinations and help prioritize examinations with high likelihood of malignancy.

<b>OBJECTIVE</b>. The purpose of this study was to compare performance of AI, classifying examinations as positive at different thresholds, with that of radiologists, focusing on NPV and recall rates, in large population-based digital mammography (DM) and digital breast tomosynthesis (DBT) screening cohorts.

<b>METHODS</b>. This retrospective single-institution study included women enrolled in the observational population-based Athena Breast Health Network. Stratified random sampling was used to identify cohorts of DM and DBT screening examinations performed from January 2010 through December 2019. Radiologists' interpretations were extracted from clinical reports. A commercial AI system classified examinations as low, intermediate, or elevated risk. Breast cancer diagnoses within 1 year after screening examinations were identified from a state cancer registry. AI and radiologist performance were compared.

<b>RESULTS</b>. The DM cohort included 26,693 examinations in 20,409 women (mean age, 58.1 years). AI classified 58.2%, 27.7%, and 14.0% of examinations as low, intermediate, and elevated risk, respectively. Sensitivity, specificity, recall rate, and NPV for radiologists were 88.6%, 93.3%, 7.2%, and 99.9%; for AI defining positive results as elevated risk, 74.4%, 86.3%, 14.0%, and 99.8%; and for AI defining positive results as intermediate or elevated risk, 94.0%, 58.6%, 41.8%, and 99.9%. The DBT cohort included 4824 examinations in 4379 women (mean age, 61.3 years). AI classified 68.1%, 19.8%, and 12.1% of examinations as low, intermediate, and elevated risk, respectively. Sensitivity, specificity, recall rate, and NPV for radiologists were 83.8%, 93.7%, 6.9%, and 99.9%; for AI defining positive results as elevated risk, 78.4%, 88.4%, 12.1%, and 99.8%; and for AI defining positive results as intermediate or elevated risk, 89.2%, 68.5%, 31.9%, and 99.8%.

<b>CONCLUSION</b>. In large DM and DBT cohorts, AI at either diagnostic threshold achieved high NPV but had higher recall rates than radiologists. Defining positive AI results to include intermediate-risk examinations, versus only elevated-risk examinations, detected additional cancers but yielded markedly increased recall rates.

<b>CLINICAL IMPACT</b>. The findings support AI's potential to aid radiologists' workflow efficiency. However, strategies are needed to address frequent false-positive results, particularly in the intermediate-risk category.
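The four metrics reported throughout the abstract all derive from a standard 2×2 confusion matrix of screening calls against 1-year cancer outcomes. As a minimal sketch of how they relate, the following uses purely hypothetical counts (not taken from this study):

```python
# Illustrative computation of screening-performance metrics from a
# confusion matrix. All counts below are hypothetical, for
# demonstration only; they are not data from the study.
tp, fn = 90, 10        # cancers flagged positive / missed
fp, tn = 900, 9000     # normal exams recalled / correctly cleared

total = tp + fn + fp + tn
sensitivity = tp / (tp + fn)      # fraction of cancers detected
specificity = tn / (tn + fp)      # fraction of normal exams cleared
recall_rate = (tp + fp) / total   # fraction of all exams called positive
npv = tn / (tn + fn)              # negative calls that are truly cancer-free

print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}, "
      f"recall rate={recall_rate:.1%}, NPV={npv:.2%}")
```

Note how NPV stays near 100% whenever cancer prevalence is low, which is why both AI thresholds and the radiologists achieve ~99.8-99.9% NPV despite very different recall rates.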
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations