This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Commercial Artificial Intelligence Versus Radiologists: NPV and Recall Rate in Large Population-Based Digital Mammography and Tomosynthesis Screening Mammography Cohorts
Citations: 6
Authors: 11
Year: 2025
Abstract
<b>BACKGROUND</b>. By reliably classifying screening mammograms as negative, artificial intelligence (AI) could minimize radiologists' time spent reviewing high volumes of normal examinations and help prioritize examinations with high likelihood of malignancy.

<b>OBJECTIVE</b>. The purpose of this study was to compare performance of AI, classifying examinations as positive at different thresholds, with that of radiologists, focusing on NPV and recall rates, in large population-based digital mammography (DM) and digital breast tomosynthesis (DBT) screening cohorts.

<b>METHODS</b>. This retrospective single-institution study included women enrolled in the observational population-based Athena Breast Health Network. Stratified random sampling was used to identify cohorts of DM and DBT screening examinations performed from January 2010 through December 2019. Radiologists' interpretations were extracted from clinical reports. A commercial AI system classified examinations as low, intermediate, or elevated risk. Breast cancer diagnoses within 1 year after screening examinations were identified from a state cancer registry. AI and radiologist performance were compared.

<b>RESULTS</b>. The DM cohort included 26,693 examinations in 20,409 women (mean age, 58.1 years). AI classified 58.2%, 27.7%, and 14.0% of examinations as low, intermediate, and elevated risk, respectively. Sensitivity, specificity, recall rate, and NPV for radiologists were 88.6%, 93.3%, 7.2%, and 99.9%; for AI defining positive results as elevated risk, 74.4%, 86.3%, 14.0%, and 99.8%; and for AI defining positive results as intermediate or elevated risk, 94.0%, 58.6%, 41.8%, and 99.9%. The DBT cohort included 4824 examinations in 4379 women (mean age, 61.3 years). AI classified 68.1%, 19.8%, and 12.1% of examinations as low, intermediate, and elevated risk, respectively. Sensitivity, specificity, recall rate, and NPV for radiologists were 83.8%, 93.7%, 6.9%, and 99.9%; for AI defining positive results as elevated risk, 78.4%, 88.4%, 12.1%, and 99.8%; and for AI defining positive results as intermediate or elevated risk, 89.2%, 68.5%, 31.9%, and 99.8%.

<b>CONCLUSION</b>. In large DM and DBT cohorts, AI at either diagnostic threshold achieved high NPV but had higher recall rates than radiologists. Defining positive AI results to include intermediate-risk examinations, versus only elevated-risk examinations, detected additional cancers but yielded markedly increased recall rates.

<b>CLINICAL IMPACT</b>. The findings support AI's potential to aid radiologists' workflow efficiency. However, strategies are needed to address frequent false-positive results, particularly in the intermediate-risk category.
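The four metrics reported throughout the abstract all derive from a standard 2×2 confusion matrix of screening calls against 1-year cancer outcomes. As a minimal sketch of how they relate, the following uses purely hypothetical counts (not taken from this study):

```python
# Illustrative computation of screening-performance metrics from a
# confusion matrix. All counts below are hypothetical, for
# demonstration only; they are not data from the study.
tp, fn = 90, 10        # cancers flagged positive / missed
fp, tn = 900, 9000     # normal exams recalled / correctly cleared

total = tp + fn + fp + tn
sensitivity = tp / (tp + fn)      # fraction of cancers detected
specificity = tn / (tn + fp)      # fraction of normal exams cleared
recall_rate = (tp + fp) / total   # fraction of all exams called positive
npv = tn / (tn + fn)              # negative calls that are truly cancer-free

print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}, "
      f"recall rate={recall_rate:.1%}, NPV={npv:.2%}")
```

Note how NPV stays near 100% whenever cancer prevalence is low, which is why both AI thresholds and the radiologists achieve ~99.8-99.9% NPV despite very different recall rates.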
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations