This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance uncertainty in medical image analysis: a large-scale investigation of confidence intervals
Citations: 0
Authors: 13
Year: 2026
Abstract
Performance uncertainty quantification is essential for reliable validation and eventual clinical translation of medical imaging artificial intelligence (AI). Confidence intervals (CIs) play a central role in this process by indicating how precise a reported performance estimate is. Yet, due to the limited amount of work examining CI behavior in medical imaging, the community remains largely unaware of how many diverse CI methods exist and how they behave in specific settings. The purpose of this study is to close this gap. To this end, we conducted a large-scale empirical analysis across a total of 24 segmentation and classification tasks, using 19 trained models per task group, a broad spectrum of commonly used performance metrics, multiple aggregation strategies, and several widely adopted CI methods. Reliability (coverage) and precision (width) of each CI method were estimated across all settings to characterize their dependence on study characteristics. Our analysis revealed five principal findings: 1) the sample size required for reliable CIs varies from a few dozen to several thousand cases depending on study parameters; 2) CI behavior is strongly affected by the choice of performance metric; 3) aggregation strategy substantially influences the reliability of CIs, e.g., CIs require more observations for macro than for micro aggregation; 4) the machine learning problem (segmentation versus classification) modulates these effects; 5) different CI methods are not equally reliable and precise depending on the use case. These results form key components for the development of future guidelines on reporting performance uncertainty in medical imaging AI.
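To make the two evaluation criteria concrete, the following is a minimal sketch of how coverage and width of a CI method could be estimated by simulation. It is not the authors' protocol: the abstract does not name the CI methods or metrics studied, so the percentile bootstrap, the accuracy metric, the simulated Bernoulli outcomes, and the function names `percentile_bootstrap_ci` and `simulate_coverage` are all assumptions chosen for illustration.

```python
import random


def percentile_bootstrap_ci(outcomes, n_boot=200, alpha=0.05, rng=random):
    """Percentile bootstrap CI for the mean of per-case scores (assumed method)."""
    n = len(outcomes)
    # Resample cases with replacement and recompute the mean each time.
    boot_means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return boot_means[lo_idx], boot_means[hi_idx]


def simulate_coverage(true_acc, n_cases, n_trials=100, seed=0):
    """Estimate coverage and mean width of the CI for a known true accuracy.

    Coverage: fraction of simulated test sets whose CI contains true_acc.
    Width: average distance between upper and lower CI bounds.
    """
    rng = random.Random(seed)
    hits = 0
    widths = []
    for _ in range(n_trials):
        # Simulated per-case correctness (1 = correct, 0 = incorrect).
        outcomes = [1 if rng.random() < true_acc else 0 for _ in range(n_cases)]
        lo, hi = percentile_bootstrap_ci(outcomes, rng=rng)
        hits += lo <= true_acc <= hi
        widths.append(hi - lo)
    return hits / n_trials, sum(widths) / n_trials


if __name__ == "__main__":
    # Compare a small and a larger simulated test set.
    for n in (20, 100):
        coverage, width = simulate_coverage(true_acc=0.9, n_cases=n)
        print(f"n={n}: coverage={coverage:.2f}, mean width={width:.3f}")
```

A CI method would be considered reliable in a given setting when its estimated coverage is close to the nominal level (here 95%), and more precise when its mean width is smaller. Repeating such a simulation across sample sizes is one way to see how many cases a method needs before its coverage becomes acceptable.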
Similar works
New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)
2008 · 28,886 citations
TNM Classification of Malignant Tumours
1987 · 16,123 citations
A survey on deep learning in medical image analysis
2017 · 13,563 citations
Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening
2011 · 10,762 citations
The American Joint Committee on Cancer: the 7th Edition of the AJCC Cancer Staging Manual and the Future of TNM
2010 · 9,107 citations