This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations
Citations: 19
Authors: 11
Year: 2025
Abstract
Robust assessment of artificial intelligence (AI) models in medical imaging is paramount for reliable clinical integration. This international collaborative review paper provides an overview of key evaluation metrics across diverse tasks, including classification, regression, survival analysis, detection, and segmentation, as well as specialized metrics for calibration, foundation models, large language models, and synthetic images. Challenges of comparing models statistically and translating metric scores to clinical practice are also discussed. For each section, the paper outlines fundamental metrics, identifies common pitfalls and misapplications, and offers recommendations for more robust evaluations. Key recommendations often involve utilizing multiple, complementary metrics tailored to the specific task and dataset properties, transparent reporting of methodology, and critically, considering the clinical utility and real-world implications of model performance. Ultimately, effective evaluation requires a comprehensive, context-aware approach that goes beyond statistical metrics to ensure model trust and clinical relevance. The authors hope this review will serve as a practical reference for researchers aiming to implement robust and clinically meaningful AI evaluations in medical imaging.

• This review outlines the key metrics for evaluating medical imaging AI.
• Common pitfalls and misapplications are critically examined, with corresponding recommendations provided for each.
• Appropriate metric selection depends on the specific AI task.
• Foundation and generative models require broader evaluation methods, beyond traditional evaluation metrics.
• A multi-metric, context-aware evaluation is essential for reliability.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations
Authors
Institutions
- İstanbul Başakşehir Çam ve Sakura Şehir Hastanesi
- Istanbul Metropolitan Municipality (TR)
- Foundation for Research and Technology Hellas (GR)
- University of Crete (GR)
- Karolinska Institutet (SE)
- Federico II University Hospital (IT)
- University of Naples Federico II (IT)
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin (DE)
- Charité - Universitätsmedizin Berlin (DE)
- Essen University Hospital (DE)
- University Hospital of Zurich (CH)
- University of Zurich (CH)
- Klinikum rechts der Isar (DE)
- Deutsches Herzzentrum München (DE)
- University of Campania "Luigi Vanvitelli" (IT)
- Massachusetts General Hospital (US)
- Universitat de Barcelona (ES)
- University of Salerno (IT)