This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Performance evaluation of predictive AI models to support medical decisions: Overview and guidance
Citations: 12
Authors: 14
Year: 2024
Abstract
A myriad of measures to illustrate performance of predictive artificial intelligence (AI) models have been proposed in the literature. Selecting appropriate performance measures is essential for predictive AI models that are developed to be used in medical practice, because poorly performing models may harm patients and lead to increased costs. We aim to assess the merits of classic and contemporary performance measures when validating predictive AI models for use in medical practice. We focus on models with a binary outcome. We discuss 32 performance measures covering five performance domains (discrimination, calibration, overall, classification, and clinical utility) along with accompanying graphical assessments. The first four domains cover statistical performance; the fifth domain covers decision-analytic performance. We explain why two key characteristics are important when selecting which performance measures to assess: (1) whether the measure's expected value is optimized when it is calculated using the correct probabilities (i.e., a "proper" measure), and (2) whether the measure reflects either purely statistical performance or decision-analytic performance by properly considering misclassification costs. Seventeen measures exhibit both characteristics, fourteen measures exhibit one characteristic, and one measure (the F1 measure) exhibits neither. All classification measures (such as classification accuracy and F1) are improper for clinically relevant decision thresholds other than 0.5 or the prevalence. We recommend the following measures and plots as essential to report: AUROC, calibration plot, a clinical utility measure such as net benefit with decision curve analysis, and a plot with probability distributions per outcome category.
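To make the recommended clinical utility measure concrete, the following is a minimal sketch of computing net benefit at a single decision threshold, using the standard formula NB = TP/n − (FP/n) × t/(1 − t). The function name, toy outcomes, and predicted risks are illustrative, not from the article; a full decision curve analysis would evaluate net benefit over a range of clinically relevant thresholds.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    The odds ratio threshold/(1 - threshold) weights false positives by
    the implied misclassification cost at the chosen threshold.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy example: 6 patients with observed binary outcomes and predicted risks.
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.3, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
# Reference strategy "treat all": every patient classified as positive.
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.5)
```

Plotting `net_benefit` against a grid of thresholds for the model, "treat all", and "treat none" (net benefit 0 by definition) yields the decision curve that the abstract recommends reporting.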
Related works
Meta-analysis in clinical trials
1986 · 38,724 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 37,530 citations
PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation
2018 · 37,004 citations
The Cochrane Collaboration's tool for assessing risk of bias in randomised trials
2011 · 33,435 citations
RoB 2: a revised tool for assessing risk of bias in randomised trials
2019 · 28,264 citations