This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
TRIAGE: Trustworthy Reporting and Assessment for Clinical Gain and Effectiveness of AI Models
Citations: 0
Authors: 12
Year: 2026
Abstract
Machine learning (ML), including deep learning, kernel-based classifiers, and ensemble methods, is increasingly used to support clinical diagnosis in medical imaging, biosignal interpretation, and electronic health record (EHR)-based decision support. Despite rapid progress, many diagnostic AI studies still rely on limited retrospective evaluation and single summary measures (e.g., accuracy or AUC), creating a gap between reported model performance and evidence required for safe clinical adoption. This review proposes TRIAGE, a clinically grounded evaluation framework designed to organize diagnostic AI testing as an evidence pipeline aligned with real clinical use cases (screening, triage, second reading, and confirmatory testing). We summarize core discrimination metrics derived from the confusion matrix (sensitivity, specificity, predictive values, likelihood ratios, diagnostic odds ratio, and F-scores) and highlight the importance of prevalence and spectrum effects for interpreting predictive value and clinical workload. We further review evaluation strategies for multi-class and multi-label diagnostic tasks using appropriate aggregation methods (micro, macro, and weighted averaging) and set-based measures such as Hamming loss, exact match ratio, and Jaccard/IoU. Because diagnostic deployment is threshold-dependent, we integrate representation curves (ROC, precision–recall, lift, and cumulative gain) with calibration assessment and clinical utility analysis, including calibration slope, Brier score, and decision-curve analysis. We also address robustness and fairness evaluation, leakage-resistant validation designs (patient-grouped splits, stratified and temporal validation, and external validation), computational constraints relevant to deployment (latency, throughput, and energy use), and statistically sound model comparison with multiplicity control. 
A structured TRIAGE checklist table summarizing the evaluation parameters described in this review is provided in the main text to support reproducible and clinically interpretable reporting.
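The confusion-matrix metrics and the prevalence dependence of predictive values summarized in the abstract can be sketched in a few lines. The formulas are standard; the function names and the example counts are illustrative and not taken from the paper itself.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Discrimination metrics derived from a binary confusion matrix."""
    sens = tp / (tp + fn)               # sensitivity (recall, TPR)
    spec = tn / (tn + fp)               # specificity (TNR)
    ppv = tp / (tp + fp)                # positive predictive value (precision)
    npv = tn / (tn + fn)                # negative predictive value
    lr_pos = sens / (1 - spec)          # positive likelihood ratio
    lr_neg = (1 - sens) / spec          # negative likelihood ratio
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": ppv,
        "NPV": npv,
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": lr_pos / lr_neg,          # diagnostic odds ratio
        "F1": 2 * ppv * sens / (ppv + sens),
    }

def ppv_at_prevalence(sens, spec, prevalence):
    """Bayes' rule: PPV depends on disease prevalence, not only on the test."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

For example, a test with 90% sensitivity and 95% specificity that looks strong in a case-control study yields a PPV of only about 15% when applied to a screening population with 1% prevalence, which is the spectrum/prevalence effect the review highlights.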
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations