OpenAlex · Updated hourly · Last updated: 14 March 2026, 23:43

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

TRIAGE: Trustworthy Reporting and Assessment for Clinical Gain and Effectiveness of AI Models

2026 · 0 citations · Diagnostics · Open Access

Citations: 0
Authors: 12
Year: 2026

Abstract

Machine learning (ML), including deep learning, kernel-based classifiers, and ensemble methods, is increasingly used to support clinical diagnosis in medical imaging, biosignal interpretation, and electronic health record (EHR)-based decision support. Despite rapid progress, many diagnostic AI studies still rely on limited retrospective evaluation and single summary measures (e.g., accuracy or AUC), creating a gap between reported model performance and evidence required for safe clinical adoption. This review proposes TRIAGE, a clinically grounded evaluation framework designed to organize diagnostic AI testing as an evidence pipeline aligned with real clinical use cases (screening, triage, second reading, and confirmatory testing). We summarize core discrimination metrics derived from the confusion matrix (sensitivity, specificity, predictive values, likelihood ratios, diagnostic odds ratio, and F-scores) and highlight the importance of prevalence and spectrum effects for interpreting predictive value and clinical workload. We further review evaluation strategies for multi-class and multi-label diagnostic tasks using appropriate aggregation methods (micro, macro, and weighted averaging) and set-based measures such as Hamming loss, exact match ratio, and Jaccard/IoU. Because diagnostic deployment is threshold-dependent, we integrate representation curves (ROC, precision–recall, lift, and cumulative gain) with calibration assessment and clinical utility analysis, including calibration slope, Brier score, and decision-curve analysis. We also address robustness and fairness evaluation, leakage-resistant validation designs (patient-grouped splits, stratified and temporal validation, and external validation), computational constraints relevant to deployment (latency, throughput, and energy use), and statistically sound model comparison with multiplicity control. 
A structured TRIAGE checklist table summarizing the evaluation parameters described in this review is provided in the main text to support reproducible and clinically interpretable reporting.
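
To make the abstract's point about prevalence concrete, the following is a minimal sketch of the confusion-matrix metrics it lists, together with a Bayes-rule recalculation of positive predictive value at a different prevalence. It is not taken from the article: the class `ConfusionMatrix`, the function `ppv_at_prevalence`, and the example counts are hypothetical names and numbers chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix counts (hypothetical helper, not from the article)."""
    tp: int  # diseased, test positive
    fp: int  # healthy, test positive
    fn: int  # diseased, test negative
    tn: int  # healthy, test negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1.0 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        return self.lr_positive / self.lr_negative


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative counts only: 100 diseased and 900 healthy cases (10% prevalence).
    cm = ConfusionMatrix(tp=90, fp=50, fn=10, tn=850)
    print(f"sensitivity={cm.sensitivity:.3f} specificity={cm.specificity:.3f}")
    print(f"PPV in study sample={cm.ppv:.3f}")
    # Same sensitivity and specificity applied to a 1% prevalence screening population.
    screening_ppv = ppv_at_prevalence(cm.sensitivity, cm.specificity, 0.01)
    print(f"PPV at 1% prevalence={screening_ppv:.3f}")
```

With sensitivity and specificity held fixed, lowering the prevalence shrinks the share of positive calls that are true positives, which is why a predictive value measured in an enriched study sample does not carry over directly to a screening setting.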

Similar works