OpenAlex · Updated hourly · Last updated: 02.04.2026, 05:14

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

S1791 Validating Calibration of an Artificial Intelligence Assessment of Endoscopic Severity in Ulcerative Colitis

2025 · 0 citations · The American Journal of Gastroenterology

Citations: 0 · Authors: 12 · Year: 2025

Abstract

Introduction: Regulatory guidance recommends the endoscopy subscore as the index for assessing the endoscopic component of the primary endpoint in ulcerative colitis (UC) trials. Inter-reader variability in these assessments may affect the reliability of trial results, yet no metric currently exists to quantify the certainty with which a reader assigns an endoscopy subscore. Machine learning (ML) offers a way to assess the endoscopy subscore and to measure the certainty of that assessment in a standardized manner. Artificial Intelligence Assessment of Endoscopic Severity (AI-ES) accurately assesses the endoscopy subscore. The objective of this study is to evaluate the calibration of AI-ES (how well its predicted probabilities reflect true likelihoods) in order to assess the reliability of its certainty measure for endoscopy subscore assessments in UC trials.

Methods: AI-ES is a deep learning algorithm that assesses the endoscopy subscore in UC endoscopic videos. It estimates a probability for each of the 4 ordinal endoscopy subscore classes and assigns the subscore with the highest probability as its final score. We assessed calibration on a holdout test set of 639 videos (∼25%) from the Phase 3 induction trial of mirikizumab in UC (NCT03518086). The videos, each with a 2 + 1 centrally read endoscopy subscore, were randomly selected from weeks 0 and 12, with a distribution of endoscopic severity similar to that of the overall study population. Calibration plots were generated for each endoscopy subscore class, with probabilities grouped into septiles (∼100 videos per group) for the primary analysis and into deciles for confirmation. Brier scores, ranging from 0 (perfect calibration) to 1 (worst calibration), were calculated, with values <0.25 considered informative.

Results: AI-ES demonstrated strong calibration, with Brier scores below 0.25 for every endoscopy subscore (0: 0.037; 1: 0.082; 2: 0.162; 3: 0.112). The Brier score for endoscopic improvement (subscore 0 or 1 vs 2 or 3) also indicated excellent calibration (0.066). Findings were consistent when probabilities were grouped into deciles.

Conclusion: Whereas data on the certainty of human readers in endoscopy subscore assessments are scarce, AI-ES is calibrated across all endoscopy subscore classes and therefore provides reliable score probabilities. Adding this certainty measure to the AI-ES score assessment may enable new AI-based multi-reader or consensus workflows in trials, potentially improving the reliability of UC endpoint assessments.
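The abstract does not publish AI-ES code, so the following is only a minimal sketch of how per-class and endoscopic-improvement Brier scores could be computed from a 4-class probability output. It assumes each video's output is a list of four probabilities for subscores 0 to 3 and that the 2 + 1 central read serves as the reference label. The function names (`predicted_score`, `brier_binary`, `brier_per_class`, `brier_improvement`) are hypothetical; computing each class one-vs-rest (which keeps scores in the stated 0 to 1 range) and taking P(improvement) as P(0) + P(1) are assumptions, not details confirmed by the study.

```python
from typing import Dict, Sequence

# Hypothetical sketch, not the AI-ES implementation.
# probs[i]: model probabilities for endoscopy subscores 0..3 for video i
# labels[i]: reference subscore for video i (assumed: 2 + 1 central read)
NUM_CLASSES = 4


def predicted_score(p: Sequence[float]) -> int:
    """Assign the subscore with the highest probability (argmax)."""
    return max(range(len(p)), key=lambda k: p[k])


def brier_binary(prob: Sequence[float], outcome: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(prob) != len(outcome) or not prob:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(prob, outcome)) / len(prob)


def brier_per_class(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, float]:
    """One-vs-rest Brier score for each subscore class (assumed approach)."""
    scores = {}
    for k in range(NUM_CLASSES):
        p_k = [p[k] for p in probs]                  # probability of class k
        y_k = [1 if y == k else 0 for y in labels]   # 1 if reference is class k
        scores[k] = brier_binary(p_k, y_k)
    return scores


def brier_improvement(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Brier score for endoscopic improvement (subscore 0 or 1 vs 2 or 3).

    Assumption: P(improvement) is taken as P(0) + P(1).
    """
    p_imp = [p[0] + p[1] for p in probs]
    y_imp = [1 if y <= 1 else 0 for y in labels]
    return brier_binary(p_imp, y_imp)
```

A constant prediction of 0.5 for a binary outcome scores exactly 0.25 under `brier_binary`, which is one way to read the <0.25 threshold for an informative score.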
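The septile and decile grouping used for the calibration plots can be sketched as equal-count quantile binning. This reading is an assumption based on the stated group size of about 100 videos, and `calibration_bins` is a hypothetical helper. For each bin it returns the mean predicted probability and the observed frequency of the class; plotting observed frequency against mean predicted probability gives a calibration plot, on which a well-calibrated model lies close to the diagonal.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    prob: Sequence[float], outcome: Sequence[int], n_groups: int = 7
) -> List[Tuple[float, float, int]]:
    """Group predictions into n_groups equal-count quantile bins (hypothetical).

    Returns (mean predicted probability, observed event rate, group size)
    per bin, ordered from lowest to highest predicted probability.
    n_groups=7 gives septiles; n_groups=10 gives deciles.
    """
    if len(prob) != len(outcome):
        raise ValueError("prob and outcome must have equal length")
    if not 1 <= n_groups <= len(prob):
        raise ValueError("n_groups must be between 1 and the number of predictions")
    # Indices of predictions sorted by predicted probability.
    order = sorted(range(len(prob)), key=lambda i: prob[i])
    n = len(order)
    bins = []
    for g in range(n_groups):
        # Integer boundaries split n items into groups whose sizes differ by at most one.
        start = g * n // n_groups
        end = (g + 1) * n // n_groups
        idx = order[start:end]
        mean_p = sum(prob[i] for i in idx) / len(idx)
        rate = sum(outcome[i] for i in idx) / len(idx)
        bins.append((mean_p, rate, len(idx)))
    return bins
```

With 639 videos and `n_groups=7`, these integer boundaries yield groups of 91 or 92 videos.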

Similar works