This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study (Preprint)
0
Citations
11
Authors
2025
Year
Abstract
BACKGROUND: In the emergency department (ED), rapid prognostic assessment of patients with intracerebral hemorrhage (ICH) is essential for guiding treatment, even when stroke specialists are unavailable. Recent advances in large language models have spurred wider use of machine learning (ML) models in medical contexts.
OBJECTIVE: To evaluate the predictive performance of GPT-based models for poor functional outcomes after ICH using real-world multimodal data routinely available at ED presentation.
METHODS: Data from patients with ICH admitted to a tertiary hospital were analyzed. Using routinely collected clinical data and noncontrast computed tomography (CT) images at admission, GPT-4.1 and GPT-5 (accessed via Azure OpenAI Service) were applied to predict poor functional outcomes, defined as a modified Rankin Scale score of 3–6 at discharge. A conventional ML model was developed by combining deep learning-extracted features from Digital Imaging and Communications in Medicine (DICOM) CT data with clinical variables using L1-regularized logistic regression. GPT models were evaluated using the same clinical dataset and JPEG-format CT images. Model performance was assessed through discrimination (area under the receiver operating characteristic curve [AUROC]), calibration, reproducibility (intraclass correlation coefficient [ICC]), and clinical utility (decision curve analysis [DCA]).
RESULTS: The ML model achieved an AUROC of 0.85 (95% confidence interval, 0.79–0.90). Zero-shot GPT-4.1 and GPT-5 demonstrated strong discrimination (AUROC 0.83 and 0.86, respectively) with high reproducibility (ICC 0.91 and 0.95, respectively). Incorporating ML-derived information into model-informed prompts increased the AUROC to 0.85 and 0.87, respectively, with reproducibility remaining high (ICC 0.97 and 0.96, respectively). Calibration plots indicated that GPT models tended to underestimate probabilities; however, this bias was reduced after model-informed prompting. DCA showed a higher net benefit when ML-derived information was incorporated.
CONCLUSIONS: Zero-shot GPT models, particularly GPT-5, achieved predictive performance comparable to or exceeding that of conventional ML models using routinely available clinical data and CT images. Incorporating ML-derived outputs into GPT prompts further improved clinical utility, suggesting potential value for real-time decision support in emergency care.
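For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal sketch of how AUROC (discrimination) and the net benefit used in decision curve analysis are commonly computed. It is not the authors' code: the function names and the toy labels and probabilities are hypothetical, and the net benefit formula is the standard one, NB = TP/N - (FP/N) * pt/(1 - pt), where pt is the threshold probability.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability pt (0 < pt < 1):
    NB = TP/N - (FP/N) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = poor outcome (mRS 3-6), 0 = good outcome.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

print(f"AUROC: {auroc(labels, probs):.3f}")
print(f"Net benefit at pt=0.5: {net_benefit(labels, probs, 0.5):.3f}")
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.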
Similar works
Dabigatran versus Warfarin in Patients with Atrial Fibrillation
2009 · 11,162 citations
Rivaroxaban versus Warfarin in Nonvalvular Atrial Fibrillation
2011 · 9,361 citations
Apixaban versus Warfarin in Patients with Atrial Fibrillation
2011 · 8,868 citations
Global, regional, and national burden of stroke and its risk factors, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019
2021 · 7,399 citations
Thrombolysis with Alteplase 3 to 4.5 Hours after Acute Ischemic Stroke
2008 · 6,607 citations