This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study (Preprint)
Citations: 0
Authors: 11
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> In the emergency department (ED), rapid prognostic assessment of patients with intracerebral hemorrhage (ICH) is essential for guiding treatment, even when stroke specialists are unavailable. Recent advances in large language models have prompted wider use of machine learning (ML) models in medical contexts. </sec> <sec> <title>OBJECTIVE</title> To evaluate the predictive performance of GPT-based models for poor functional outcomes after ICH using real-world multimodal data routinely available at ED presentation. </sec> <sec> <title>METHODS</title> Data from patients with ICH admitted to a tertiary hospital were analyzed. Using routinely collected clinical data and noncontrast computed tomography (CT) images at admission, GPT-4.1 and GPT-5, accessed via Azure OpenAI Service, were applied to predict poor functional outcomes, defined as a modified Rankin Scale score of 3–6 at discharge. A conventional ML model was developed by combining deep learning-extracted features from Digital Imaging and Communications in Medicine CT data with clinical variables using L1-regularized logistic regression. GPT models were evaluated using the same clinical dataset and JPEG-format CT images. Model performance was assessed through discrimination (area under the receiver operating characteristic curve [AUROC]), calibration, reproducibility (intraclass correlation coefficient [ICC]), and clinical utility (decision curve analysis [DCA]). </sec> <sec> <title>RESULTS</title> The ML model achieved an AUROC of 0.85 (95% confidence interval, 0.79–0.90). Zero-shot GPT-4.1 and GPT-5 demonstrated strong discrimination (AUROC 0.83 and 0.86, respectively) with high reproducibility (ICC 0.91 and 0.95, respectively). Incorporating ML-derived information into model-informed prompts increased the AUROC to 0.85 and 0.87, respectively, with reproducibility remaining high (ICC 0.97 and 0.96, respectively). Calibration plots indicated that GPT models tended to underestimate probabilities; however, this bias was reduced after model-informed prompting. DCA showed a higher net benefit when ML-derived information was incorporated. </sec> <sec> <title>CONCLUSIONS</title> Zero-shot GPT models, particularly GPT-5, achieved predictive performance comparable to or exceeding that of conventional ML models using routinely available clinical data and CT images. Incorporating ML-derived outputs into GPT prompts further improved clinical utility, suggesting potential value for real-time decision support in emergency care. </sec>
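The abstract reports discrimination as AUROC and clinical utility as net benefit from decision curve analysis. As a minimal sketch of how these two quantities are commonly computed, the Python below uses the pairwise-ranking definition of AUROC and the standard net-benefit formula, TP/n − (FP/n) · pt/(1 − pt), at a threshold probability pt. The function names `auroc` and `net_benefit` and the eight-patient toy data are illustrative assumptions; they are not the study's code or data.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on predictions >= threshold (decision curve analysis)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = poor outcome (modified Rankin Scale 3-6), 0 = good outcome.
toy_outcomes = [1, 1, 1, 0, 0, 1, 0, 0]
toy_probabilities = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.1, 0.6]

print(auroc(toy_outcomes, toy_probabilities))
print(net_benefit(toy_outcomes, toy_probabilities, threshold=0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result against the "treat all" and "treat none" strategies; which thresholds the authors examined is not stated in the abstract.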
Similar works
Guidelines for the Early Management of Patients With Acute Ischemic Stroke
2013 · 7,634 citations
Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration
2013 · 5,267 citations
Frontotemporal lobar degeneration
1998 · 5,047 citations
Guidelines for the Management of Spontaneous Intracerebral Hemorrhage
2015 · 3,941 citations
Vascular Contributions to Cognitive Impairment and Dementia
2011 · 3,670 citations