Symptom-Only Localization of Brainstem Ischemia: Large Language Models vs. Neurologists in 109 Diffusion-Weighted Imaging–Positive Cases: A Retrospective Study (Preprint)
Citations: 0
Authors: 14
Year: 2025
Abstract
BACKGROUND: Localizing brainstem ischemic lesions based solely on neurological symptoms is challenging because of the complex anatomy and variable symptom presentation. Large language models (LLMs) play an emerging role in medical diagnostics by identifying patterns within clinical narratives.

OBJECTIVE: This study evaluates the diagnostic accuracy of LLMs compared with neurologists in localizing brainstem ischemia from clinical symptoms alone.

METHODS: We retrospectively analyzed 109 patients with diffusion-weighted imaging (DWI)-confirmed acute brainstem ischemia. Three neurologists and six LLMs (GPT-5, GPT-4, GPT-4.1, GPT-4o, o3, o3 pro) predicted lesion localization (midbrain, pons, medulla) and laterality (left/right) based on clinical symptoms alone. Accuracy, Cohen's κ, regional performance, and correlations with symptom count were assessed. Pairwise χ² tests with false discovery rate (FDR) correction were performed to compare model performance.

RESULTS: GPT-4 and GPT-4o achieved the highest overall accuracy (56.0%, 95% CI 46.1–65.5), significantly outperforming all neurologists (χ² = 7.4–20.1, p < 0.01) and the reasoning-based models. No significant differences were observed among GPT-4, GPT-4o, GPT-4.1, and GPT-5 (p > 0.05). In the regional analysis, significant effects were restricted to pontine infarcts, where GPT-4 (74%) and GPT-4o (69%) exceeded all neurologists (χ² = 6.4–18.3, p < 0.01). For mesencephalic and medullary lesions, accuracies did not differ significantly (p > 0.05). o3 pro performed worst overall (10%, p < 0.001). Cohen's κ reached 0.29 for GPT-4o, and accuracy correlated with symptom count (r = 0.28, p < 0.01).

CONCLUSIONS: GPT-4 and GPT-4o outperformed experienced neurologists in this constrained diagnostic task. Accuracy remained modest, particularly for non-pontine lesions, and reasoning-augmented models did not provide additional benefit. These findings highlight both the potential and the current limitations of LLMs in clinical reasoning, reinforcing the need for multimodal input and prospective validation.
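As a rough illustration of two of the metrics named in the methods, the sketch below computes a confidence interval for an accuracy and Cohen's κ between predicted and true regions. It rests on several assumptions that the abstract does not confirm: the Wilson score interval is used (the abstract does not state which interval method the authors applied), the count of 61 correct cases is inferred from 56.0% of 109, and the function names and the six toy labels are hypothetical and are not study data.

```python
import math
from collections import Counter


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumption: the abstract does not say which CI method was used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def cohens_kappa(truth, predicted):
    """Cohen's kappa: agreement between two label sequences beyond chance."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal frequencies, summed over labels.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / n**2
    if expected == 1:
        return math.nan  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# 61 of 109 correct is inferred from the reported 56.0%, not stated in the abstract.
low, high = wilson_interval(61, 109)
print(f"accuracy 95% CI (Wilson): {low:.3f} to {high:.3f}")

# Hypothetical toy labels, for illustration only.
toy_truth = ["pons", "pons", "medulla", "midbrain", "pons", "medulla"]
toy_pred = ["pons", "pons", "pons", "midbrain", "medulla", "medulla"]
print(f"toy kappa: {cohens_kappa(toy_truth, toy_pred):.3f}")
```

Under these assumptions the Wilson interval comes out at roughly 47% to 65%, slightly narrower than the reported 46.1–65.5, which suggests the authors may have used a different (for example, exact) method.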