This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Where Are We Now? Benchmarking Large Language Models (LLMs) in Computed Tomography (CT)-Based Detection of Intracranial Hemorrhage
Citations: 0 · Authors: 12 · Year: 2026
Abstract
Introduction: Rapid computed tomography (CT) interpretation for intracranial hemorrhage is vital for timely care. Large language models (LLMs) have advanced rapidly in image analysis, with some reports claiming high accuracy in medical imaging interpretation. This study evaluates whether general-purpose LLMs, specifically Grok-2, ChatGPT-4o, and Gemini 1.5 Flash, can outperform a human medical student in detecting and classifying intracranial hemorrhages.

Methods: Non-contrast axial CT head scans were sourced from the Radiological Society of North America (RSNA) 2019 database, in which each slice is annotated by expert neuroradiologists. A random sample of 400 scans was selected: 200 normal cases and 200 hemorrhage cases, with 40 cases for each of the five major hemorrhage subtypes. Grok-2, ChatGPT-4o, Gemini 1.5 Flash, and a blinded medical student were each given an image and a prompt asking them to determine (1) whether an intracranial hemorrhage was present and (2) the specific type of hemorrhage. McNemar’s test was used to compare paired classification accuracies, and Cohen’s kappa was used to measure inter-rater agreement.

Results: LLM accuracy in detecting hemorrhage ranged from 59.3% to 61.0%; Grok-2 showed the highest specificity and Gemini 1.5 Flash the highest sensitivity. The medical student outperformed all LLMs in accuracy and specificity. Subarachnoid hemorrhage was the hardest subtype to detect. Agreement was lowest between Grok-2 and the human reviewer (κ = 0.0637).

Conclusion: Current general-purpose LLMs demonstrate a moderate but inconsistent ability to detect and classify intracranial hemorrhages and underperform a human medical student. None of the LLMs matched human specificity or accuracy. Refinement of task-specific systems may be required to improve clinical applicability in neuroimaging.
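The metrics and tests named in the Methods can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' analysis code. It assumes binary detection labels (1 = hemorrhage, 0 = normal) and uses the exact (binomial) form of McNemar's test, because the abstract does not say which variant was applied. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter
from math import comb


def binary_metrics(truth, pred):
    """Accuracy, sensitivity, and specificity, assuming 1 = hemorrhage, 0 = normal."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of hemorrhage cases flagged
        "specificity": tn / (tn + fp),  # share of normal cases cleared
    }


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters; labels may be binary or subtype names."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal frequencies per label.
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n**2
    return (observed - expected) / (1 - expected)


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar p-value on the paired correctness of two readers.

    Assumption: exact binomial variant; the abstract does not name the variant.
    """
    correct_a = [p == t for t, p in zip(truth, pred_a)]
    correct_b = [p == t for t, p in zip(truth, pred_b)]
    b = sum(ca and not cb for ca, cb in zip(correct_a, correct_b))  # only A correct
    c = sum(cb and not ca for ca, cb in zip(correct_a, correct_b))  # only B correct
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical toy labels for eight scans, for illustration only.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    model = [1, 1, 0, 1, 1, 1, 0, 0]
    student = [1, 1, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(truth, model))  # {'accuracy': 0.625, 'sensitivity': 0.75, 'specificity': 0.5}
    print(round(cohens_kappa(model, student), 4))  # 0.0588
    print(mcnemar_exact(truth, model, student))  # 0.625
```

McNemar's test looks only at the discordant cases (one reader right, the other wrong), which is why it suits paired comparisons where every reader sees the same 400 scans. Because `cohens_kappa` counts arbitrary hashable labels, the same function could be applied to subtype classifications as well as to binary detection.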
Similar Works
Guidelines for the Early Management of Patients With Acute Ischemic Stroke
2013 · 7,641 citations
Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration
2013 · 5,301 citations
Frontotemporal lobar degeneration
1998 · 5,049 citations
Guidelines for the Management of Spontaneous Intracerebral Hemorrhage
2015 · 3,952 citations
Vascular Contributions to Cognitive Impairment and Dementia
2011 · 3,697 citations