This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Evaluating Large Language Models in Interpreting MRI Reports and Recommending Treatment for Vestibular Schwannoma
Citations: 1
Authors: 4
Year: 2025
Abstract
<b>Background/Objectives</b>: Patients increasingly use large language models (LLMs) to seek information about their diagnosis and treatment. Although the application of LLMs in healthcare is still under scientific investigation, demand for these models is expected to grow significantly in the coming years. This study evaluates and compares the diagnostic accuracy and treatment recommendations of three publicly available AI tools (GPT-4, Gemini, and Bing) when interpreting MRI reports of patients with vestibular schwannomas (VS). <b>Methods</b>: This retrospective study included 35 consecutive patients with VS treated at a university-based neurosurgery department. Anonymized MRI reports in German were translated into English, and each AI tool was given five standardized verbal prompts asking for diagnoses and treatment recommendations. Diagnostic accuracy, differential diagnoses, and treatment recommendations were assessed and compared. <b>Results</b>: Thirty-five patients (mean age, 57 years ± 13; 18 men) were included. GPT-4 achieved the highest diagnostic accuracy for VS at 97.14% (34/35), followed by Gemini at 88.57% (31/35) and Bing at 85.71% (30/35). GPT-4 provided the most accurate treatment recommendations (57.1%, 20/35), compared with Gemini (45.7%, 16/35) and Bing (31.4%, 11/35). GPT-4 correctly recommended surgery in 60% of cases (21/35), compared with 51.4% for Bing (18/35) and 45.7% for Gemini (16/35). The difference between GPT-4 and Bing was statistically significant (<i>p</i>-value: 0.02). <b>Conclusions</b>: GPT-4 outperformed Gemini and Bing in interpreting MRI reports and providing treatment recommendations for VS. 
Although the AI tools demonstrated good diagnostic accuracy, their treatment recommendations were less precise than those made by an interdisciplinary tumor board. This study highlights the growing role of AI tools in patient-driven healthcare inquiries.
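The percentages reported in the Results can be checked directly from the stated counts. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative assumptions, not part of the study, and the sketch does not attempt to reproduce the reported p-value because the abstract does not name the statistical test used.

```python
# Counts taken from the abstract; every proportion is out of 35 patients.
N_PATIENTS = 35
correct_diagnoses = {"GPT-4": 34, "Gemini": 31, "Bing": 30}
correct_treatments = {"GPT-4": 20, "Gemini": 16, "Bing": 11}


def percent(count, total=N_PATIENTS, digits=2):
    """Return count/total as a percentage rounded to `digits` decimals."""
    return round(100 * count / total, digits)


# Diagnostic accuracy is reported to two decimals, treatment accuracy to one.
diagnostic_accuracy = {tool: percent(c) for tool, c in correct_diagnoses.items()}
treatment_accuracy = {tool: percent(c, digits=1) for tool, c in correct_treatments.items()}

for tool in correct_diagnoses:
    print(f"{tool}: diagnosis {diagnostic_accuracy[tool]}%, "
          f"treatment {treatment_accuracy[tool]}%")
```

With a sample of 35, each patient accounts for roughly 2.9 percentage points, so differences of one or two cases between tools translate into visibly different percentages.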