This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating Large Language Models in Interpreting MRI Reports and Recommending Treatment for Vestibular Schwannoma

2025 · 1 citation · Diagnostics · Open Access

Abstract

Background/Objectives: The use of large language models (LLMs) by patients seeking information about their diagnosis and treatment is increasing rapidly. Although their application in healthcare is still under scientific investigation, demand for these models is expected to grow substantially in the coming years. This study evaluated and compared the diagnostic accuracy and treatment recommendations of three publicly available AI tools (GPT-4, Gemini, and Bing) when interpreting MRI reports of patients with vestibular schwannoma (VS), in light of the growing use of these tools by patients seeking medical information.

Methods: This retrospective study included 35 consecutive patients with VS treated at a university-based neurosurgery department. Anonymized MRI reports written in German were translated into English, and each AI tool was given five standardized verbal prompts asking for diagnoses and treatment recommendations. Diagnostic accuracy, differential diagnoses, and treatment recommendations were assessed and compared across the tools.

Results: Thirty-five patients (mean age, 57 ± 13 years; 18 men) were included. GPT-4 achieved the highest diagnostic accuracy for VS at 97.14% (34/35), followed by Gemini at 88.57% (31/35) and Bing at 85.71% (30/35). GPT-4 also provided the most accurate treatment recommendations (57.1%, 20/35), compared with Gemini (45.7%, 16/35) and Bing (31.4%, 11/35). GPT-4 correctly recommended surgery in 60% of cases (21/35), compared with 51.4% for Bing (18/35) and 45.7% for Gemini (16/35). The difference between GPT-4 and Bing was statistically significant (p = 0.02).

Conclusions: GPT-4 outperformed Gemini and Bing in interpreting MRI reports and providing treatment recommendations for VS. Although the AI tools demonstrated good diagnostic accuracy, their treatment recommendations were less precise than those made by an interdisciplinary tumor board. This study highlights the growing role of AI tools in patient-driven healthcare inquiries.
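Each percentage reported in the abstract is a simple proportion of the 35 included patients. The minimal Python sketch below recomputes them from the counts stated in the abstract. The dictionary layout and the `accuracy_pct` helper are illustrative assumptions, not the authors' analysis code, and the p-value is not reproduced because the abstract does not name the statistical test used.

```python
# Correct answers per tool, as reported in the abstract (n = 35 patients).
# The data layout and helper below are hypothetical, for illustration only.
N_PATIENTS = 35

results = {
    "VS diagnosis": {"GPT-4": 34, "Gemini": 31, "Bing": 30},
    "Treatment recommendation": {"GPT-4": 20, "Gemini": 16, "Bing": 11},
    "Surgery recommendation": {"GPT-4": 21, "Gemini": 16, "Bing": 18},
}


def accuracy_pct(correct: int, total: int = N_PATIENTS) -> float:
    """Return correct/total as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


for task, by_tool in results.items():
    for tool, correct in by_tool.items():
        print(f"{task:<26} {tool:<7} {correct}/{N_PATIENTS} = {accuracy_pct(correct)}%")
```

Rounded to two decimals, the treatment figures come out as 57.14%, 45.71%, and 31.43%, which match the one-decimal values given in the abstract.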

Institutions

University of Freiburg (DE)

Topics

Meningioma and schwannoma management; Artificial Intelligence in Healthcare and Education; Thyroid and Parathyroid Surgery