OpenAlex · Updated hourly · Last updated: 6 April 2026, 02:46

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Evaluating Large Language Models in the Image-Based Diagnosis of Intracranial Tumors

2025 · 0 citations · Cureus · Open Access

Citations: 0 · Authors: 7 · Year: 2025

Abstract

Background: Artificial intelligence (AI) tools exist at the intersection of machine learning and natural language processing and are poised to rapidly transform healthcare. Clinicians have shown growing interest in the ability of patient-accessible image analysis tools embedded in large language models (LLMs) to interpret raw clinical neuroimaging. Here, we compare the performance of GPT-4V and GPT-4o to that of neurosurgical attendings and trainees.

Methodology: A total of 20 brain MRI scans were included in this analysis, consisting of five gliomas, five meningiomas, five pituitary tumors, and five non-tumor control images. GPT-4V and GPT-4o were given identical prompts and MRI scans and asked to determine the most likely diagnosis. Model performance in classifying each MRI scan into one of these four categories was compared to survey responses from neurosurgery attendings, fellows, senior residents, and junior residents.

Results: GPT-4V correctly diagnosed 40% of cases (n = 20), whereas GPT-4o achieved a 70% accuracy rate (n = 20). Neurosurgery attendings, fellows, and residents (n = 14) collectively identified the correct diagnosis in 84.6% of cases across the same 20 images (n = 280), each based on a single cross-sectional MRI scan. The mean Cohen's kappa of surgeons compared to GPT-4V was 0.18, and compared to GPT-4o it was 0.51.

Conclusions: While LLMs underperformed compared to surgeons in identifying central nervous system malignancies, GPT-4o demonstrated substantial improvement over GPT-4V, highlighting the rapid advancement of AI capabilities. Interrater reliability statistics provided further evidence that GPT-4o more closely resembles human-level performance than GPT-4V does. Further refinement of these models may bridge the performance gap and expand their utility in clinical neuroimaging. Patients should be cautioned about using such models at the individual patient level.
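The two statistics reported in the abstract, classification accuracy and Cohen's kappa, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names `accuracy` and `cohen_kappa` and the eight toy labels are hypothetical and chosen only to show the arithmetic. The sketch also assumes that "mean Cohen's kappa" means kappa computed pairwise between each surgeon and the model and then averaged, which the abstract does not spell out.

```python
from collections import Counter


def accuracy(predicted, truth):
    """Fraction of items whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for illustration only; NOT the study's data.
truth = ["glioma", "glioma", "meningioma", "meningioma",
         "pituitary", "pituitary", "control", "control"]
model = ["glioma", "meningioma", "meningioma", "meningioma",
         "pituitary", "control", "control", "control"]

print(f"accuracy = {accuracy(model, truth):.2f}")
print(f"kappa    = {cohen_kappa(truth, model):.2f}")
```

Kappa discounts the agreement that two raters would reach by chance alone, so it sits below raw accuracy whenever chance agreement is non-zero. For scale, the reported accuracies correspond to 8 of 20 correct cases for GPT-4V and 14 of 20 for GPT-4o.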

Topics

Artificial Intelligence in Healthcare and Education · Glioma Diagnosis and Treatment · Meningioma and schwannoma management