OpenAlex · Updated hourly · Last updated: 1 May 2026, 20:47

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Beyond human ‘eyes’ in neurosurgical exams: success of artificial intelligence (ChatGPT-4o, Grok, and Gemini) in the image-based questions of Turkish Neurosurgical Society Proficiency Board exams

2025 · 0 citations · Turkish Neurosurgery

Citations: 0 · Authors: 6 · Year: 2025

Abstract

AIM: To evaluate the impact of generative artificial intelligence and large language models (LLMs) on medical training and neurosurgical education, specifically focusing on their emerging capabilities in image interpretation. MATERIAL AND METHODS: This study evaluated the performance of three major LLMs (ChatGPT-4o, Grok, and Gemini) on image-based neurosurgical proficiency board questions and compared their latest versions. RESULTS: Real-life candidates answered correctly 70.75% of the time. LLMs answered correctly 47.38% of the time and were significantly outperformed by the candidates. Prompt selection was found to significantly influence the performance of GPT and Grok, but not Gemini. Matching and significantly outperforming the candidates was only possible by combining the best answers from all three LLMs across four runs. CONCLUSION: Although previous research has demonstrated strong capabilities of LLMs in text-only questions, the results of the present study revealed that the image analysis abilities of these models need further improvement when compared to actual candidates. Furthermore, the impact of prompt selection and repeated questioning should be emphasized, particularly when seeking correlation with the real-life exam results.

Topics

Artificial Intelligence in Healthcare and Education · Surgical Simulation and Training · Radiology practices and education