This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Beyond human ‘eyes’ in neurosurgical exams: success of artificial intelligence (ChatGPT-4o, Grok, and Gemini) in the image-based questions of Turkish Neurosurgical Society proficiency board exams
Citations: 0 · Authors: 6 · Year: 2025
Abstract
AIM: To evaluate the impact of generative artificial intelligence and large language models (LLMs) on medical training and neurosurgical education, with a specific focus on their emerging capabilities in image interpretation.
MATERIAL AND METHODS: This study evaluated the performance of three major LLMs (ChatGPT-4o, Grok, and Gemini) on image-based neurosurgical proficiency board questions and compared their latest versions.
RESULTS: Real-life candidates had a correct-answer rate of 70.75%. The LLMs had a correct-answer rate of 47.38% and were significantly outperformed by the candidates. Prompt selection significantly influenced the performance of GPT and Grok, but not that of Gemini. Matching and significantly outperforming the candidates were possible only by combining the best answers from all three LLMs across four runs.
CONCLUSION: Although previous research has demonstrated strong capabilities of LLMs on text-only questions, the results of the present study reveal that the image analysis abilities of these models need further improvement relative to actual candidates. Furthermore, the impact of prompt selection and repeated questioning should be emphasized, particularly when seeking correlation with real-life exam results.
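The abstract does not state exactly how the "best answers" from the three LLMs and four runs were combined. One simple reading, assumed here and not confirmed by the source, is a pooled rule under which a question counts as correct if any run of any model answered it correctly. The Python sketch below uses hypothetical toy data (3 models, 2 runs, 4 questions; not data from the study) and hypothetical helper names (`accuracy`, `oracle_combined`) to show why such a pooled score can exceed every individual run.

```python
def accuracy(correct: list[bool]) -> float:
    """Fraction of questions answered correctly in one run."""
    return sum(correct) / len(correct)


def oracle_combined(results: dict[tuple[str, int], list[bool]]) -> list[bool]:
    """Mark a question correct if any (model, run) pair answered it correctly.

    Assumption: this pooled 'best answer' rule is one possible reading of the
    abstract's combination step, not the authors' confirmed procedure.
    """
    n_questions = len(next(iter(results.values())))
    return [any(run[i] for run in results.values()) for i in range(n_questions)]


# Hypothetical toy data keyed by (model, run number); NOT data from the study.
results = {
    ("ChatGPT-4o", 1): [True, False, False, False],
    ("ChatGPT-4o", 2): [True, False, True, False],
    ("Grok", 1): [False, True, False, False],
    ("Grok", 2): [False, False, False, False],
    ("Gemini", 1): [True, False, False, False],
    ("Gemini", 2): [False, False, False, False],
}

# Average accuracy over all individual runs versus the pooled score.
mean_run = sum(accuracy(run) for run in results.values()) / len(results)
pooled = accuracy(oracle_combined(results))

print(f"mean single-run accuracy: {mean_run:.3f}")  # prints 0.208
print(f"pooled accuracy: {pooled:.3f}")  # prints 0.750
```

Because the pooled rule credits a question whenever any single attempt is correct, it behaves like an upper bound across models and runs, which is one reason repeated questioning can change how LLM results compare with candidate results.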
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,551 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,443 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,942 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations