This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative Assessment of Large Language Models in Optics and Refractive Surgery: Performance on Multiple-Choice Questions
Citations: 0
Authors: 5
Year: 2025
Abstract
This study evaluated the performance of seven advanced large language models (LLMs): ChatGPT 4o, ChatGPT O3 Mini, ChatGPT O1, DeepSeek V3, DeepSeek R1, Gemini 2.0 Flash, and Grok-3, in answering multiple-choice questions (MCQs) in optics and refractive surgery, to assess their role in medical education for residents. The models were tested on 134 publicly available MCQs from national ophthalmology certification exams, categorized by whether calculations were required, by subspecialty, and by the use of images. Accuracy was analyzed and compared statistically. ChatGPT O1 achieved the highest overall accuracy (83.5%), excelling in complex optical calculations (84.1%) and optics questions (82.4%). DeepSeek V3 showed the highest accuracy on refractive surgery questions (89.7%), followed by ChatGPT O3 Mini (88.4%). ChatGPT O3 Mini significantly outperformed the other models in image analysis, with 88.2% accuracy. Moreover, ChatGPT O1 achieved comparable accuracy on calculation and non-calculation questions (84.1% vs. 83.3%), in stark contrast to the other models, which showed significant discrepancies between the two question types. The findings highlight the ability of LLMs to achieve high accuracy in ophthalmology MCQs, particularly on complex optical calculations and visual items, suggesting potential applications in exam preparation and medical training. Future studies should use larger, multilingual datasets to directly evaluate the models' role and impact in medical education and to confirm and extend these preliminary findings.
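The per-category accuracy analysis described in the abstract (accuracy grouped by model and question category) can be sketched as follows. This is a minimal illustration: the records, category labels, and the `accuracy_by` helper are hypothetical and not taken from the study's actual dataset or code.

```python
from collections import defaultdict

# Hypothetical answer records (model, category, answered_correctly);
# the entries below are illustrative, not the study's actual data.
results = [
    ("ChatGPT O1", "calculation", True),
    ("ChatGPT O1", "calculation", True),
    ("ChatGPT O1", "non-calculation", True),
    ("DeepSeek V3", "refractive surgery", True),
    ("DeepSeek V3", "refractive surgery", False),
]

def accuracy_by(records):
    """Return per-(model, category) accuracy as a fraction of correct answers."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, category, is_correct in records:
        totals[(model, category)] += 1
        correct[(model, category)] += is_correct  # True counts as 1
    return {key: correct[key] / totals[key] for key in totals}

for (model, category), acc in sorted(accuracy_by(results).items()):
    print(f"{model} | {category}: {acc:.1%}")
```

In the study itself these per-category accuracies were then compared statistically across the seven models; a significance test would be applied on top of a tabulation like this one.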
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations