This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative Assessment of Large Language Models in Optics and Refractive Surgery: Performance on Multiple-Choice Questions
Citations: 0
Authors: 5
Year: 2025
Abstract
This study evaluated the performance of seven advanced large language models (LLMs): ChatGPT 4o, ChatGPT O3 Mini, ChatGPT O1, DeepSeek V3, DeepSeek R1, Gemini 2.0 Flash, and Grok-3, in answering multiple-choice questions (MCQs) in optics and refractive surgery, to assess their role in medical education for residents. The models were tested on 134 publicly available MCQs from national ophthalmology certification exams, categorized by whether calculations were required, by subspecialty, and by the use of images. Accuracy was analyzed and compared statistically. ChatGPT O1 achieved the highest overall accuracy (83.5%), excelling in complex optical calculations (84.1%) and optics questions (82.4%). DeepSeek V3 showed the highest accuracy on refractive surgery questions (89.7%), followed by ChatGPT O3 Mini (88.4%). ChatGPT O3 Mini significantly outperformed the other models in image analysis, with 88.2% accuracy. Moreover, ChatGPT O1 achieved comparable accuracy on calculation and non-calculation questions (84.1% vs. 83.3%), in stark contrast to the other models, which showed significant discrepancies between the two question types. The findings highlight the ability of LLMs to achieve high accuracy in ophthalmology MCQs, particularly on complex optical calculations and visual items, suggesting potential applications in exam preparation and medical training. Future studies should use larger, multilingual datasets to directly evaluate the models' role and impact in medical education and to confirm and extend these preliminary findings.
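The per-category accuracy analysis described in the abstract (accuracy grouped by model and question category) can be sketched as follows. This is a minimal illustration: the records, category labels, and the `accuracy_by` helper are hypothetical and not taken from the study's actual dataset or code.

```python
from collections import defaultdict

# Hypothetical answer records (model, category, answered_correctly);
# the entries below are illustrative, not the study's actual data.
results = [
    ("ChatGPT O1", "calculation", True),
    ("ChatGPT O1", "calculation", True),
    ("ChatGPT O1", "non-calculation", True),
    ("DeepSeek V3", "refractive surgery", True),
    ("DeepSeek V3", "refractive surgery", False),
]

def accuracy_by(records):
    """Return per-(model, category) accuracy as a fraction of correct answers."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, category, is_correct in records:
        totals[(model, category)] += 1
        correct[(model, category)] += is_correct  # True counts as 1
    return {key: correct[key] / totals[key] for key in totals}

for (model, category), acc in sorted(accuracy_by(results).items()):
    print(f"{model} | {category}: {acc:.1%}")
```

In the study itself these per-category accuracies were then compared statistically across the seven models; a significance test would be applied on top of a tabulation like this one.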
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations