OpenAlex · Updated hourly · Last updated: 14.03.2026, 14:33

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of the ChatGPT-4o Language Model in Solving the Ophthalmology Specialization Exam

2025 · 5 citations · Cureus · Open Access
Open full text at the publisher

5 citations · 15 authors · Year: 2025

Abstract

Background: Artificial intelligence (AI), particularly language models such as ChatGPT, is gaining importance in medical education and knowledge assessment. Previous studies have demonstrated the growing effectiveness of AI in solving medical exams, including the Final Medical Examination (LEK) and the Polish State Specialization Exam (PES) in various specialties, raising questions about its usefulness as a tool to support specialist training.

Objective: The aim of this study was to assess the effectiveness of the latest ChatGPT-4o model in solving the PES in ophthalmology. The analysis focused on the accuracy of the answers and the model's declared confidence level to evaluate its potential educational usefulness.

Methods: The study was based on the official PES ophthalmology exam (Spring 2024), consisting of 120 multiple-choice questions. The ChatGPT-4o model was familiarized with the exam regulations and questions, which were input in Polish. The effectiveness of the answers was assessed against the Medical Education Center (CEM) answer key, along with the model's declared confidence level (on a scale of 1 to 5). The questions were divided into clinical and theoretical categories. Data were analyzed statistically using the chi-square test and the Mann-Whitney U test.

Results: The model provided 94 correct answers (78.3%), exceeding the passing threshold. No significant difference in effectiveness was observed between clinical and non-clinical questions (p = 0.709). The analysis of the confidence level revealed that correct answers were significantly more often provided with higher confidence (p < 0.001), suggesting that the model's self-assessment could be an indicator of answer accuracy.

Conclusions: ChatGPT-4o demonstrated high effectiveness on the PES ophthalmology exam, confirming the potential of AI in specialist education. The confidence level of answers could serve as a useful tool in assessing the reliability of responses. Despite promising results, expert supervision and further research in various medical fields are necessary before wider implementation of AI models in medical education.
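The accuracy figure and the category comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the 94/120 total is taken from the abstract, but the per-category split (clinical vs. theoretical) is hypothetical, and only the Pearson chi-square statistic is computed (no p-value, which would require the chi-square distribution).

```python
# Sketch of the accuracy computation and a 2x2 chi-square test,
# as described in the abstract's Methods/Results.
# NOTE: the per-category counts below are hypothetical examples;
# only the 94-correct-of-120 total comes from the paper.

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = [[a, b], [c, d]]
    # Expected count under independence: row total * column total / n
    expected = [
        [row_totals[i] * col_totals[j] / n for j in range(2)]
        for i in range(2)
    ]
    return sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )

correct, total = 94, 120
accuracy = round(100 * correct / total, 1)  # 78.3

# Hypothetical table: rows = correct/incorrect, columns = clinical/theoretical
table = [[48, 46], [12, 14]]
stat = chi_square_2x2(table)
print(accuracy, round(stat, 3))
```

A small statistic like this one would be consistent with the non-significant difference the paper reports between question categories; the paper's actual contingency counts are not given in the abstract.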

Similar works