This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of the ChatGPT-4o Language Model in Solving the Ophthalmology Specialization Exam
Citations: 5
Authors: 15
Year: 2025
Abstract
Background
Artificial intelligence (AI), particularly language models such as ChatGPT, is gaining importance in medical education and knowledge assessment. Previous studies have demonstrated the growing effectiveness of AI in solving medical exams, including the Final Medical Examination (LEK) and Polish State Specialization Exam (PES) in various specialties, raising questions about its usefulness as a tool to support specialist training processes.
Objective
The aim of this study was to assess the effectiveness of the latest ChatGPT-4o model in solving the PES in ophthalmology. The analysis focused on the accuracy of the answers and the model's declared confidence level to evaluate its potential educational usefulness.
Methods
The study was based on the official PES ophthalmology exam (Spring 2024), consisting of 120 multiple-choice questions. The ChatGPT-4o model was familiarized with the exam regulations and questions, which were input in Polish. The effectiveness of the answers was assessed based on the Medical Education Center (CEM) answer key, as well as the model's declared confidence level (on a scale of 1 to 5). The questions were divided into clinical and theoretical categories. Data were analyzed statistically using the chi-square test and the Mann-Whitney U test.
Results
The model provided 94 correct answers (78.3%), exceeding the passing threshold. No significant difference in effectiveness was observed between clinical and non-clinical questions (p = 0.709). The analysis of the confidence level revealed that correct answers were significantly more often provided with higher confidence (p < 0.001), suggesting that the model's self-assessment could be an indicator of answer accuracy.
Conclusions
ChatGPT-4o demonstrated high effectiveness in the PES ophthalmology exam, confirming the potential of AI in specialist education. The confidence level of answers could serve as a useful tool in assessing the reliability of responses. Despite promising results, expert supervision and further research in various medical fields are necessary before wider implementation of AI models in medical education.
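The accuracy figure and the chi-square comparison mentioned in the abstract can be illustrated with a minimal Python sketch. Only the totals (94 correct out of 120) come from the abstract. The function names are hypothetical, the per-category counts in the usage note are placeholders rather than study data, and the abstract does not say whether a continuity correction was applied, so this sketch assumes the uncorrected Pearson statistic.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Share of correctly answered questions."""
    return correct / total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = question category, columns = correct/incorrect.
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Totals reported in the abstract: 94 correct answers out of 120 questions.
print(f"{accuracy(94, 120):.1%}")  # 78.3%
```

Calling `chi_square_2x2(47, 13, 47, 13)` with these placeholder counts (two categories with identical proportions) returns a statistic of 0.0 and a p-value of 1.0; the actual clinical/theoretical split is not given on this page.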
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations
Authors
Institutions
- Zdravstveni centar (RS)
- Medical University of Silesia (PL)
- Military University of Technology in Warsaw (PL)
- University of Szczecin (PL)
- Academy of Fine Arts in Katowice (PL)
- Międzyleski Szpital Specjalistyczny w Warszawie (PL)
- Children's Specialized Hospital (US)
- Poznan University of Medical Sciences (PL)
- WSB University (PL)
- Wroclaw Medical University (PL)