OpenAlex · Updated hourly · Last updated: 14.03.2026, 09:03

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessment of the Ability of the ChatGPT-5 Model to Pass the Endocrinology Specialization Exam

2025 · 1 citation · Cureus · Open Access

Citations: 1 · Authors: 17 · Year: 2025

Abstract

Background: In recent years, AI has undergone rapid development, particularly with the advancement of the ChatGPT model developed by OpenAI, which has found broad applications across scientific disciplines, including medicine. This study aims to evaluate the performance of the most recent ChatGPT-5 model in completing the Polish specialty examination in endocrinology. Specifically, we assessed the model's accuracy in answering clinical and theoretical questions, as well as the confidence and consistency of its responses.

Materials and methods: This study utilized the Polish spring 2025 endocrinology specialty examination, which comprised 120 multiple-choice questions, each with five options and a single correct answer. The ChatGPT-5 model was provided with the official examination regulations as well as the complete set of questions and answer options in Polish. Model-generated responses were compared against the official answer key published by the Medical Examination Center (CEM), and the declared confidence level (1-5 scale) was recorded. Questions were classified into two categories: theoretical/other and clinical. Statistical analyses were performed using Microsoft Excel and GraphPad Prism 10, employing the chi-square test and the Mann-Whitney U test.

Results: The latest ChatGPT-5 Plus model achieved a score of 76.47%, corresponding to 91 correct and 28 incorrect answers (23.53%). One question was excluded due to inconsistency with current medical knowledge. The passing threshold of at least 60% was exceeded.

Conclusions: The ChatGPT-5 model successfully completed the Polish endocrinology specialty examination, surpassing the official pass mark of 60%. Its result, an accuracy of 76.47%, suggests that large language models may serve as a meaningful source of support in postgraduate medical education and assessment preparation. However, the reliability and usefulness of the ChatGPT-5 model in clinical decision-making remain uncertain. There is no evidence that this tool can effectively handle ambiguous endocrinological cases, nor that its safe integration with electronic medical records is feasible. Therefore, further research is required to assess whether ChatGPT-5 may be applied not only in medical training but also as a complementary aid in clinical practice.
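The headline figures in the abstract follow from simple arithmetic on the reported counts. A minimal sketch (not the authors' code; variable names are illustrative) that reproduces the scored-question total, the accuracy, and the pass-mark check:

```python
# Counts taken directly from the abstract.
CORRECT = 91       # correct answers
INCORRECT = 28     # incorrect answers
EXCLUDED = 1       # question excluded as inconsistent with current knowledge
TOTAL_ASKED = 120  # questions on the spring 2025 exam
PASS_MARK = 0.60   # official passing threshold

scored = CORRECT + INCORRECT        # questions actually scored: 119
accuracy = CORRECT / scored         # fraction answered correctly

# Sanity check: excluding one question from 120 leaves the 119 scored items.
assert TOTAL_ASKED - EXCLUDED == scored

print(f"Scored questions: {scored}")        # 119
print(f"Accuracy: {accuracy:.2%}")          # 76.47%
print(f"Passed: {accuracy >= PASS_MARK}")   # True
```

This confirms the internal consistency of the reported numbers: 91 of 119 scored questions is 76.47%, and 28 of 119 is the quoted 23.53% error rate.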
