This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative Performance of ChatGPT-5 and Gemini 2.5 on the Official Clinical Diabetology Specialty Examination
Citations: 0 · Authors: 16 · Year: 2026
Abstract
Introduction
Artificial intelligence (AI) is becoming increasingly prevalent in many areas of life, and medicine is no exception. Many people inside and outside the medical community ask, "Will it be better than doctors?" To become a doctor in Poland, one must pass many exams, including a specialization exam. How does AI cope with the Official Clinical Diabetology Specialty Test?

Objective
The purpose of this article was to evaluate how AI tools, namely ChatGPT-5 (OpenAI, San Francisco, CA, USA) and Gemini 2.5 (Google DeepMind, London, UK), handle the Official Clinical Diabetology Specialty Test of Poland. The primary outcome was the accuracy of the answers measured against the official answer key; the secondary outcome was the models' confidence in their responses. The two models were compared statistically using McNemar's test.

Methodology
The study analyzed 117 questions randomly chosen from the archive of the Centre for Medical Examination (CEM) in Łódź, Poland. The questions were multiple-choice with one correct answer. ChatGPT-5 and Gemini 2.5 answered the questions in Polish. Responses were graded against the official answer key published by the CEM. Statistical analysis was performed using McNemar's test.

Results
ChatGPT-5 answered 92 of 117 questions correctly (78.63%). Gemini 2.5 answered 80 of 117 questions correctly, an accuracy of 68.38%. The difference between the two models was statistically significant (χ² = 4.84; p = 0.0455). Both models reported similarly high confidence in their answers (ChatGPT-5 = 5 vs. Gemini 2.5 = 4.957 on a five-point scale).

Conclusions
The scores achieved by both AI models, ChatGPT-5 and Gemini 2.5, would have been sufficient to pass the Official Clinical Diabetology Specialty Examination. These results are encouraging for the use of AI in the study process, but the models still need further adaptation to manage clinical cases. AI can serve as a learning tool, but for now it cannot replace specialists.
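The model comparison above relies on McNemar's test, which evaluates paired binary outcomes (here, correct/incorrect per question for each model) using only the discordant pairs. A minimal sketch in Python follows; the discordant counts `b` and `c` are hypothetical placeholders for illustration, since the abstract does not report the per-question agreement table:

```python
import math

def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar's test with continuity correction.

    b: questions model A answered correctly and model B did not
    c: questions model B answered correctly and model A did not
    Returns (chi-square statistic, two-sided p-value; 1 degree of freedom).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical discordant counts, NOT the study's actual data
chi2, p = mcnemar(b=18, c=6)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only the off-diagonal cells of the 2×2 agreement table enter the statistic; questions both models answered the same way carry no information about which model is better.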
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations