This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative Performance of ChatGPT-5 and Gemini 2.5 on the Official Clinical Diabetology Specialty Examination
Citations: 0 · Authors: 16 · Year: 2026
Abstract
Introduction
Artificial intelligence (AI) is becoming increasingly prevalent in many areas of life, and medicine is no exception. Many people inside and outside the medical community ask, "Will it be better than doctors?" To become a doctor in Poland, one must pass many exams, including a specialization exam. How does AI cope with the Official Clinical Diabetology Specialty Test?

Objective
The purpose of this article was to evaluate how AI tools, namely ChatGPT-5 (OpenAI, San Francisco, CA, USA) and Gemini 2.5 (Google DeepMind, London, UK), handle the Official Clinical Diabetology Specialty Test of Poland. The primary outcome was the accuracy of the answers measured against the official answer key; the secondary outcome was the models' confidence in their responses. The two models were compared statistically using McNemar's test.

Methodology
The study analyzed 117 questions randomly chosen from the archive of the Centre for Medical Examination (CEM) in Łódź, Poland. The questions were multiple-choice with one correct answer. ChatGPT-5 and Gemini 2.5 answered the questions in Polish. Responses were graded against the official answer key published by the CEM. Statistical analysis was performed using McNemar's test.

Results
ChatGPT-5 answered 92 of 117 questions correctly (78.63%). Gemini 2.5 answered 80 of 117 questions correctly, an accuracy of 68.38%. The difference between the two models was statistically significant (χ² = 4.84; p = 0.0455). Both models reported similarly high confidence in their answers (ChatGPT-5 = 5 vs. Gemini 2.5 = 4.957 on a five-point scale).

Conclusions
The scores achieved by both AI models, ChatGPT-5 and Gemini 2.5, would have been sufficient to pass the Official Clinical Diabetology Specialty Examination. These results are encouraging for the use of AI in the study process, but the models still need further adaptation to manage clinical cases. AI can serve as a learning tool, but for now it cannot replace specialists.
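The model comparison above relies on McNemar's test, which evaluates paired binary outcomes (here, correct/incorrect per question for each model) using only the discordant pairs. A minimal sketch in Python follows; the discordant counts `b` and `c` are hypothetical placeholders for illustration, since the abstract does not report the per-question agreement table:

```python
import math

def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar's test with continuity correction.

    b: questions model A answered correctly and model B did not
    c: questions model B answered correctly and model A did not
    Returns (chi-square statistic, two-sided p-value; 1 degree of freedom).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical discordant counts, NOT the study's actual data
chi2, p = mcnemar(b=18, c=6)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only the off-diagonal cells of the 2×2 agreement table enter the statistic; questions both models answered the same way carry no information about which model is better.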
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations