This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparison of GPT-5 and GPT-4o in Solving the Polish Centre for Medical Examinations (CEM) Gastroenterology Examination
Citations: 0
Authors: 15
Year: 2026
Abstract
Both GPT-4o and GPT-5 exceeded the passing threshold for the CEM gastroenterology examination, demonstrating strong performance on a specialty-level medical assessment. Although overall accuracy was comparable, GPT-5 showed superior alignment between confidence and correctness, suggesting improved metacognitive reliability rather than a substantial gain in raw accuracy. These findings highlight the potential educational value of newer LLMs while underscoring important limitations, including the restricted sample size, exam-specific context, and lack of assessment of real-world clinical reasoning. Ethical considerations such as hallucinations, overconfidence, and inappropriate clinical reliance remain critical barriers to direct clinical deployment. Future research should focus on broader exam representativeness, task difficulty stratification, and controlled integration of LLMs into postgraduate medical education.