This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Comparative performance of ChatGPT-4o, ChatGPT-5, and Gemini 2.5 Flash on Persian internal medicine subspecialty board exams
Citations: 1
Authors: 5
Year: 2025
Abstract
This study compared the performance of ChatGPT-4o, ChatGPT-5, and Gemini 2.5 Flash on the 2025 Iranian internal medicine subspecialty board examinations. A total of 650 text-only multiple-choice questions from six subspecialties were tested; image-based items were excluded. Each question was presented in Persian, and responses were scored against the official answer key. Accuracy was 68.9% for ChatGPT-4o, 74.5% for ChatGPT-5, and 79.9% for Gemini 2.5 Flash, with Gemini performing significantly better than both ChatGPT versions. ChatGPT-5 also improved significantly over ChatGPT-4o, indicating rapid progress between model generations. Subspecialty analysis showed stronger results in rheumatology and respiratory medicine than in nephrology, whereas question type and length had no significant effect on accuracy. An artificial neural network that combined the outputs of all three models reached 81.6% accuracy, slightly exceeding Gemini 2.5 Flash alone. These findings identify Gemini 2.5 Flash as the best-performing of the three models on this high-stakes internal medicine exam. The results support a growing role for advanced AI systems as assistants in medical education and clinical practice, although further research is needed to assess their use in multimodal and real-world clinical tasks.
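The abstract reports significant accuracy differences between models answering the same 650 questions, but it does not name the statistical test used. A common choice for comparing two classifiers on identical items is the exact McNemar test, which looks only at the questions where exactly one model is correct. The sketch below assumes that approach; the function names `accuracy` and `mcnemar_exact` and the ten-item toy answer strings are hypothetical illustrations, not the study's code or data.

```python
from math import comb


def accuracy(answers, key):
    """Fraction of items where the chosen option matches the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def mcnemar_exact(answers_a, answers_b, key):
    """Two-sided exact McNemar test on paired correctness (assumed method).

    b = items model A got right and model B got wrong; c = the reverse.
    Under the null hypothesis of equal accuracy, b ~ Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = c = 0
    for ans_a, ans_b, k in zip(answers_a, answers_b, key):
        a_ok, b_ok = ans_a == k, ans_b == k
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    n = b + c
    if n == 0:
        # No discordant items: no evidence of a difference.
        return b, c, 1.0
    # Probability of a split at least as lopsided as the observed one, one tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical toy data: options A-D for ten questions.
    key = "ABCDABCDAB"
    model_x = "ABCDABCDAC"
    model_y = "ABCDBBADAB"
    print(accuracy(model_x, key), accuracy(model_y, key))
    print(mcnemar_exact(model_x, model_y, key))
```

Because each question is answered by every model, a paired test like this uses the per-question agreement structure rather than treating the accuracy rates as independent proportions; with only three discordant items in the toy data, the resulting p-value is 1.0.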
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,100 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,466 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations