OpenAlex · Updated hourly · Last updated: 2026-03-16, 16:04

This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparative performance of ChatGPT-4o, ChatGPT-5, and Gemini 2.5 Flash on Persian internal medicine subspecialty board exams

2025 · 1 citation · Scientific Reports · Open Access

Citations: 1 · Authors: 5 · Year: 2025

Abstract

This study compared the performance of ChatGPT-4o, ChatGPT-5, and Gemini 2.5 Flash on the 2025 Iranian internal medicine subspecialty board examinations. A total of 650 multiple-choice questions from six subspecialties were tested, excluding image-based items. Each question was presented in Persian, and responses were evaluated against the official answer key. Accuracy rates were 68.9% for ChatGPT-4o, 74.5% for ChatGPT-5, and 79.9% for Gemini 2.5 Flash, with Gemini performing significantly better than both ChatGPT versions. ChatGPT-5 also showed a significant improvement over ChatGPT-4o, reflecting rapid progress in model development. Subspecialty analysis revealed stronger results in rheumatology and respiratory medicine than in nephrology, while question type and length had no significant impact on outcomes. An artificial neural network that combined the outputs of all three models reached 81.6% accuracy, slightly exceeding Gemini alone. These findings highlight Gemini 2.5 Flash as the most reliable model for this high-stakes internal medicine exam. The results support the growing role of advanced AI systems as assistants in medical education and clinical practice. However, further research is needed to assess their use in multimodal and real-world clinical tasks.
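The abstract reports that the accuracy differences between models are significant but does not name the test used. A common choice for comparing two models graded on the same set of questions is McNemar's test on paired per-question correctness. The sketch below assumes that test (it may not be what the authors used); the function names `accuracy` and `mcnemar_exact` and the toy answer strings are hypothetical and are not the study's data.

```python
from math import comb


def accuracy(answers, key):
    """Fraction of questions where the chosen option matches the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def mcnemar_exact(answers_a, answers_b, key):
    """Two-sided exact McNemar test on paired per-question correctness.

    b = questions model A answered correctly and model B did not
    c = questions model B answered correctly and model A did not
    Under H0 (equal accuracy) the discordant questions split 50/50,
    so min(b, c) follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    if not (len(answers_a) == len(answers_b) == len(key)):
        raise ValueError("all sequences must have the same length")
    b = c = 0
    for ans_a, ans_b, k in zip(answers_a, answers_b, key):
        a_ok, b_ok = ans_a == k, ans_b == k
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant questions: no evidence of a difference
    # Probability of a split at least as lopsided as observed, in one tail
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical 10-question example (toy data, not from the study)
key = "ABCDABCDAB"
model_x = "ABCDABCDAA"  # wrong only on the last question
model_y = "ABCCABCAAB"  # wrong on questions 4 and 8
print(accuracy(model_x, key), accuracy(model_y, key))
print(mcnemar_exact(model_x, model_y, key))
```

Only the discordant questions (one model right, the other wrong) enter the test, which is why a paired test is preferred over comparing the two overall accuracy percentages as if they came from independent samples.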

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Explainable Artificial Intelligence (XAI)