This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluation of large Language model performance on Persian rheumatology board exams: accuracy and clinical reasoning of GPT-4o vs. GPT-5.1

2026 · 0 citations · Scientific Reports · Open Access

Abstract

Large language models are increasingly integrated into medical education, yet their performance in non-English clinical examinations, particularly Persian, remains limited. This study evaluated how GPT-4o and GPT-5.1 perform on Iranian Rheumatology Board examination questions. A total of 204 multiple-choice items were administered in Persian using a similar prompt. Accuracy was determined using the official answer key, and six board-certified rheumatologists independently scored each model’s clinical reasoning on a 1–5 scale. GPT-5.1 demonstrated markedly superior performance, achieving 76% accuracy compared with 64.5% for GPT-4o, alongside significantly higher reasoning scores. Unlike GPT-4o, which showed considerable variability across question types, GPT-5.1 performed consistently across basic science, clinical scenarios, diagnosis, and treatment domains. Although inter-rater agreement among rheumatologists was modest, it remained statistically significant. These findings suggest that newer-generation LLMs provide more reliable reasoning and accuracy in Persian medical assessments. Nevertheless, despite their promising role as educational aids, current models are not yet suitable for high-stakes clinical decision-making and require continued evaluation across diverse languages and specialties.

Topics

Clinical Reasoning and Diagnostic Skills · Artificial Intelligence in Healthcare and Education · Rheumatoid Arthritis Research and Therapies
