This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Assessing the Performance of Large Language Models on the Foreign Medical Graduate Examination (FMGE): Insights from GPT-4 Turbo, Gemini Advanced, and LLaMA 3.1 (70B)

2025 · 0 citations

Abstract

This study evaluates how accurately three advanced Large Language Models (LLMs), namely GPT-4 Turbo, Gemini Advanced, and Meta’s LLaMA 3.1 (70B), answer multiple-choice questions from the Foreign Medical Graduate Examination (FMGE). Using a curated set of 427 text-based questions from recent exams, the investigation assessed each model’s overall accuracy, consistency, and error distribution, with statistical validation via McNemar’s test. GPT-4 Turbo achieved roughly 93% accuracy, while Gemini Advanced and LLaMA 3.1 (70B) each reached approximately 87%, with no significant difference between these two models. High agreement among the models indicates consistent decision-making and stable interpretative patterns. These findings underscore the potential of LLMs as complementary tools for exam preparation, particularly in resource-limited settings. Moreover, the results support the viability of open-source models, exemplified by Meta’s LLaMA 3.1 (70B), in terms of cost-effectiveness and adaptability. Future research will explore diverse question formats and the integration of these models into clinical decision support systems to further enhance their role in modern medical education.
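
The abstract names McNemar’s test as the statistical check for comparing models on the same question set. The following is a minimal sketch of how such a paired comparison could be computed, not the authors' actual analysis code. It uses the exact (binomial) form of the test, and the function names `discordant_counts` and `mcnemar_exact` as well as the toy answer vectors are assumptions made purely for illustration; none of the numbers come from the study.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count questions on which exactly one of two models was correct.

    correct_a, correct_b: equal-length sequences of booleans, one entry
    per question, True if that model answered the question correctly.
    Returns (b, c): b = A right and B wrong, c = A wrong and B right.
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair favours either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data only (hypothetical, NOT results from the paper).
model_a = [True, True, False, True, True, True]
model_b = [True, False, True, False, True, False]
b, c = discordant_counts(model_a, model_b)
p_value = mcnemar_exact(b, c)
print(f"discordant pairs: b={b}, c={c}, p={p_value:.3f}")
```

Only the questions on which the two models disagree enter the test, which is why two models with similar overall accuracy can show no significant difference even when their individual errors fall on different questions.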
