OpenAlex · Updated hourly · Last updated: 16.03.2026, 22:29

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the Performance of Large Language Models on the MCAT

2025 · 0 citations
Open full text at the publisher

0 citations · 9 authors · Year: 2025

Abstract

The emergence of large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Gemini has opened new possibilities for their use in standardized test preparation. This study evaluates the performance of ChatGPT 4.0 and Gemini (both December 2024 versions) on an official AAMC full-length Medical College Admission Test (MCAT) practice exam. Using a standardized input protocol, we compared the models’ answers and calculated accuracy, MCAT section scores, and percentile rankings. ChatGPT outperformed Gemini, achieving a score of 522 (99th percentile) with an accuracy of 90.87%, compared to Gemini’s 518 (95th percentile) and 84.78% accuracy, with a statistically significant difference (p = 0.005). While these results highlight the educational potential of LLMs, they also raise important questions about the relevance of standardized testing in an era of increasingly accessible AI tools. Furthermore, although LLMs perform well on exams, their non-negligible error rates caution against their use in clinical decision-making without human oversight. This study contributes to the growing discussion on how AI may transform medical education, assessment, and the future role of physicians.
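The reported comparison can be sketched numerically. A full-length MCAT has 230 scored questions, so the stated accuracies correspond to roughly 209 vs. 195 correct answers. The abstract does not state which significance test produced p = 0.005, so the exact McNemar test below (a standard choice for paired right/wrong answers from two models on the same questions) and the discordant-pair counts are illustrative assumptions, not the paper's method.

```python
from math import comb

def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total

def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on paired outcomes.

    b = questions only model A answered correctly,
    c = questions only model B answered correctly.
    Under H0 each discordant question is a fair coin,
    so the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Accuracies implied by the abstract on a 230-question exam
print(round(accuracy(209, 230) * 100, 2))  # 90.87
print(round(accuracy(195, 230) * 100, 2))  # 84.78

# Hypothetical discordant counts (not reported in the abstract)
print(mcnemar_exact_p(16, 2))
```

With only the marginal accuracies given, the discordant counts b and c cannot be recovered from the abstract; the test is shown purely to illustrate how such a paired comparison is typically computed.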

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI