This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the Performance of Large Language Models on the MCAT
Citations: 0
Authors: 9
Year: 2025
Abstract
The emergence of large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Gemini has opened new possibilities for their use in standardized test preparation. This study evaluates the performance of ChatGPT 4.0 and Gemini (both December 2024 versions) on an official AAMC full-length Medical College Admission Test (MCAT) practice exam. Using a standardized input protocol, we compared the models’ answers and calculated accuracy, MCAT section scores, and percentile rankings. ChatGPT outperformed Gemini, achieving a score of 522 (99th percentile) with an accuracy of 90.87%, compared to Gemini’s 518 (95th percentile) and 84.78% accuracy, with a statistically significant difference (p = 0.005). While these results highlight the educational potential of LLMs, they also raise important questions about the relevance of standardized testing in an era of increasingly accessible AI tools. Furthermore, although LLMs perform well on exams, their non-negligible error rates caution against their use in clinical decision-making without human oversight. This study contributes to the growing discussion on how AI may transform medical education, assessment, and the future role of physicians.
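This overview page does not state which statistical test produced p = 0.005, nor does it include the authors' analysis code. As a minimal illustration only, the sketch below assumes per-question correctness records for both models on the same 230-item AAMC exam and applies an exact McNemar test (a common choice for paired right/wrong outcomes) via scipy.stats.binomtest. The correctness vectors are hypothetical placeholders constructed only to match the reported accuracies, so the printed p-value is not expected to reproduce the paper's result.

# Minimal sketch, not the authors' code: accuracy and a paired
# McNemar test for two models answering the same MCAT items.
from scipy.stats import binomtest

N = 230  # an AAMC full-length MCAT has 230 questions (59 + 53 + 59 + 59)

# Hypothetical placeholder data, chosen only to match the reported
# accuracies; True = item answered correctly.
chatgpt = [True] * 209 + [False] * 21   # 209/230 = 90.87%
gemini = [True] * 195 + [False] * 35    # 195/230 = 84.78%

print(f"ChatGPT accuracy: {sum(chatgpt) / N:.2%}")
print(f"Gemini accuracy:  {sum(gemini) / N:.2%}")

# McNemar's test uses only the discordant pairs (items that exactly
# one model got right); under the null hypothesis they split 50/50.
only_chatgpt = sum(c and not g for c, g in zip(chatgpt, gemini))
only_gemini = sum(g and not c for c, g in zip(chatgpt, gemini))
result = binomtest(only_chatgpt, only_chatgpt + only_gemini, p=0.5)
print(f"Discordant split {only_chatgpt}:{only_gemini}, p = {result.pvalue:.4f}")

With the study's actual per-question responses the discordant split, and hence the p-value, would differ; the sketch only shows the mechanics of the paired comparison.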
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations