This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of Large Language Models on Cognitive Aptitude Testing: A Multi-Run Evaluation on the German Medical School Admission Test (TMS)
Citations: 0
Authors: 4
Year: 2026
Abstract
BACKGROUND AND OBJECTIVES: Large language models (LLMs) have demonstrated high performance on knowledge-based medical examinations, but their capabilities on cognitive aptitude tests that emphasize reasoning and abstraction remain underexplored. The Test for Medical Studies (TMS), a German medical school admission test, provides a standardized framework for examining these capabilities. This study aimed to evaluate the performance and consistency of multiple LLMs on text-based and visual-analytic TMS items.

MATERIALS AND METHODS: Eight contemporary LLMs, comprising proprietary and open-source systems, were evaluated using a multi-run design on standardized TMS items spanning text-based and visual-analytic cognitive domains.

RESULTS: Mean accuracy remained substantially below the levels typically reported for knowledge-based medical examinations, with marked performance differences between text-based and visual-analytic subtests. Open-source models performed competitively with proprietary systems. Inter-run reliability was heterogeneous, indicating notable variability across repeated evaluations.

CONCLUSIONS: Current LLMs show limited and domain-dependent performance on cognitive aptitude tasks relevant to medical school admission. High accuracy on knowledge-based examinations does not translate into stable performance on aptitude tests emphasizing fluid intelligence. The observed modality-dependent performance patterns and inter-run variability highlight the importance of differentiated, multi-run evaluation strategies when assessing LLMs for applications in medical education.
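The abstract does not say how mean accuracy or inter-run reliability were computed. As an illustration only, the sketch below shows one simple way a multi-run evaluation could be summarized: per-run accuracy averaged across runs, plus mean pairwise agreement between runs as a rough consistency measure. The function names, the choice of pairwise agreement as the statistic, and the toy answer data are all assumptions made for this example and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def run_accuracy(answers, key):
    """Fraction of items in one run that match the answer key."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def mean_accuracy(runs, key):
    """Average of the per-run accuracies over all repeated runs."""
    return mean(run_accuracy(r, key) for r in runs)


def pairwise_agreement(runs):
    """Mean share of items answered identically by each pair of runs.

    This is a hypothetical stand-in for a reliability statistic; the
    study may have used a different measure.
    """
    return mean(
        sum(a == b for a, b in zip(r1, r2)) / len(r1)
        for r1, r2 in combinations(runs, 2)
    )


# Toy data (invented for illustration, not study results):
# four multiple-choice items, three repeated runs of one model.
key = ["A", "C", "B", "D"]
runs = [
    ["A", "C", "B", "A"],
    ["A", "C", "D", "A"],
    ["A", "B", "B", "D"],
]

print(f"mean accuracy: {mean_accuracy(runs, key):.3f}")
print(f"mean pairwise agreement: {pairwise_agreement(runs):.3f}")
```

In this toy case two runs reach the same accuracy while agreeing on only half of the items, which is the kind of instability a single-run evaluation would not reveal.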