This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Performance of Large Language Models on Cognitive Aptitude Testing: A Multi-Run Evaluation on the German Medical School Admission Test (TMS)

2026 · 0 citations · European Journal of Investigation in Health Psychology and Education · Open Access

Abstract

BACKGROUND AND OBJECTIVES: Large language models (LLMs) have demonstrated high performance on knowledge-based medical examinations, but their capabilities on cognitive aptitude tests emphasizing reasoning and abstraction remain underexplored. The Test for Medical Studies (TMS), a German medical school admission test, provides a standardized framework to examine these capabilities. This study aimed to evaluate the performance and consistency of multiple LLMs on text-based and visual-analytic TMS items.

MATERIALS AND METHODS: Eight contemporary LLMs, comprising proprietary and open-source systems, were evaluated using a multi-run design on standardized TMS items spanning text-based and visual-analytic cognitive domains.

RESULTS: Mean accuracy remained substantially below levels typically reported for knowledge-based medical examinations, with marked performance differences between text-based and visual-analytic subtests. Open-source models performed competitively compared with proprietary systems. Inter-run reliability was heterogeneous, indicating notable variability across repeated evaluations.

CONCLUSIONS: Current LLMs show limited and domain-dependent performance on cognitive aptitude tasks relevant to medical school admission. High accuracy on knowledge-based examinations does not translate into stable performance on aptitude tests emphasizing fluid intelligence. The observed modality-dependent performance patterns and inter-run variability highlight the importance of differentiated, multi-run evaluation strategies when assessing LLMs for applications in medical education.

Topics

Medical Education and Admissions; Artificial Intelligence in Healthcare and Education; Clinical Reasoning and Diagnostic Skills
