This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of Seven Large Language Models on Anatomy Examination Questions
Citations: 0
Authors: 8
Year: 2026
Abstract
Artificial intelligence is among the most rapidly developing branches of technology. It has proven to be a helpful tool in various fields, including medicine. Significant advances in the development of new language models prompt an evaluation of their effectiveness across various areas of medicine, including anatomy. This study aimed to assess the effectiveness of artificial intelligence in solving theoretical anatomy exams designed for medical students. The study utilized 555 multiple-choice questions (150 in Polish and 405 in English) sourced from past anatomy exams for the medical program. The models tested included: ChatGPT-4o mini, ChatGPT-4o, DeepSeek, Copilot, Gemini, and two Polish models: Bielik and PLLum. Each question was asked only once. For analysis purposes, the questions were categorized by type and by the anatomical structure they addressed. Out of 555 questions, ChatGPT-4o mini answered 394 correctly (71%), ChatGPT-4o - 461 (83.1%), DeepSeek - 427 (76.9%), Copilot - 442 (79.6%), Gemini - 439 (78.8%), Bielik - 166 (29.9%), and PLLum - 222 (40.0%). The language models performed poorest on multiple-answer questions (37.6%) and best on questions concerning the function of a given organ (75%). Most of the tested language models are capable of independently passing the exam, which should serve as a warning to teaching staff supervising students during exams and assessments. Properly formulated questions can currently hinder students relying on artificial intelligence from passing, but ongoing AI advancements may result in even higher pass rates in the future.
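The reported per-model accuracies follow directly from the correct-answer counts over the 555 questions. A minimal sketch of that arithmetic, with the counts copied from the abstract (percentages here are recomputed from the counts, so rounding may differ slightly from the reported figures):

```python
# Correct-answer counts per model out of 555 multiple-choice questions,
# as reported in the abstract of this study.
TOTAL_QUESTIONS = 555

correct_answers = {
    "ChatGPT-4o mini": 394,
    "ChatGPT-4o": 461,
    "DeepSeek": 427,
    "Copilot": 442,
    "Gemini": 439,
    "Bielik": 166,
    "PLLum": 222,
}

def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)

for model, correct in correct_answers.items():
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {accuracy_pct(correct)}%")
```

For example, ChatGPT-4o mini's 394 correct answers yield 394/555 ≈ 71.0%, matching the reported value.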
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations