This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of Seven Large Language Models on Anatomy Examination Questions
Citations: 0
Authors: 8
Year: 2026
Abstract
Artificial intelligence is among the most rapidly developing branches of technology. It has proven to be a helpful tool in various fields, including medicine. Significant advances in the development of new language models prompt an evaluation of their effectiveness across various areas of medicine, including anatomy. This study aimed to assess the effectiveness of artificial intelligence in solving theoretical anatomy exams designed for medical students. The study utilized 555 multiple-choice questions (150 in Polish and 405 in English) sourced from past anatomy exams for the medical program. The models tested included: ChatGPT-4o mini, ChatGPT-4o, DeepSeek, Copilot, Gemini, and two Polish models: Bielik and PLLum. Each question was asked only once. For analysis purposes, the questions were categorized by type and by the anatomical structure they addressed. Out of 555 questions, ChatGPT-4o mini answered 394 correctly (71%), ChatGPT-4o - 461 (83.1%), DeepSeek - 427 (76.9%), Copilot - 442 (79.6%), Gemini - 439 (78.8%), Bielik - 166 (29.9%), and PLLum - 222 (40.0%). The language models performed poorest on multiple-answer questions (37.6%) and best on questions concerning the function of a given organ (75%). Most of the tested language models are capable of independently passing the exam, which should serve as a warning to teaching staff supervising students during exams and assessments. Properly formulated questions can currently hinder students relying on artificial intelligence from passing, but ongoing AI advancements may result in even higher pass rates in the future.
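The reported per-model accuracies follow directly from the correct-answer counts over the 555 questions. A minimal sketch of that arithmetic, with the counts copied from the abstract (percentages here are recomputed from the counts, so rounding may differ slightly from the reported figures):

```python
# Correct-answer counts per model out of 555 multiple-choice questions,
# as reported in the abstract of this study.
TOTAL_QUESTIONS = 555

correct_answers = {
    "ChatGPT-4o mini": 394,
    "ChatGPT-4o": 461,
    "DeepSeek": 427,
    "Copilot": 442,
    "Gemini": 439,
    "Bielik": 166,
    "PLLum": 222,
}

def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)

for model, correct in correct_answers.items():
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {accuracy_pct(correct)}%")
```

For example, ChatGPT-4o mini's 394 correct answers yield 394/555 ≈ 71.0%, matching the reported value.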
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations