This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating the Performance of Large Language Models in Anatomy Education: Advancing Anatomy Learning with ChatGPT-4o
Citations: 3
Authors: 3
Year: 2025
Abstract
Objective: Large language models (LLMs), such as ChatGPT, Gemini, and Copilot, have garnered significant attention across various domains, including education. Their application is becoming increasingly prevalent, particularly in medical education, where rapid access to accurate and up-to-date information is imperative. This study aimed to assess the validity, accuracy, and comprehensiveness of using LLMs to prepare lecture notes for medical school anatomy education.

Methods: The study evaluated the performance of four large language models (ChatGPT-4o, ChatGPT-4o-Mini, Gemini, and Copilot) in generating anatomy lecture notes for medical students. In the first phase, the lecture notes produced by these models from identical prompts were compared to a widely used anatomy textbook through thematic analysis to assess relevance and alignment with standard educational materials. In the second phase, the generated lecture notes were evaluated using content validity index (CVI) analysis. The threshold values for S-CVI/Ave and S-CVI/UA were set at 0.90 and 0.80, respectively, to determine the acceptability of the content.

Results: ChatGPT-4o demonstrated the highest performance, achieving a theme success rate of 94.6% and a subtheme success rate of 76.2%. ChatGPT-4o-Mini followed, with theme and subtheme success rates of 89.2% and 62.3%, respectively. Copilot achieved moderate results, with a theme success rate of 91.8% and a subtheme success rate of 54.9%, while Gemini showed the lowest performance, with a theme success rate of 86.4% and a subtheme success rate of 52.3%. In the CVI analysis, ChatGPT-4o again outperformed the other models, exceeding the thresholds with an S-CVI/Ave value of 0.943 and an S-CVI/UA value of 0.857. ChatGPT-4o-Mini fell short of both thresholds, with an S-CVI/UA value of 0.714 and an S-CVI/Ave value of 0.800. Copilot and Gemini exhibited significantly lower CVI results.
Copilot achieved an S-CVI/Ave value of 0.486 and an S-CVI/UA value of 0.286, while Gemini obtained the lowest scores, with an S-CVI/Ave value of 0.286 and an S-CVI/UA value of 0.143.

Conclusion: This study assessed various LLMs through two distinct analysis methods, revealing that ChatGPT-4o performed best in both the thematic analysis and the CVI evaluation. These results suggest that anatomy educators and medical students could benefit from adopting ChatGPT-4o as a supplementary tool for generating anatomy lecture notes. Conversely, models such as ChatGPT-4o-Mini, Gemini, and Copilot require further improvement to meet the standards necessary for reliable use in medical education.
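The abstract does not show how the scale-level indices were computed, but the standard definitions behind S-CVI/Ave and S-CVI/UA can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `scale_cvi` and the input layout (one row of binary expert relevance judgments per item) are assumptions.

```python
from typing import List, Dict

def scale_cvi(ratings: List[List[int]]) -> Dict[str, float]:
    """Compute S-CVI/Ave and S-CVI/UA from expert relevance ratings.

    ratings[i][j] is 1 if expert j judged item i relevant
    (e.g. rated 3 or 4 on a 4-point relevance scale), else 0.
    """
    # Item-level CVI (I-CVI): proportion of experts rating the item relevant.
    i_cvis = [sum(item) / len(item) for item in ratings]
    # S-CVI/Ave: mean of the item-level CVIs across all items.
    s_cvi_ave = sum(i_cvis) / len(i_cvis)
    # S-CVI/UA: proportion of items with universal agreement (I-CVI == 1.0).
    s_cvi_ua = sum(1 for c in i_cvis if c == 1.0) / len(i_cvis)
    return {"s_cvi_ave": s_cvi_ave, "s_cvi_ua": s_cvi_ua}
```

Under these definitions, a model's notes pass the study's criteria when `s_cvi_ave >= 0.90` and `s_cvi_ua >= 0.80`.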
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,508 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,393 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,864 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,564 citations