This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparative analysis of LLMs performance in medical embryology: A cross-platform study of ChatGPT, Claude, Gemini, and Copilot

2025 · 24 citations · Anatomical Sciences Education



Abstract

Integrating artificial intelligence, particularly large language models (LLMs), into medical education represents a significant new step in how medical knowledge is accessed, processed, and evaluated. The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots in different topics of medical embryology courses. Two hundred United States Medical Licensing Examination (USMLE)-style multiple-choice questions were selected from the course exam database and distributed across 20 topics. The results of 3 attempts by GPT-4o, Claude, Gemini, Copilot, and GPT-3.5 to answer the assessment items were evaluated. Statistical analyses included intraclass correlation coefficients for reliability, one-way and two-way mixed ANOVAs for performance comparisons, and post hoc analyses. Effect sizes were calculated using Cohen's f and eta-squared (η²). On average, the selected chatbots correctly answered 78.7% ± 15.1% of the questions. GPT-4o and Claude performed best, correctly answering 89.7% and 87.5% of the questions, respectively, without a statistical difference in their performance (p = 0.238). The performance of other chatbots was significantly lower (p < 0.01): Copilot (82.5%), Gemini (74.8%), and GPT-3.5 (59.0%). Test-retest reliability analysis showed good reliability for GPT-4o (ICC = 0.803), Claude (ICC = 0.865), and Gemini (ICC = 0.876), with moderate reliability for Copilot and GPT-3.5. This study suggests that AI models like GPT-4o and Claude show promise for providing tailored embryology instruction, though instructor verification remains essential.
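Two of the summary figures in the abstract can be cross-checked with a few lines of Python. The sketch below is illustrative and is not the authors' analysis code. It averages the five overall accuracies reported above, which reproduces the 78.7% mean, and converts eta-squared to Cohen's f using the standard relation f = √(η² / (1 − η²)). The helper names and the η² = 0.20 input are hypothetical: the abstract reports no η² values. The ± 15.1% spread is not recomputed here because the abstract does not state over which units it was calculated.

```python
import math

# Overall accuracy (%) per chatbot, as reported in the abstract.
ACCURACY = {
    "GPT-4o": 89.7,
    "Claude": 87.5,
    "Copilot": 82.5,
    "Gemini": 74.8,
    "GPT-3.5": 59.0,
}


def mean(values) -> float:
    """Arithmetic mean of a sized iterable of numbers (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)


def cohens_f_from_eta_squared(eta_sq: float) -> float:
    """Convert eta-squared to Cohen's f via f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta-squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


if __name__ == "__main__":
    print(f"Mean accuracy: {mean(ACCURACY.values()):.1f}%")
    # eta^2 = 0.20 is an illustrative input, not a result from the study.
    print(f"Cohen's f for eta^2 = 0.20: {cohens_f_from_eta_squared(0.20):.2f}")
```

Running the script prints a mean accuracy of 78.7% and a Cohen's f of 0.50 for the illustrative η² of 0.20.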


Institutions

Alfaisal University (SA)

Topics

Artificial Intelligence in Healthcare and Education · Simulation-Based Education in Healthcare · Anatomy and Medical Technology
