This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative analysis of LLMs performance in medical embryology: A cross‐platform study of ChatGPT, Claude, Gemini, and Copilot
19
Citations
3
Authors
2025
Year
Abstract
Integrating artificial intelligence, particularly large language models (LLMs), into medical education represents a significant new step in how medical knowledge is accessed, processed, and evaluated. The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots in different topics of medical embryology courses. Two hundred United States Medical Licensing Examination (USMLE)-style multiple-choice questions were selected from the course exam database and distributed across 20 topics. The results of 3 attempts by GPT-4o, Claude, Gemini, Copilot, and GPT-3.5 to answer the assessment items were evaluated. Statistical analyses included intraclass correlation coefficients for reliability, one-way and two-way mixed ANOVAs for performance comparisons, and post hoc analyses. Effect sizes were calculated using Cohen's f and eta-squared (η²). On average, the selected chatbots correctly answered 78.7% ± 15.1% of the questions. GPT-4o and Claude performed best, correctly answering 89.7% and 87.5% of the questions, respectively, without a statistical difference in their performance (p = 0.238). The performance of other chatbots was significantly lower (p < 0.01): Copilot (82.5%), Gemini (74.8%), and GPT-3.5 (59.0%). Test-retest reliability analysis showed good reliability for GPT-4o (ICC = 0.803), Claude (ICC = 0.865), and Gemini (ICC = 0.876), with moderate reliability for Copilot and GPT-3.5. This study suggests that AI models like GPT-4o and Claude show promise for providing tailored embryology instruction, though instructor verification remains essential.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations