This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A comparative analysis of the performance of large language models in the basic life support exam: Comprehensive evaluation of ChatGPT-4o, Gemini 2.0, Claude 3.5, and DeepSeek R1
0
Citations
11
Authors
2025
Year
Abstract
Aim: Considering the growing role that artificial intelligence technologies play in medical education, this study aims to provide a comparative evaluation of the performance of four large language models, ChatGPT-4o, Gemini 2.0, Claude 3.5, and DeepSeek R1, on the Basic Life Support (BLS) exam. Materials and Methods: In this observational study, we presented the four large language models with 25 multiple-choice questions based on the American Heart Association (AHA) guidelines. Questions were divided into two categories: knowledge-based (n = 14, 56%) and case-based (n = 11, 44%). Response consistency was evaluated by presenting each question to all models on three separate days. The models' accuracy was assessed using overall accuracy, strict accuracy, and ideal accuracy criteria. Results: In the overall accuracy assessment, ChatGPT-4o and DeepSeek R1 achieved 100% accuracy, while Gemini 2.0 and Claude 3.5 achieved 96%. All models performed perfectly on the case-based questions. On the knowledge-based questions, ChatGPT-4o and DeepSeek R1 scored full points, while Gemini 2.0 and Claude 3.5 achieved 90.9%. Statistical analysis showed no significant difference between the models' results (p = 0.368). Discussion: Large language models show high accuracy rates in BLS training. These technologies can be used in supportive roles in medical education, but human supervision is critical in clinical decision-making.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations