OpenAlex · Updated hourly · Last updated: 19.03.2026, 08:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A comparative analysis of the performance of large language models in the basic life support exam: Comprehensive evaluation of ChatGPT-4o, Gemini 2.0, Claude 3.5, and DeepSeek R1

2025 · 0 citations · The Annals of Clinical and Analytical Medicine · Open Access

Citations: 0 · Authors: 11 · Year: 2025

Abstract

Aim: Given the growing role of artificial intelligence in medical education, this study compares the performance of four large language models (ChatGPT-4o, Gemini 2.0, Claude 3.5, and DeepSeek R1) on the Basic Life Support (BLS) exam.

Materials and Methods: In this observational study, we presented the four models with 25 multiple-choice questions based on the American Heart Association (AHA) guidelines. The questions were divided into two categories: knowledge-based (n = 14, 56%) and case-based (n = 11, 44%). To assess response consistency, each question was presented to every model on three separate days. Accuracy was evaluated using three criteria: overall accuracy, strict accuracy, and ideal accuracy.

Results: In the overall accuracy assessment, ChatGPT-4o and DeepSeek R1 achieved 100%, and Gemini 2.0 and Claude 3.5 achieved 96%. All models answered every case-based question correctly. On the knowledge-based questions, ChatGPT-4o and DeepSeek R1 scored full marks, while Gemini 2.0 and Claude 3.5 achieved 90.9%. Statistical analysis showed no significant difference between the models (p = 0.368).

Discussion: Large language models achieve high accuracy on BLS exam questions. These technologies can play a supportive role in medical education, but human supervision remains critical in clinical decision-making.
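The overall accuracy figures in the abstract reduce to the proportion of correctly answered questions out of the 25 presented. The Python sketch below recomputes them. The per-model counts of correct answers are back-calculated from the reported percentages (100% of 25 = 25, 96% of 25 = 24) and are an assumption, not the study's raw data; the helper `accuracy_percent` is hypothetical. The sketch does not attempt the strict or ideal accuracy criteria, which the abstract does not define.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correctly answered questions as a percentage (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


TOTAL_QUESTIONS = 25  # 14 knowledge-based + 11 case-based, per the abstract

# Assumed counts, back-calculated from the reported overall accuracy
# (100% -> 25/25, 96% -> 24/25); not taken from the paper's raw data.
implied_correct = {
    "ChatGPT-4o": 25,
    "DeepSeek R1": 25,
    "Gemini 2.0": 24,
    "Claude 3.5": 24,
}

if __name__ == "__main__":
    for model, correct in implied_correct.items():
        pct = accuracy_percent(correct, TOTAL_QUESTIONS)
        print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {pct:.1f}%")
```

With 25 questions, each missed answer lowers overall accuracy by 4 percentage points, so the 96% reported for Gemini 2.0 and Claude 3.5 corresponds to a single incorrect answer.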
