This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating the effectiveness of large language models in medicine education: a comparison of current medicine knowledge
Citations: 2
Authors: 2
Year: 2025
Abstract
Recent advancements in artificial intelligence have led to the development of powerful large language models (LLMs) like ChatGPT-4-turbo, Gemini 2.0 Flash, DeepSeek-R1, and Qwen2.5-Max. This study evaluates their medical knowledge proficiency using multiple-choice questions (MCQs) sourced from a reputable medical textbook, with answers verified by experts. Each model was tested on its ability to select correct answers, and performance was analysed using ANOVA and Tukey's HSD tests. Results showed that while all models exhibited some proficiency, ChatGPT-4-turbo significantly outperformed Gemini 2.0 Flash and Qwen2.5-Max, with no notable difference between ChatGPT-4-turbo and DeepSeek-R1. Despite their capabilities, these models remain unreliable for medical education and assistance. Enhancing their accuracy and reliability is crucial for their effective application in healthcare, enabling medical students and professionals to utilise AI for learning and clinical decision-making. Further development is needed to improve their utility in medical practice.
Similar works
The qualitative content analysis process
2008 · 21,591 citations
Making sense of Cronbach's alpha
2011 · 13,678 citations
Standards for Reporting Qualitative Research
2014 · 10,950 citations
Health professionals for a new century: transforming education to strengthen health systems in an interdependent world
2010 · 5,686 citations
Audit and feedback: effects on professional practice and healthcare outcomes
2012 · 5,489 citations