This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Evaluation of the Performance of 3 Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases (Preprint)
Citations: 0
Authors: 5
Year: 2024
Abstract
<sec> <title>BACKGROUND</title> Generative large language models (LLMs) are increasingly integrated into the medical field. However, their actual efficacy in clinical decision-making remains partially unexplored. </sec> <sec> <title>OBJECTIVE</title> This study evaluated the diagnostic and therapeutic capabilities of 3 LLMs (ChatGPT-4, Gemini, and Med-Go) in addressing real clinical cases. </sec> <sec> <title>METHODS</title> This study involved 134 clinical cases spanning 9 medical disciplines. The LLMs evaluated were ChatGPT-4, Gemini, and Med-Go. Each LLM was required to provide suggestions for diagnosis, diagnostic criteria, differential diagnosis, examination, and treatment for every case. Responses were scored by 2 experts using a predefined rubric. </sec> <sec> <title>RESULTS</title> In overall performance, Med-Go achieved the highest median score (37.5, IQR 31.9-41.5), while Gemini recorded the lowest (33.0, IQR 25.5-36.6), with a statistically significant difference among the 3 LLMs (p < 0.001). Analysis revealed that responses related to differential diagnosis were the weakest, while those pertaining to treatment recommendations were the strongest. Med-Go displayed notable performance advantages in gastroenterology, nephrology, and neurology. </sec> <sec> <title>CONCLUSIONS</title> The findings show that all 3 LLMs achieved over 60% of the maximum possible score, indicating their potential applicability in clinical practice. However, inaccuracies that could lead to adverse decisions underscore the need for caution in their application. Med-Go's superior performance highlights the benefits of incorporating specialized medical knowledge into LLM training. Further development and refinement of medical LLMs are expected to enhance their precision and safety in clinical use. </sec>
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations