This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Evaluation of the Performance of 3 Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases (Preprint)
Citations: 0
Authors: 5
Year: 2024
Abstract
<sec> <title>BACKGROUND</title> Generative large language models (LLMs) are increasingly integrated into the medical field. However, their actual efficacy in clinical decision-making remains partially unexplored. </sec> <sec> <title>OBJECTIVE</title> This study evaluated the diagnostic and therapeutic capabilities of 3 LLMs (ChatGPT-4, Gemini, and Med-Go) in addressing real clinical cases. </sec> <sec> <title>METHODS</title> This study involved 134 clinical cases spanning 9 medical disciplines. The LLMs evaluated were ChatGPT-4, Gemini, and Med-Go. Each LLM was required to provide suggestions for diagnosis, diagnostic criteria, differential diagnosis, examination, and treatment for every case. Responses were scored by 2 experts using a predefined rubric. </sec> <sec> <title>RESULTS</title> In overall performance, Med-Go achieved the highest median score (37.5, IQR 31.9-41.5), while Gemini recorded the lowest (33.0, IQR 25.5-36.6), with a statistically significant difference among the 3 LLMs (p < 0.001). Analysis revealed that responses related to differential diagnosis were the weakest, while those pertaining to treatment recommendations were the strongest. Med-Go displayed notable performance advantages in gastroenterology, nephrology, and neurology. </sec> <sec> <title>CONCLUSIONS</title> The findings show that all 3 LLMs achieved over 60% of the maximum possible score, indicating their potential applicability in clinical practice. However, inaccuracies that could lead to adverse decisions underscore the need for caution in their application. Med-Go's superior performance highlights the benefits of incorporating specialized medical knowledge into LLM training. Further development and refinement of medical LLMs are expected to enhance their precision and safety in clinical use. </sec>
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations