Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis
0
Zitationen
10
Autoren
2024
Jahr
Abstract
Large language models (LLMs) have demonstrated potential applications in medicine, yet data privacy and computational burden limit their deployment in healthcare institutions. Open-source and lightweight versions of LLMs emerge as potential solutions, but their performance, particularly in pediatric settings remains underexplored. In this cross-sectional study, 250 patient consultation questions were randomly selected from a public online medical forum, with 10 questions from each of 25 pediatric departments, spanning from December 1, 2022, to October 30, 2023. Two lightweight open-source LLMs, ChatGLM3-6B and Vicuna-7B, along with a larger-scale model, Vicuna-13B, and the widely-used proprietary ChatGPT-3.5, independently answered these questions in Chinese between November 1, 2023, and November 7, 2023. To assess reproducibility, each inquiry was replicated once. We found that ChatGLM3-6B demonstrated higher accuracy and completeness than Vicuna-13B and Vicuna-7B (P < .001), but all were outperformed by ChatGPT-3.5. ChatGPT-3.5 received the highest ratings in accuracy (65.2%) compared to ChatGLM3-6B (41.2%), Vicuna-13B (11.2%), and Vicuna-7B (4.4%). Similarly, in completeness, ChatGPT-3.5 led (78.4%), followed by ChatGLM3-6B (76.0%), Vicuna-13B (34.8%), and Vicuna-7B (22.0%) in highest ratings. ChatGLM3-6B matched ChatGPT-3.5 in readability, both outperforming Vicuna models (P < .001). In terms of empathy, ChatGPT-3.5 outperformed the lightweight LLMs (P < .001). In safety, all models performed comparably well (P > .05), with over 98.4% of responses being rated as safe. Repetition of inquiries confirmed these findings. In conclusion, Lightweight LLMs demonstrate promising application in pediatric healthcare. However, the observed gap between lightweight and large-scale proprietary LLMs underscores the need for continued development efforts.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.479 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.364 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.814 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.543 Zit.