This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Analyzing Evaluation Methods for Large Language Models in the Medical Field: A Scoping Review
Citations: 0
Authors: 4
Year: 2024
Abstract
Background: Owing to the rapid growth in popularity of large language models (LLMs), various performance evaluation studies have been conducted to confirm their applicability in the medical field. However, there is still no clear framework for LLM evaluation.
Objective: By reviewing studies on LLM evaluations in the medical field and analyzing the research methods used in these studies, this study aims to provide a reference for future researchers designing LLM studies.
Methods & Materials: We conducted a scoping review of three databases (PubMed, Embase, and MEDLINE) to identify studies on LLMs published between January 1, 2023, and September 30, 2023. We analyzed the method type, number of questions (queries), evaluators, repeat measurements, additional analysis methods, engineered prompts, and metrics other than accuracy.
Results: A total of 142 articles met the inclusion criteria. The LLM evaluation was primarily categorized as either providing test examinations (n=53, 37.3%) or being evaluated by a medical professional (n=80, 56.3%), with some hybrid cases (n=5, 3.5%) or a combination of the two (n=4, 2.8%). Most studies had 100 or fewer questions (n=18, 29.0%), 15 (24.2%) performed repeated measurements, 18 (29.0%) performed additional analyses, and 8 (12.9%) used prompt engineering. For medical assessment, most studies had 50 or fewer queries (n=54, 64.3%), most studies had two evaluators (n=43, 48.3%), and 14 (14.7%) used prompt engineering.
Conclusions: More research is required regarding the application of LLMs in healthcare. Although previous studies have evaluated performance, future studies will likely focus on improving performance. For these studies to be conducted systematically, a well-structured methodology must be designed.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 citations