This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Leveraging Open-Source Large Language Models for Data Augmentation to Improve Text Classification in Surveys of Medical Staff (Preprint)
0
Citations
5
Authors
2023
Year
Abstract
<sec> <title>BACKGROUND</title> Generative large language models (LLMs) have the potential to revolutionize medical education by generating tailored learning materials, enhancing teaching efficiency, and improving learner engagement. However, the application of LLMs in healthcare settings, particularly for augmenting small datasets in text classification tasks, remains underexplored, especially for cost- and privacy-conscious applications that do not permit the use of third-party services such as OpenAI’s ChatGPT. </sec> <sec> <title>OBJECTIVE</title> This paper explores the use of open-source LLMs, such as Large Language Model Meta AI (LLaMA) and Alpaca models, for data augmentation in a specific text classification task related to hospital staff surveys. </sec> <sec> <title>METHODS</title> The surveys were designed to elicit narratives of everyday adaptation by frontline radiology staff during the initial phase of the COVID-19 pandemic. The study evaluates how the choice of LLM, temperature setting, and downstream classifier affects classifier performance. </sec> <sec> <title>RESULTS</title> The overall best-performing combination of LLM, temperature, classifier, and number of augments is LLaMA 7B at temperature 0.7 using Robustly Optimized BERT Pretraining Approach (RoBERTa) with 100 augments, with an average area under the receiver operating characteristic curve (AUC) of 0.87 ± 0.02 (1 standard deviation). The results demonstrate that open-source LLMs can enhance text classifiers' performance for small datasets in healthcare contexts, providing promising pathways for improving medical education processes and patient care practices. </sec> <sec> <title>CONCLUSIONS</title> The study demonstrates the value of data augmentation with open-source LLMs, highlights the importance of privacy and ethical considerations when using LLMs, and suggests future directions for research in this field. </sec>