This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
EyeGPT for Patient Inquiries and Medical Education: Development and Validation of an Ophthalmology Large Language Model (Preprint)
Citations: 0
Authors: 11
Year: 2024
Abstract
BACKGROUND: Large language models (LLMs) have the potential to enhance clinical workflows and improve medical education, but they face challenges related to specialized knowledge in ophthalmology.

OBJECTIVE: This study aims to enhance ophthalmic knowledge by refining a general LLM into an ophthalmology-specialized assistant for patient inquiries and medical education.

METHODS: We transformed Llama2 into an ophthalmology-specialized LLM, termed EyeGPT, through the following 3 strategies: prompt engineering for role-playing, fine-tuning on publicly available data sets filtered for eye-specific terminology (83,919 samples), and retrieval-augmented generation leveraging a medical database and 14 ophthalmology textbooks. Four board-certified ophthalmologists evaluated the EyeGPT variants on 120 questions spanning diverse categories in both simple and complex question-answering scenarios. The performance of the best EyeGPT model was then compared with that of an unassisted human physician group and an EyeGPT+human group. We proposed 4 assessment metrics: accuracy, understandability, trustworthiness, and empathy. The proportion of hallucinations was also reported.

RESULTS: The best fine-tuned model significantly outperformed the original Llama2 model at providing informed advice (mean 9.30, SD 4.42 vs mean 13.79, SD 5.70; P<.001) and mitigating hallucinations (97/120, 80.8% vs 53/120, 44.2%; P<.001). Incorporating information retrieval from reliable sources, particularly ophthalmology textbooks, further improved responses compared with the best fine-tuned model alone (mean 13.08, SD 5.43 vs mean 15.14, SD 4.64; P=.001) and reduced hallucinations (71/120, 59.2% vs 57/120, 47.4%; P=.02). Subgroup analysis revealed that EyeGPT was robust across common diseases, with consistent performance across different users and domains. Among the variants, the model integrating fine-tuning and textbook retrieval ranked highest, closely followed by the combination of fine-tuning and the manual database, standalone fine-tuning, and pure role-playing. EyeGPT demonstrated competitive understandability and empathy when compared with human ophthalmologists, and the performance of ophthalmologists was notably enhanced with EyeGPT's assistance.

CONCLUSIONS: We pioneered and introduced EyeGPT by refining a general-domain LLM and conducted a comprehensive comparison and evaluation of different strategies to develop an ophthalmology-specific assistant. Our results highlight EyeGPT's potential to assist ophthalmologists and patients in medical settings.
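To make the three strategies described in the methods more concrete, the following is a minimal, hypothetical sketch of how a role-playing prompt, a fine-tuned base model, and retrieval-augmented generation could fit together. All names here (PASSAGES, retrieve_passages, build_prompt) and the sample reference texts are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the three EyeGPT strategies from the abstract:
# (1) a role-playing system prompt, (2) a fine-tuned base model, and
# (3) retrieval-augmented generation over ophthalmology reference text.
from collections import Counter

# (3) Toy retrieval step: rank reference passages by keyword overlap with the
# question. The actual system retrieves from a medical database and 14 textbooks.
PASSAGES = [
    "Primary open-angle glaucoma is managed by lowering intraocular pressure.",
    "Cataract surgery replaces the clouded lens with an intraocular lens.",
]

def retrieve_passages(question: str, k: int = 1) -> list[str]:
    q_tokens = Counter(question.lower().split())
    scored = sorted(
        PASSAGES,
        key=lambda p: sum((Counter(p.lower().split()) & q_tokens).values()),
        reverse=True,
    )
    return scored[:k]

# (1) Role-playing prompt that wraps the retrieved context and the patient query.
def build_prompt(question: str) -> str:
    context = "\n".join(retrieve_passages(question))
    return (
        "You are an experienced ophthalmologist answering a patient's question "
        "clearly and empathetically.\n"
        f"Reference material:\n{context}\n"
        f"Patient question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # (2) In the study, a prompt like this would be sent to a Llama2 model
    # fine-tuned on ~84k eye-related samples; here we only assemble and print it.
    print(build_prompt("How is glaucoma treated?"))
```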
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations