This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Abstract WP88: Can Machine Learning Assist With Large Scale Medical Literature Review?
2024 · 2 authors · 0 citations
Abstract
Introduction: The pace of emerging scientific literature far outstrips that of guideline updates, yet clinical practice may require early implementation after careful review. Recently developed large language model (LLM) based machine learning systems are potential assistive tools to identify novel clinical practices or research avenues as they emerge. Here we use GPT3.5, an LLM developed by OpenAI, to evaluate 3 interventions for aneurysmal subarachnoid hemorrhage (aSAH), with comparison to conclusions from a systematic review, the 2023 AHA Guideline on Management of aSAH.

Methods: A list of articles and their associated abstracts was identified for each of the terms "nimodipine" (N), "magnesium" (M), and "induced hypertension" (IH) via PubMed search, with results topic-filtered using the MeSH term "Subarachnoid Hemorrhage." Review articles and meta-analyses were excluded. Using custom code via the GPT3.5 programming interface, the LLM was asked to read each abstract, decide whether the intervention was discussed, and if so, rate it as effective, neutral, or ineffective, corresponding to scores of 1, 0, or -1.

Results: Mean efficacy scores for N, M, and IH were 0.67, 0.32, and 0.15 after automated review of 136, 47, and 33 abstracts, respectively. The corresponding guidelines report strong benefit/class 1 (N), no benefit/class 3 (M), weak benefit/class 2b for therapeutic use of IH, and harm/class 3 for prophylactic use. While N had consistently positive automated scores, manual review of these abstracts showed that it was often used as a standard-of-care comparator (possible confirmation bias). Though M had consistently poor clinical trial scores, the mostly positive case and retrospective reports raised its score (possible publication bias). IH combined all these aspects, with case reports of harm and mixed trial scores, yet also widespread use as standard of care.
Conclusion: Though GPT3.5 results roughly parallel human-curated ones, this method is not meant to replace the rigorous process of systematic review, but potentially to assist as a "first pass" evaluator of new literature in a real-time, continuous manner to improve efficiency. Optimization of LLM prompts may help with specificity, and weighting articles by type or perceived quality may improve accuracy.
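The scoring step described in the Methods can be sketched as follows. The paper's custom code is not published, so the prompt wording, label handling, and function names below are assumptions; only the effective/neutral/ineffective → 1/0/-1 mapping and the per-intervention mean come from the abstract. The actual GPT3.5 API call is omitted, since only the model's one-word reply matters for scoring.

```python
from statistics import mean
from typing import List, Optional

# Hypothetical prompt template; the paper's actual prompt is not shown.
PROMPT = (
    "Read the abstract below. If the intervention '{intervention}' is not "
    "discussed, answer 'not discussed'. Otherwise rate it as 'effective', "
    "'neutral', or 'ineffective'.\n\nAbstract:\n{abstract}"
)

# Mapping stated in the abstract: effective/neutral/ineffective -> 1/0/-1.
LABEL_SCORES = {"effective": 1, "neutral": 0, "ineffective": -1}

def score_label(reply: str) -> Optional[int]:
    """Convert the LLM's one-word rating into a numeric efficacy score.

    Returns None when the abstract did not discuss the intervention;
    such abstracts are excluded from the mean."""
    reply = reply.strip().lower().rstrip(".")
    return LABEL_SCORES.get(reply)  # None for "not discussed" or anything else

def mean_efficacy(replies: List[str]) -> Optional[float]:
    """Mean efficacy score over abstracts that discussed the intervention."""
    scores = [s for r in replies if (s := score_label(r)) is not None]
    return mean(scores) if scores else None
```

Under this reading, a set of replies such as `["effective", "neutral", "ineffective", "not discussed"]` yields a mean of 0 over the three scored abstracts, matching how a score near 0 would indicate mixed or neutral evidence in the paper's Results.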