This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Screening oncology articles in a qualitative literature review using large language models: A comparison of GPT4 versus fine-tuned open source models using expert-annotated data.
Citations: 1
Authors: 6
Year: 2024
Abstract
e23196 Background: Clinical Outcome Assessment (COA) conceptual gap analyses for oncology are complex and time consuming. Artificial intelligence may efficiently reduce the time to complete such analyses. We aimed to assess the performance of two AI models for literature screening to identify relevant qualitative oncology research, and we compared accuracy and run-time for both models.
Methods: We manually curated a dataset of title/abstract screening decisions (n = 1,700 study references) across 17 landscape reviews. Of these, 11 landscape reviews (n = 951 study references) were in oncology, covering 8 solid cancers (breast, lung, urothelial, colorectal, esophageal, head and neck, pancreatic, and stomach) and 3 non-solid cancers (lymphoma, acute myeloid leukemia, and multiple myeloma). Each citation was annotated for eligibility (Y/N) by population, study design (qualitative), and reporting of concepts (how patients feel or function). We then compared the accuracy of two AI models at predicting the screening decisions of expert researchers: Generative Pre-trained Transformer 4 (GPT4, OpenAI) prompts and a fine-tuned SciFive biomedical large language model (LLM). We used 70% of the data for training and 30% for testing. Accuracy estimates were obtained only for the models' ability to label eligibility within the 11 oncology datasets.
Results: Both LLMs performed well at assessing relevance by oncology population, with F1-scores of 0.92 for GPT4 and 0.83 for SciFive (precision 0.92 and 0.93, respectively). For concept reporting, the fine-tuned SciFive model outperformed GPT4, with an F1-score and precision of 0.88 and 0.92 versus 0.81 and 0.79. The same held, though less pronounced, for eligibility by study design, with an F1-score and precision of 0.81 and 0.90 versus 0.86 and 0.76. For overall eligibility, the customized SciFive model outperformed the GPT4 model with an F1-score and precision of 0.84 and 0.92 versus 0.85 and 0.82. Lastly, the GPT4 prompts took 10 to 30 minutes to screen 100 abstracts; by contrast, the customized SciFive model took 1 to 2 minutes on a computer with a Quadro RTX 8000 GPU.
Conclusions: Both AI models are promising. The fine-tuned SciFive model appears slightly more accurate and is substantially faster than the GPT4 model.
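The abstract gives no implementation details, but the described setup (a T5-based SciFive checkpoint fine-tuned on expert Y/N eligibility labels with a 70/30 train/test split) can be sketched with the Hugging Face transformers and datasets libraries. The following is a minimal illustrative sketch, not the authors' code: the checkpoint name razent/SciFive-base-Pubmed, the prompt format, the toy records, and all hyperparameters are assumptions.

```python
# Minimal sketch (not the authors' code) of fine-tuning a SciFive checkpoint
# for Y/N title/abstract eligibility screening. Checkpoint name, prompt
# format, and hyperparameters are illustrative assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL = "razent/SciFive-base-Pubmed"  # assumed public SciFive checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# Toy records standing in for the expert-annotated citations (Y/N labels).
records = [
    {"text": "screen: <title> Qualitative study of fatigue in breast cancer "
             "<abstract> ...", "label": "Y"},
    {"text": "screen: <title> Phase III trial of chemotherapy dosing "
             "<abstract> ...", "label": "N"},
]

def preprocess(batch):
    # Encode the screening prompt; the Y/N label becomes the target sequence.
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(batch["label"], truncation=True,
                              max_length=2).input_ids
    return enc

# 70/30 train/test split, mirroring the abstract.
splits = Dataset.from_list(records).train_test_split(test_size=0.3)
splits = splits.map(preprocess, batched=True,
                    remove_columns=["text", "label"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="scifive-screening",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Predicted Y/N strings on the held-out 30% could then be scored against the expert annotations with standard metrics (e.g. sklearn.metrics.precision_score and f1_score), which correspond to the precision and F1 figures reported in the Results.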
Similar works
New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)
2008 · 28,895 citations
TNM Classification of Malignant Tumours
1987 · 16,123 citations
A survey on deep learning in medical image analysis
2017 · 13,571 citations
Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening
2011 · 10,766 citations
The American Joint Committee on Cancer: the 7th Edition of the AJCC Cancer Staging Manual and the Future of TNM
2010 · 9,107 citations