Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
GPT for RCTs?: Using AI to determine adherence to reporting guidelines
8
Zitationen
5
Autoren
2023
Jahr
Abstract
Abstract Background Adherence to established reporting guidelines can improve clinical trial reporting standards, but attempts to improve adherence have produced mixed results. This exploratory study aimed to determine how accurate a Large Language Model generative AI system (AI-LLM) was for determining reporting guideline compliance in a sample of sports medicine clinical trial reports. Design and Methods This study was an exploratory retrospective data analysis. The OpenAI GPT-4 and Meta LLama2 AI-LLMa were evaluated for their ability to determine reporting guideline adherence in a sample of 113 published sports medicine and exercise science clinical trial reports. For each paper, the GPT-4-Turbo and Llama 2 70B models were prompted to answer a series of nine reporting guideline questions about the text of the article. The GPT-4-Vision model was prompted to answer two additional reporting guideline questions about the participant flow diagram in a subset of articles. The dataset was randomly split (80/20) into a TRAIN and TEST dataset. Hyperparameter and fine-tuning were performed using the TRAIN dataset. The Llama2 model was fine-tuned using the data from the GPT-4-Turbo analysis of the TRAIN dataset. Primary outcome measure: Model performance (F1-score, classification accuracy) was assessed using the TEST dataset. Results Across all questions about the article text, the GPT-4-Turbo AI-LLM demonstrated acceptable performance (F1-score = 0.89, accuracy[95% CI] = 90%[85-94%]). Accuracy for all reporting guidelines was > 80%. The Llama2 model accuracy was initially poor (F1-score = 0.63, accuracy[95%CI] = 64%[57-71%]), and improved with fine-tuning (F1-score = 0.84, accuracy[95%CI] = 83%[77-88%]). The GPT-4-Vision model accurately identified all participant flow diagrams (accuracy[95% CI] = 100%[89-100%]) but was less accurate at identifying when details were missing from the flow diagram (accuracy[95% CI] = 57%[39-73%]). Conclusions Both the GPT-4 and fine-tuned Llama2 AI-LLMs showed promise as tools for assessing reporting guideline compliance. Next steps should include developing an efficent, open-source AI-LLM and exploring methods to improve model accuracy.
Ähnliche Arbeiten
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 89.315 Zit.
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 83.027 Zit.
The Measurement of Observer Agreement for Categorical Data
1977 · 77.767 Zit.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 63.381 Zit.
Measuring inconsistency in meta-analyses
2003 · 62.051 Zit.