This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Exploring the potential of Claude 2 for risk of bias assessment: Using a large language model to assess randomized controlled trials with RoB 2
Citations: 4
Authors: 8
Year: 2024
Abstract
Systematic reviews are essential for evidence-based healthcare, but conducting them is time- and resource-intensive. To date, efforts have been made to accelerate and (semi-)automate various steps of systematic reviews through the use of artificial intelligence, and the emergence of large language models (LLMs) promises further opportunities. One crucial but complex task within systematic review conduct is assessing the risk of bias of included studies. Therefore, the aim of this study was to test the LLM Claude 2 for risk of bias assessment of 100 randomized controlled trials using the revised Cochrane risk of bias tool (“RoB 2”; involving judgements for five specific domains and an overall judgement). We assessed the agreement of risk of bias judgements by Claude with human judgements published in Cochrane Reviews. The observed agreement between Claude and Cochrane authors ranged from 41% for the overall judgement to 71% for domain 4 (“outcome measurement”). Cohen’s κ was lowest for domain 5 (“selective reporting”; 0.10 (95% confidence interval (CI): −0.10 to 0.31)) and highest for domain 3 (“missing data”; 0.31 (95% CI: 0.10 to 0.52)), indicating slight to fair agreement. Fair agreement was found for the overall judgement (Cohen’s κ: 0.22 (95% CI: 0.06 to 0.38)). Sensitivity analyses using alternative prompting techniques or the more recent version Claude 3 did not result in substantial changes. Currently, Claude’s RoB 2 judgements cannot replace human risk of bias assessment. However, the potential of LLMs to support risk of bias assessment should be further explored.
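The two agreement statistics reported in the abstract, observed agreement and Cohen's κ, can be illustrated with a minimal sketch. It assumes unweighted κ over three RoB 2 judgement levels; the abstract does not say whether the authors used a weighted variant or how they derived the confidence intervals, so neither is reproduced here. The judgement lists are hypothetical toy data, not values from the study, and the function names are illustrative only. The verbal labels follow the commonly used Landis and Koch bands, which match the "slight" and "fair" wording in the abstract.

```python
from collections import Counter


def observed_agreement(rater_a, rater_b):
    """Fraction of items on which both raters give the same judgement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Verbal label for kappa using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy judgements (NOT study data): "low", "some" (some concerns), "high".
model_judgements = ["low", "low", "some", "high", "some",
                    "low", "high", "some", "low", "high"]
human_judgements = ["low", "some", "some", "high", "low",
                    "low", "some", "some", "low", "low"]

agreement = observed_agreement(model_judgements, human_judgements)
kappa = cohens_kappa(model_judgements, human_judgements)
print(f"observed agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f} ({landis_koch_label(kappa)})")
```

Because κ subtracts the agreement expected by chance, a fairly high raw agreement can still correspond to a low κ when one judgement category dominates both raters' marginals.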
Similar works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 84,856 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 82,787 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 76,851 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 62,738 citations
Measuring inconsistency in meta-analyses
2003 · 61,458 citations