This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Machine Learning Tools To (Semi-) Automate Evidence Synthesis
Citations: 0
Authors: 12
Year: 2025
Abstract
Introduction. Tools that leverage machine learning, a subset of artificial intelligence, are becoming increasingly important for conducting evidence synthesis as the volume and complexity of primary literature expand exponentially. In response, we have created a living rapid review and evidence map to understand existing research and identify available tools.
Methods. We searched PubMed, Embase, and the ACM Digital Library from January 1, 2021, to April 3, 2024, for comparative studies, and identified older studies using the reference lists of existing evidence synthesis products (ESPs). We plan to update searches every 6 months. We included evaluations of machine learning or artificial intelligence tools to automate or semi-automate any stage of systematic review production. Two reviewers conducted title and abstract screening independently, with disagreements resolved through discussion or adjudication by a third reviewer. A single reviewer performed full-text screening and data extraction. We did not assess the quality of individual studies or the strength of evidence across studies. Extracted data included key characteristics of the tools (e.g., type of automation method, systematic review tasks automated), evaluation methods, and performance results (e.g., recall, measures of workload, accuracy, and the authors’ conclusions). The protocol was prospectively registered on the AHRQ website (https://effectivehealthcare.ahrq.gov/products/tools/protocol).
Results. We included 56 studies, which evaluated the performance of tools primarily relative to standard human processes across various systematic review tasks. For search-related tools (7 studies), recall (the percent of relevant citations correctly identified) ranged from 0 to 97 percent (median 26%) compared to human-developed search strategies, while precision (the percent of identified citations that are relevant) ranged from 0 to 13.4 percent (median 4.3%). Tools designed to identify randomized controlled trials (RCTs) (6 studies) had recalls between 96 and 100 percent (median 98.5%), with precision ranging from 8 to 92 percent (median 44%), compared to either manual identification or PubMed’s “publication type” tags. Abstract screening tools (22 studies) had a median recall of 93 percent (range 1–100%) with human screening as the standard, while median burden reduction was 50 percent (range 1–93%), and median work saved over sampling to achieve 95 percent recall (WSS@95) was 54 percent (range 33–90%). Data extraction tools (9 studies) showed highly variable performance, with the percentage of data correctly extracted compared to manual extraction ranging from 0 to 99 percent (median 10%). Finally, tools used for risk of bias assessment (7 studies) showed modest agreement with human reviewers, with Cohen’s weighted kappa ranging from 0.11 to 0.48 (median 0.16).
Discussion. Certain tools, particularly those for automatically identifying RCTs and prioritizing relevant abstracts in screening, show a high level of recall and precision, suggesting they are nearing widespread use with human oversight. However, other tools, such as those for searching and data extraction, show highly variable performance and are not yet reliable enough for semi-automation. This work revealed the importance of developing standardized evaluation frameworks for assessing the performance of machine learning and artificial intelligence tools in systematic review tasks.
Limitations. We did not assess the risk of bias or methodological quality of the included studies, which may affect the reliability and comparability of the reported performance outcomes. Additionally, the tools were evaluated in a variety of settings, tasks, and review questions, which introduces heterogeneity that makes direct comparisons across tools challenging. Lastly, the rapidly evolving nature of machine learning technologies means that our findings may quickly become outdated. Therefore, we have planned ongoing updates every 6 months.
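For readers unfamiliar with the performance metrics cited above, the following minimal Python sketch illustrates how recall, precision, WSS@95, and Cohen's weighted kappa are conventionally computed. This is an illustration, not the review's own code: the confusion counts and risk-of-bias ratings are hypothetical toy values, and the weighted kappa uses scikit-learn's cohen_kappa_score.

from sklearn.metrics import cohen_kappa_score

def recall(tp, fn):
    # Percent of relevant citations correctly identified.
    return tp / (tp + fn)

def precision(tp, fp):
    # Percent of identified citations that are relevant.
    return tp / (tp + fp)

def wss_at_95(tn, fn, n_total):
    # Work saved over sampling at 95% recall: the fraction of screening
    # effort avoided compared with random screening that reaches the
    # same recall level (hence the 0.05 penalty for the 5% missed).
    return (tn + fn) / n_total - 0.05

# Hypothetical screening outcome: a tool flags 400 of 1,000 abstracts,
# capturing 95 of the 100 truly relevant ones.
tp, fp, fn, tn = 95, 305, 5, 595
print(round(recall(tp, fn), 2))           # 0.95
print(round(precision(tp, fp), 2))        # 0.24
print(round(wss_at_95(tn, fn, 1000), 2))  # 0.55

# Hypothetical risk-of-bias ratings (0 = low, 1 = some concerns,
# 2 = high) from a human reviewer and a tool; linear weights penalize
# larger disagreements more than adjacent-category disagreements.
human = [0, 1, 2, 1, 0, 2, 1, 0]
tool = [0, 1, 1, 1, 0, 2, 2, 1]
print(cohen_kappa_score(human, tool, weights="linear"))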
Similar Works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 85,301 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 82,806 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 76,958 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 62,803 citations
Measuring inconsistency in meta-analyses
2003 · 61,524 citations