This is an overview page with metadata for this scientific article. The full article is available from the publisher.
From Automation to Strategy: Managing AI Uncertainty in Meta-analysis with Context Engineering. (Preprint)
Citations: 0
Authors: 6
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> The application of Large Language Models (LLMs) to systematic reviews and meta-analyses promises to accelerate evidence synthesis. However, prior research has focused on automating discrete tasks, such as abstract screening, and by concentrating primarily on performance metrics it has failed to validate a reliable end-to-end workflow. The use of LLMs introduces new forms of AI-enabled uncertainty, challenging traditional validation metrics and creating a need for new management strategies. </sec> <sec> <title>OBJECTIVE</title> This study aims to propose and validate a novel strategic framework, "Context Engineering," designed to navigate the uncertainties of LLM-driven research and manage an LLM so that it performs a reliable end-to-end meta-analysis. </sec> <sec> <title>METHODS</title> We designed a five-layer (Instruction, Knowledge, Tool, History, Formatting) context engineering framework to enable ChatGPT-5 to perform an automated meta-analysis. The framework was designed to manage the LLM's workflow from literature search to statistical synthesis while ensuring methodological rigor. We benchmarked its performance by tasking the LLM with replicating a previously published meta-analysis. </sec> <sec> <title>RESULTS</title> The LLM pipeline included 19 final studies, demonstrating a low recall of 27.5% relative to the 40 studies in the reference meta-analysis. However, despite the divergent study cohorts, the pooled OR for non-advanced adenoma was nearly identical between the LLM pipeline and the original study (1.46 vs 1.45, respectively). This outcome, a high-fidelity result despite low screening recall, is a novel finding in contrast to prior literature focused solely on screening performance. Critically, for advanced adenoma, the LLM produced a more conservative estimate (OR 1.70 vs 2.06), a finding consistent with the visibly greater symmetry of its corresponding funnel plot.
This suggests a potentially lower risk of publication bias in the LLM-selected evidence base. Furthermore, the identification of 8 unique studies missed by the original review reinforces that, despite a lower recall, the pipeline's overall process led to a robust and accurate final synthesis. </sec> <sec> <title>CONCLUSIONS</title> The strategic implication of this study is that by managing LLMs with a structured framework such as context engineering, their inherent uncertainties can be navigated to produce reliable and potentially more robust final results. Our work is the first to demonstrate that, with a structured approach, an LLM can function as an independent research agent capable of producing trustworthy outcomes, moving beyond its role as a simple assistant tool. This methodology has the potential to accelerate the pace of evidence synthesis in medicine. </sec> <sec> <title>CLINICALTRIAL</title> Not Applicable (This study was a methodological evaluation designed to replicate a previously published meta-analysis (PROSPERO: CRD42022308533) using a large language model.) </sec>
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations