OpenAlex · Updated hourly · Last updated: 19.05.2026, 02:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Prompt Engineering For AI-Assisted Systematic Review In Plastic Surgery

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Citations: 0 · Authors: 12 · Year: 2026

Abstract

Purpose: Artificial intelligence (AI) models may improve the efficiency of data-driven tasks, including stages of a systematic review such as literature search and full-text analysis. Large language models (LLMs) can quickly derive specific information from broad sources and, by design, may simplify and expedite a traditional meta-analysis. There are few reports of AI use for meta-analysis tasks, and none that investigate AI's ability to accurately reproduce meta-analysis-based research tasks from start to finish [1]. We aimed to evaluate prompt engineering performance in LLM-assisted literature searches, benchmarked against highly cited meta-analyses from Plastic and Reconstructive Surgery.

Methods: Two highly cited meta-analyses were selected from the journal Plastic and Reconstructive Surgery. Both were chosen because they reported (1) all Medical Subject Headings (MeSH) terms and the PICO (population, intervention, comparison, outcomes) framework and (2) the total number of titles, abstracts, and full texts reviewed versus finally included, for reference. For each meta-analysis, multiple LLMs (ChatGPT 4, 4o, and 5; LLAMA; Gemini; Perplexity; Claude; and Deepseek) were given the specific research question posed in the study and prompted to perform each meta-analysis task using different prompt engineering strategies. The step-wise generated content, including initially identified studies, specific titles and abstracts, and final study inclusion, was benchmarked against the same human-performed meta-analyses. Sensitivity was defined as the percentage of human-gathered studies that the AI also gathered. Specific barriers to AI accuracy were identified.

Results: With zero-shot prompting (i.e., simply asking the AI to perform a task without additional context or examples), the sensitivity of initial literature screening was abysmal across all LLMs. Second-stage prompt engineering used output-feedback prompt tuning (i.e., iteratively refining the prompt with clarifications, additional context, examples, etc.); sensitivity ranged from 28.2% with ChatGPT 4 to 75.9% with Gemini for manuscript 1, and from 10.2% with ChatGPT 4 to 38.6% with Deepseek for manuscript 2. Final-stage prompt engineering used recursive prompt design and template-based prompt design, with a final best sensitivity of 72.7% with Gemini for manuscript 1 (see figure 1).

Conclusion: Although the prescribed data mining and processing steps of systematic review lend themselves to AI platforms, performance diverges widely from human processing and analysis and depends heavily on prompt engineering methods. Important potential benefits of AI use in meta-analyses would be unlimited language comprehension and, as others have shown, perfect pooling of raw data [1]. However, multiple barriers to achieving 100% accuracy remain, including paywalls, variability in responses to prompts, and difficulty appreciating nuance in medical manuscript content without, for example, expert-assisted retrieval-augmented generation (RAG). While this technology is advancing rapidly, currently publicly available LLM platforms cannot yet perform the steps of systematic review in plastic surgery reliably or accurately.

© 2026 Plastic Surgery Research Council. All rights reserved. Source: https://ps-rc.org/meeting/Program/2026/AS45.cgi
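
The sensitivity measure defined in the Methods (the percentage of human-gathered studies that the AI also gathered) can be illustrated with a minimal Python sketch. The abstract does not say how AI-returned studies were matched to the human reference set, so matching on normalized titles is an assumption here, and the function names and placeholder titles are hypothetical rather than taken from the study.

```python
def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles match.

    Assumption: title-based matching; the study's actual matching rule is not stated.
    """
    return " ".join(title.lower().split())


def sensitivity(human_included: list[str], ai_retrieved: list[str]) -> float:
    """Return the percentage of human-included studies that the AI also retrieved."""
    human = {normalize(t) for t in human_included}
    ai = {normalize(t) for t in ai_retrieved}
    if not human:
        raise ValueError("The human reference set must not be empty.")
    # Studies the AI returned that are not in the reference set do not affect
    # sensitivity; they would only matter for a precision-style measure.
    return 100.0 * len(human & ai) / len(human)


# Placeholder titles only, not data from the study.
reference = ["Study A", "Study B", "Study C", "Study D"]
ai_output = ["study a", "Study C ", "Study X"]
example_sensitivity = sensitivity(reference, ai_output)  # 2 of 4 matched
```

Under this definition an LLM is not penalized for returning extra studies, so a high sensitivity alone says nothing about how many irrelevant results a reviewer would still need to discard.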

Similar works