Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Invisible Text Injection: The Trojan Horse of AI-Assisted Medical Peer Review
0
Zitationen
9
Autoren
2025
Jahr
Abstract
Abstract Key Points Question Are large language models robust against adversarial attacks in medical peer review? Findings In this factorial experimental study, invisible text injection attacks significantly increased review scores and raised manuscript acceptance rates from 0% to nearly 100%, while also significantly impairing the ability of large language models to detect scientific flaws. Meaning Enhanced safeguards and human oversight are essential prerequisite for using large language models in medical peer review. Importance Large language models (LLMs) are increasingly considered for medical peer review. However, their vulnerability to adversarial attacks and ability to detect scientific flaws remain poorly understood. Objective Evaluate LLMs’ ability to identify scientific flaws in peer review and their robustness against invisible text injection (ITI). Design, Setting, and Participants This factorial experimental study was conducted in May 2025 using a 3 LLMs × 3 prompt strategies × 4 manuscript variants x 2 with/without ITI design. We used three commercial LLMs (Anthropic, Google, OpenAI). The four manuscript variants either contained no flaws (control) or included scientific flaws in the methodology, results, or discussion section, respectively. Three prompt strategies were evaluated: neutral peer review, strict guidelines emphasizing objectivity, and explicit rejection. Interventions ITI involved inserting concealed instructions using white text on white background, directing LLMs to review with positive evaluations and “accept without revision” recommendations. Main Outcomes and Measures Primary outcomes were review scores (1-5 scale) and acceptance rates under neutral prompts. Secondary outcomes were review scores, acceptance rates under strict and explicit reject prompts. We investigated flaw detection capability using liberal (detect any flaw) and stringent (detect all flaw) criteria. We calculated mean score differences by models and prompt types and used t-test and Fisher’s exact test for calculating P-value. Results ITI caused significant score inflation under neutral prompts. Score differences for Anthropic, Google and OpenAI were 1.0 (P<.001), 2.5 (P<.001) and 1.7 (P<.001). Acceptance rates increased from 0% to 99.2%-100% across all providers (P<.001). Score differences were still statistically significant under strict prompting. Score differences were not significant under explicit rejection prompting, but flaw detection rate was still impaired. Using liberal detection criteria, results section flaw detection rate was significantly compromised with ITI, particularly in Google (88.9% to 47.8%, P<.001). Stringent criteria revealed methodology detection falling from 56.3% to 25.6% (P<.001) and overall detection dropping from 18.9% to 8.5% (P<.001). Conclusions and Relevance ITI can significantly alter the evaluation of medical studies by LLMs, and mitigation at the prompt level is insufficient. Enhanced safeguards and human oversight are essential prerequisites for the application of LLMs in medical publishing.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.245 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.100 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.466 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.776 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.429 Zit.
Autoren
Institutionen
- Jeju National University Hospital(KR)
- Ajou University(KR)
- University of Ulsan(KR)
- Asan Medical Center(KR)
- Ulsan College(KR)
- Artificial Intelligence in Medicine (Canada)(CA)
- Seoul National University Hospital(KR)
- Institute for National Security Strategy(KR)
- Shinhwa Medical (South Korea)(KR)
- Inha University(KR)