This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluating the potential risks of employing large language models in peer review
Citations: 4
Authors: 17
Year: 2025
Abstract
Objective: This study aims to systematically investigate the potential harms of Large Language Models (LLMs) in the peer review process.

Background: LLMs are increasingly used in academic processes, including peer review. While they can address challenges such as reviewer scarcity and review efficiency, concerns about fairness, transparency, and potential biases in LLM-generated reviews have not been thoroughly investigated.

Methods: Claude 2.0 was used to generate peer review reports, rejection recommendations, citation requests, and refutations for 20 original, unmodified cancer biology manuscripts obtained from eLife's new publishing model. Artificial intelligence (AI) detection tools (ZeroGPT and GPTZero) assessed whether the reviews were identifiable as LLM-generated. All LLM-generated outputs were evaluated for reasonableness by two experts on a five-point Likert scale.

Results: LLM-generated reviews were somewhat consistent with human reviews but lacked depth, especially in detailed critique. The model proved highly proficient at generating convincing rejection comments and could create plausible citation requests, including requests for unrelated references. AI detectors struggled to identify LLM-generated reviews, with 82.8% of responses classified as human-written by GPTZero.

Conclusions: LLMs can be readily misused to undermine the peer review process by generating biased, manipulative, and difficult-to-detect content, posing a significant threat to academic integrity. Guidelines and detection tools are needed to ensure LLMs enhance rather than harm the peer review process.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations
Authors
Institutions
- Zhujiang Hospital (CN)
- Southern Medical University (CN)
- Nanjing Medical University (CN)
- Kangda College of Nanjing Medical University
- Shanghai First People's Hospital (CN)
- Shanghai Jiao Tong University (CN)
- TU Wien (AT)
- Chinese Academy of Medical Sciences & Peking Union Medical College (CN)
- Second Military Medical University (CN)
- Zhuhai People's Hospital (CN)
- Jinan University (CN)
- Nanfang Hospital (CN)
- The First Affiliated Hospital, Sun Yat-sen University (CN)
- Fudan University (CN)
- Zhongshan Hospital (CN)
- Sun Yat-sen University (CN)
- Wenzhou Medical University (CN)
- Quzhou University (CN)
- Quzhou City People's Hospital (CN)
- Affiliated Hospital of Qingdao University (CN)
- Qingdao University (CN)
- Central South University (CN)
- Xiangya Hospital Central South University (CN)
- Chinese University of Hong Kong (CN)
- University of Hong Kong (HK)