This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Leveraging Large Language Models for Post-Publication Peer Review: Potential and Limitations
Citations: 0
Authors: 4
Year: 2025
Abstract
Peer review is the cornerstone of scientific evaluation, ensuring the quality, accuracy, and integrity of published research. However, challenges such as reviewer bias, time constraints, and the increasing volume of submissions have strained traditional peer review systems, resulting in delays, lower-quality reviews, and reviewer fatigue. These limitations highlight the need for innovative solutions. Large language models (LLMs) have emerged as promising tools to support or potentially replace certain aspects of peer review. This study investigates the potential of LLMs to enhance post-publication peer review by offering quality assessments and recommendations for published articles. Specifically, we designed two tasks to evaluate the performance of LLMs in post-publication research evaluation: identifying high-quality articles (Task 1) and rating recommended articles (Task 2). Six versions of three generative LLMs (the open-source Qwen and Llama models and the closed-source GPT-4o-mini model), together with four BERT-based models, were assessed using in-context learning and fine-tuning approaches. The data for training and evaluation were sourced from H1 Connect (formerly Faculty Opinions), a platform for expert recommendations in the biomedical domain. Results indicate that fine-tuning LLMs with labelled data can significantly enhance their alignment with human expert evaluations. For Task 1, fine-tuned models performed well in identifying high-quality articles, with an accuracy of 84%. For Task 2, however, LLMs struggled to match human judgement consistently, with an accuracy below 60%, highlighting their current limitations in nuanced, context-dependent tasks.
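The accuracy figures in the abstract measure how often a model's label agrees with the expert label. A minimal sketch of that comparison is shown below; it is not the authors' evaluation code. The function name `accuracy`, the toy label lists, and the 1 to 3 rating scale for Task 2 are all assumptions made for illustration, and none of the values are data from the study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], expert: Sequence[int]) -> float:
    """Fraction of articles where the model label equals the expert label."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same length")
    if not expert:
        raise ValueError("at least one article is required")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


# Hypothetical toy labels, invented for illustration only (not study data).
# Task 1: 1 = high-quality article, 0 = other.
task1_expert = [1, 0, 1, 1, 0]
task1_model = [1, 0, 0, 1, 0]

# Task 2: rating of a recommended article, assumed here to be an integer 1 to 3.
task2_expert = [1, 2, 3, 2, 1]
task2_model = [1, 3, 3, 1, 1]

task1_accuracy = accuracy(task1_model, task1_expert)
task2_accuracy = accuracy(task2_model, task2_expert)
print(f"Task 1 accuracy: {task1_accuracy:.2f}")
print(f"Task 2 accuracy: {task2_accuracy:.2f}")
```

Exact-match accuracy treats every disagreement equally, so on a graded task such as Task 2 a rating that is off by one counts the same as one that is off by two.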