This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ReviewGuard: Enhancing Deficient Peer Review Detection via LLM-Driven Data Augmentation
Citations: 0
Authors: 9
Year: 2025
Abstract
Peer review serves as the gatekeeper of science, yet the surge in submissions and widespread adoption of large language models (LLMs) in scholarly evaluation present unprecedented challenges. While recent work has focused on using LLMs to improve review efficiency, unchecked deficient reviews from both human experts and AI systems threaten to systematically undermine academic integrity. To address this issue, we introduce ReviewGuard, an automated system for detecting and categorizing deficient reviews through a four-stage LLM-driven framework: data collection from ICLR and NeurIPS on OpenReview, GPT-4.1 annotation with human validation, synthetic data augmentation yielding 6,634 papers with 24,657 real and 46,438 synthetic reviews, and fine-tuning of encoder-based models and open-source LLMs. Feature analysis reveals that deficient reviews exhibit lower rating scores, higher self-reported confidence, reduced structural complexity, and more negative sentiment than sufficient reviews. AI-generated text detection shows dramatic increases in AI-authored reviews since ChatGPT's emergence. Mixed training with synthetic and real data substantially improves detection performance. For example, Qwen 3-8B achieves a recall of 0.6653 and an F1 of 0.7073, up from 0.5499 and 0.5606, respectively. This study presents the first LLM-driven system for detecting deficient peer reviews, providing evidence to inform AI governance in peer review. Code, prompts, and data are available at GitHub Repository.
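The abstract reports recall and F1 but not precision. As a rough sketch, the precision implied by a recall/F1 pair can be recovered from the standard definition F1 = 2PR / (P + R), which rearranges to P = F1 · R / (2R − F1). This only holds under the assumption that both figures refer to the same class and are not averaged across classes, which the abstract does not state. The function name `implied_precision` is hypothetical and is not taken from the paper's code.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R and F1.

    Assumption: recall and F1 describe the same single class
    (no macro or weighted averaging).
    """
    denom = 2 * recall - f1
    if denom <= 0:
        # F1 can never reach 2R for a precision of at most 1,
        # so such a pair is inconsistent.
        raise ValueError("inconsistent recall/F1 pair")
    return f1 * recall / denom


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Recall and F1 values quoted in the abstract for Qwen 3-8B.
    for label, r, f in [("mixed training", 0.6653, 0.7073),
                        ("before", 0.5499, 0.5606)]:
        p = implied_precision(r, f)
        print(f"{label}: implied precision = {p:.4f}")
```

If the reported metrics are in fact averaged over several classes, the implied value is only indicative and should not be read as a result from the paper.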
Similar works
International Journal of Scientific and Research Publications
2022 · 2,691 citations
Student writing in higher education: An academic literacies approach
1998 · 2,518 citations
Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling
2012 · 2,320 citations
How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data
2009 · 1,926 citations
Chatting and cheating: Ensuring academic integrity in the era of ChatGPT
2023 · 1,880 citations