This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Current capabilities of large language models as peer reviewers for manuscripts submitted to ophthalmology-related journals
Citations: 0
Authors: 7
Year: 2026
Abstract
To compare large language models (LLMs) and human reviewers in the peer review process of manuscripts submitted to 3 ophthalmology-related journals. This retrospective study comprised 300 randomly selected manuscripts from 3 anonymized journals under 1 editor between June 2023 and July 2024. Comments from 2 LLMs (Chat Generative Pre-Trained Transformer [ChatGPT] 4o and Gemini) and human reviewers (324 ophthalmologists) were compared. LLMs were prompted to accept, accept with major or minor revisions, or reject each manuscript in addition to providing comments. A 5-point Likert scale was used to assess the "favorability" of comments and compare manuscripts that were accepted or rejected by the editor. A 4-category quality assessment was used to compare the number of comments, detail/specificity, critical analysis, and literature support. Human reviewers rejected manuscripts more frequently (73.33% vs 2.00% ChatGPT and 2.00% Gemini; P < .001) and suggested major (22.67% vs 68.00% ChatGPT and 31.33% Gemini; P < .001) or minor revisions (3.33% vs 30.00% ChatGPT and 66.33% Gemini; P < .001) less often. Human reviewers gave more negative feedback for rejected manuscripts (-1.05 vs -0.02 ChatGPT and 0.24 Gemini; P < .015). ChatGPT repeated "novelty," "sample size," and "clarity" in 75%, 60%, and 50% of cases, respectively, while Gemini did so in 80%, 70%, and 65% of cases. Both lacked specificity, omitting line numbers and references. Although it is hoped that LLMs will one day be able to augment the role of peer reviewers, in their current state, LLMs should not be used for manuscript revision.
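The study prompted each LLM to return one of four editorial decisions (accept, accept with minor revisions, accept with major revisions, or reject) alongside free-text comments, and then tallied decision rates across reviewers. A minimal sketch of the decision-extraction and tallying step is below; the function name, category labels, and parsing rules are illustrative assumptions, not taken from the paper:

```python
import re

# Editorial categories matching the study's prompt; the order matters
# because "reject" is checked before the accept variants. Parsing rules
# here are hypothetical.
CATEGORIES = ("reject", "major revision", "minor revision", "accept")

def parse_decision(verdict: str) -> str:
    """Map a free-text LLM verdict onto one of the four decision categories."""
    text = verdict.lower()
    if "reject" in text:
        return "reject"
    if re.search(r"major\s+revision", text):
        return "major revision"
    if re.search(r"minor\s+revision", text):
        return "minor revision"
    if "accept" in text:
        return "accept"
    raise ValueError(f"unrecognized verdict: {verdict!r}")

# Tally decisions across a batch of verdicts, as in the study's comparison
# of rejection and revision rates.
verdicts = ["Accept with minor revisions.", "Reject.", "Accept."]
counts = {c: sum(parse_decision(v) == c for v in verdicts) for c in CATEGORIES}
```

A chi-square test over such counts (as suggested by the reported P values) would then compare human and LLM decision distributions.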
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,496 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,386 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,848 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,562 citations