This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
EXAMINING THE ROLE OF CHATGPT IN THE EVALUATION OF SCIENTIFIC ABSTRACTS
Citations: 0 · Authors: 9 · Year: 2024
Abstract
Background: Reviewing abstracts for inclusion in scientific conferences involves substantial time and effort. Recent innovations in Large Language Models (LLMs), such as OpenAI’s ChatGPT, have the potential to reduce the burden on reviewers. This study aimed to test that potential by programmatically grading abstracts using custom-trained LLMs.

Methods: Abstracts and reviewer grades of American Hernia Society (AHS, 2021) and American Plastic Surgery Association (ASPS, 2021–2022) abstracts were obtained. Two new models of the state-of-the-art ChatGPT-3.5-turbo-1106 were fine-tuned on corpora of these abstracts and grades. We trained for three epochs with a standard loss function and four inputs: a “system” directive, the grading rubric, the abstract, and the average reviewer grade on a five-point scale. The models were evaluated with the 2023 abstracts and grades from both societies. The trained models graded the abstracts and ranked them using three custom algorithms: QuickSort, which iteratively compared two abstracts at a time; Quartile, which ranked four; and Bridging, which combined batches of fifteen. Mean differences and Spearman correlation coefficients were calculated between actual and predicted grades/rankings.

Results: The trained models successfully followed the 2023 rubrics and, in less than 20 minutes, produced predicted grades centered near the actual values: the mean difference was 0.0719 ± 0.881 for AHS and 0.104 ± 0.944 for ASPS. However, variance was high, and the predicted rankings had at best weak rank correlations with the actual rankings.

Conclusion: Custom-trained LLMs present a novel method for evaluating abstracts efficiently, but further research is needed to fine-tune the models for more precise results.
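The evaluation described above compares predicted and actual grades via mean difference and Spearman rank correlation. A minimal, stdlib-only sketch of those two metrics (the grade values below are made up for illustration, not the study's data):

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Illustrative five-point-scale grades (hypothetical values)
actual    = [3.2, 4.1, 2.5, 3.8, 4.5]
predicted = [3.0, 4.4, 2.9, 3.5, 4.2]

mean_diff = mean(p - a for p, a in zip(predicted, actual))
rho = spearman(actual, predicted)
```

In practice a library routine such as `scipy.stats.spearmanr` would be used; the hand-rolled version here only shows what the statistic measures.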
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,380 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,243 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,671 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,496 citations