This is an overview page with metadata for this scientific work. The full article is available from the publisher.
CHATGPT, MD: A PILOT STUDY UTILIZING LARGE LANGUAGE MODELS TO WRITE MEDICAL ABSTRACTS
Citations: 1
Authors: 11
Year: 2024
Abstract
Background: Artificial intelligence (AI) has permeated academia, especially since the inception of Chat Generative Pretrained Transformer (ChatGPT), a large language model chatbot. The goal of this study was to evaluate ChatGPT's ability to write and grade surgical research abstracts.

Methods: ChatGPT models 3.5 and 4.0 were trained to write ten different abstracts by providing them with ten previous abstracts presented at national meetings as models, along with background literature and data with statistics. ChatGPT3.5's and ChatGPT4.0's versions were compared with the junior resident's draft and the senior author's edited, submitted version for each data set. The 40 abstract versions were judged by experienced, blinded surgeon-reviewers, who graded them on 10- and 20-point scales and ranked them 1st–4th. ChatGPT3.5 and ChatGPT4.0 also graded and ranked the abstracts. Standard statistics were performed. Results are reported in the order resident, senior author, ChatGPT3.5, and ChatGPT4.0.

Results: The surgeon-reviewers could not differentiate between abstract authors; every reviewer ranked an AI-generated version 1st at least once. There was no difference in the surgeon-reviewers' average 10-point scores (6.8 ± 1.5 vs. 6.9 ± 1.5 vs. 6.6 ± 1.4 vs. 6.9 ± 1.3; P = 0.611), 20-point scores (14.3 ± 3.0 vs. 14.6 ± 2.8 vs. 13.8 ± 2.7 vs. 14.3 ± 2.5; P = 0.500), or ranks (2.5 ± 1.2 vs. 2.4 ± 1.1 vs. 2.8 ± 1.1 vs. 2.3 ± 1.1; P = 0.138). ChatGPT3.5's grades were similar to those of the reviewers, but ChatGPT4.0 graded all four abstract versions higher on the 20-point scale than both the surgeon-reviewers (P = 0.023, P = 0.032, P = 0.002, P = 0.037) and ChatGPT3.5 (P = 0.003, P = 0.004, P = 0.003, P = 0.012).

Conclusions: When appropriately trained with background knowledge, data, and statistics, ChatGPT can write convincing medical research abstracts that are indistinguishable from residents' or senior authors' drafts. ChatGPT3.5 graded abstracts similarly to the surgeon-reviewers, while ChatGPT4.0 was more generous.
Similar works
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 82,806 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 62,803 citations
Measuring inconsistency in meta-analyses
2003 · 61,524 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA Statement.
2009 · 45,121 citations
The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration
2009 · 27,678 citations