
This is an overview page with metadata for this scientific work. The full article is available from the publisher.

ChatGPT, MD: A Pilot Study Utilizing Large Language Models to Write Medical Abstracts

2024 · 1 citation · British Journal of Surgery · Open Access
Open full text at the publisher

Citations: 1 · Authors: 11 · Year: 2024

Abstract

Background: Artificial intelligence (AI) has permeated academia, especially with the inception of Chat Generative Pretrained Transformer (ChatGPT), a large language model chatbot. The goal of this study was to evaluate ChatGPT's ability to write and grade surgical research abstracts.

Methods: ChatGPT models 3.5 and 4.0 were trained to write ten different abstracts by providing them with ten previous abstracts presented at national meetings as models, background literature, and data with statistics. ChatGPT 3.5's and ChatGPT 4.0's versions were compared with the junior resident's and the senior author's edited, submitted version for each data set. The 40 abstract versions were judged by experienced, blinded surgeon-reviewers, who graded them on 10- and 20-point scales and ranked them 1st–4th. ChatGPT 3.5 and ChatGPT 4.0 also graded and ranked the abstracts. Standard statistics were performed. Results are reported in the order of resident, senior author, ChatGPT 3.5, and ChatGPT 4.0.

Results: The surgeon-reviewers could not differentiate between abstract authors; every reviewer ranked an AI-generated version 1st at least once. There was no difference in the surgeon-reviewers' average 10-point scores (6.8 ± 1.5 vs. 6.9 ± 1.5 vs. 6.6 ± 1.4 vs. 6.9 ± 1.3; P = 0.611), 20-point scores (14.3 ± 3.0 vs. 14.6 ± 2.8 vs. 13.8 ± 2.7 vs. 14.3 ± 2.5; P = 0.500), or rank (2.5 ± 1.2 vs. 2.4 ± 1.1 vs. 2.8 ± 1.1 vs. 2.3 ± 1.1; P = 0.138). ChatGPT 3.5's grades were similar to those of the reviewers, but ChatGPT 4.0 graded all four abstract versions higher on the 20-point scale than both the surgeon-reviewers (P = 0.023, P = 0.032, P = 0.002, P = 0.037) and ChatGPT 3.5 (P = 0.003, P = 0.004, P = 0.003, P = 0.012).

Conclusions: When appropriately trained with background knowledge, data, and statistics, ChatGPT can write convincing medical research abstracts that are indistinguishable from a resident's or a senior author's drafts. ChatGPT 3.5 graded abstracts similarly to the surgeon-reviewers, while ChatGPT 4.0 was more generous.
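The abstract states only that "standard statistics were performed"; the exact tests behind the quoted P-values are not specified. As a rough, hypothetical illustration of the kind of four-group comparison involved, the Python sketch below runs a one-way ANOVA across reviewer scores for the four abstract versions. The group names and the data are placeholders (drawn to match the reported means and standard deviations), not the study's actual data or its method.

    # Hypothetical sketch: comparing reviewer scores across the four abstract
    # versions (resident, senior author, ChatGPT 3.5, ChatGPT 4.0). Both the
    # test choice and the data are assumptions; the paper specifies neither.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Placeholder 10-point scores drawn to match the reported means/SDs.
    resident      = rng.normal(6.8, 1.5, 30)
    senior_author = rng.normal(6.9, 1.5, 30)
    chatgpt_35    = rng.normal(6.6, 1.4, 30)
    chatgpt_40    = rng.normal(6.9, 1.3, 30)

    # One-way ANOVA across the four groups; a large P-value corresponds to
    # "no difference", as the study reports (P = 0.611 on the 10-point scale).
    f_stat, p_value = stats.f_oneway(resident, senior_author, chatgpt_35, chatgpt_40)
    print(f"F = {f_stat:.2f}, P = {p_value:.3f}")

For the rank outcome (1st–4th), which is ordinal rather than interval data, a non-parametric alternative such as the Kruskal–Wallis test (stats.kruskal) would be the more natural choice.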
