OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 07.05.2026, 13:25

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Quality, Accuracy, and Bias in ChatGPT-Based Summarization of Medical Abstracts

2024·41 Zitationen·The Annals of Family MedicineOpen Access
Volltext beim Verlag öffnen

41

Zitationen

8

Autoren

2024

Jahr

Abstract

PURPOSE: Worldwide clinical knowledge is expanding rapidly, but physicians have sparse time to review scientific literature. Large language models (eg, Chat Generative Pretrained Transformer [ChatGPT]), might help summarize and prioritize research articles to review. However, large language models sometimes "hallucinate" incorrect information. METHODS: We evaluated ChatGPT's ability to summarize 140 peer-reviewed abstracts from 14 journals. Physicians rated the quality, accuracy, and bias of the ChatGPT summaries. We also compared human ratings of relevance to various areas of medicine to ChatGPT relevance ratings. RESULTS: ChatGPT produced summaries that were 70% shorter (mean abstract length of 2,438 characters decreased to 739 characters). Summaries were nevertheless rated as high quality (median score 90, interquartile range [IQR] 87.0-92.5; scale 0-100), high accuracy (median 92.5, IQR 89.0-95.0), and low bias (median 0, IQR 0-7.5). Serious inaccuracies and hallucinations were uncommon. Classification of the relevance of entire journals to various fields of medicine closely mirrored physician classifications (nonlinear standard error of the regression [SER] 8.6 on a scale of 0-100). However, relevance classification for individual articles was much more modest (SER 22.3). CONCLUSIONS: Summaries generated by ChatGPT were 70% shorter than mean abstract length and were characterized by high quality, high accuracy, and low bias. Conversely, ChatGPT had modest ability to classify the relevance of articles to medical specialties. We suggest that ChatGPT can help family physicians accelerate review of the scientific literature and have developed software (pyJournalWatch) to support this application. Life-critical medical decisions should remain based on full, critical, and thoughtful evaluation of the full text of research articles in context with clinical guidelines.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationMeta-analysis and systematic reviewsMisinformation and Its Impacts
Volltext beim Verlag öffnen