OpenAlex · Updated hourly · Last updated: 15.03.2026, 10:43

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT Compared With Google Bard: Which Large Language Model Responds Best to Commonly Asked Pregnancy Questions? [ID 2683501]

2024 · 0 citations · Obstetrics and Gynecology

0 citations · 6 authors · 2024

Abstract

INTRODUCTION: Large language models (LLMs) have proven useful in a variety of settings and can provide information on many topics in obstetrics and gynecology. Which platform responds best to commonly asked pregnancy questions is unknown.

METHODS: A qualitative analysis of ChatGPT and Google Bard was performed in August 2023. We queried each LLM on 12 commonly asked pregnancy questions and asked for its references. Responses were graded as “acceptable” or “not acceptable” based on correctness and completeness relative to American College of Obstetricians and Gynecologists (ACOG) publications, PubMed-indexed evidence, and clinical experience. References were classified as “verified,” “broken,” “irrelevant,” “non-existent,” or “no references.” The co-authors reviewed and graded the responses and references for both LLMs individually and then as a group to reach consensus.

RESULTS: A grade of acceptable was given to 58% of ChatGPT’s responses (7 of 12) and 83% of Google Bard’s responses (10 of 12). A grade of not acceptable was assigned to 42% of ChatGPT’s responses (5 of 12: 2 incomplete, 3 incorrect) and 17% of Google Bard’s responses (2 of 12, both incomplete). ChatGPT had reference issues in 100% of its answers (12 of 12): no references were provided for 10 answers, one answer cited a non-existent reference, and another answer had two broken references and two verified references. Google Bard had discrepancies in 8.3% of its answers (1 of 12): 11 answers had verified references, and 1 answer had a broken reference.

CONCLUSION: Google Bard showed superior performance in both content and references when queried on commonly asked pregnancy questions. Both LLMs must be carefully evaluated and vetted before being accepted as accurate and reliable for this purpose.

Similar works