OpenAlex · Updated hourly · Last update: 27.03.2026, 15:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluation of ChatGPT and Gemini in Answering Patient Questions After Gynecologic Surgery

2025 · 1 citation · Obstetrics and Gynecology
Open full text at the publisher

Citations: 1 · Authors: 6 · Year: 2025

Abstract

INTRODUCTION: Clinics often face challenges in promptly responding to patient health questions, and patients are increasingly turning to internet resources to answer questions about their health. Artificial intelligence large language models may be able to assist in triaging gynecologic questions. ChatGPT can answer clinical queries with preliminary information across a spectrum of obstetrics and gynecology topics, despite limitations in consistency and depth of insight (Grunebaum et al., 2023).

OBJECTIVE: To explore the performance of ChatGPT version 4.0 (GPT-4) and Gemini Advanced (Gemini) in addressing common patient questions after gynecologic surgery with regard to accuracy, relevance, helpfulness to the average patient, and readability.

METHODS: Postoperative patient questions were developed to simulate common patient questions after gynecologic surgery, based on expert opinion and compiled from anonymous posters on Reddit (r/endometriosis). A total of 41 questions focused on five key areas: vaginal bleeding, bowel/bladder function, incision care, resumption of activities, and sexual function. Questions were asked in a systematic three-step submission process, with the model's memory reset after each query. Output of the prompted questions was independently assessed for accuracy and relevance by four board-certified gynecologic surgeons with fellowship training in gynecologic surgery. Responses were graded on a 5-point Likert scale. Response consistency was assessed by labeling each set of answers as either "consistent" or "inconsistent." Readability of the answers was assessed with the Flesch-Kincaid grade level (FKGL) calculator. Responses were also assessed for helpfulness to the average patient by three clinic nurses who commonly answer patient questions via MyChart.

RESULTS: The 41 questions were posed to GPT-4 and Gemini three times each, resulting in a total of 246 individual responses.
These responses were independently evaluated by four board-certified minimally invasive gynecologic surgeons and three clinic nurses, for a total of 1,968 evaluations of accuracy, relevance, helpfulness to the average patient, and readability. Surgeons and nurses graded Gemini responses as more accurate (4.23 vs 4.03, p=0.015) and more helpful (4.37 vs 4.21, p=0.025) than GPT-4 responses. Responses from both models were similarly rated relevant or very relevant (4.45 vs 4.36, p=0.2). Most responses by GPT-4 (85%) and Gemini (87%) were consistent across all questions. While an 8th-grade reading level is generally recommended for patient literature, the average FKGL for GPT-4 and Gemini responses corresponded to the 11th and 10th grade, respectively. The average response length was shorter for Gemini (10 sentences) than for GPT-4 (15 sentences).

CONCLUSIONS: GPT-4 and Gemini demonstrate the potential to respond accurately, relevantly, and consistently to patient questions regarding postoperative gynecologic care. Gemini outperformed GPT-4 on response accuracy and helpfulness, but responses from both models were graded as accurate or highly accurate by surgeon reviewers. The readability of responses exceeded the recommended 8th-grade reading level for patient literature. Collectively, these results indicate that large language models may aid patients and clinical staff in answering postoperative queries; however, they are not a substitute for professional gynecologic advice.
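The readability measure used in the study, the Flesch-Kincaid grade level, is a standard formula over sentence, word, and syllable counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal sketch of the computation is below; the vowel-group syllable counter is a rough heuristic of my own (the study presumably used a dedicated FKGL calculator), so scores from this sketch will differ slightly from tool-based values.

```python
import re

def count_syllables(word):
    # Naive heuristic: count runs of consecutive vowels (incl. y);
    # every word is credited with at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Very simple text scores below grade 0 (e.g. short monosyllabic sentences), while the longer, clause-heavy sentences typical of model responses push scores toward the 10th-11th-grade levels reported above.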

Topics

Artificial Intelligence in Healthcare and Education · Cardiac, Anesthesia and Surgical Outcomes