OpenAlex · Updated hourly · Last updated: May 2, 2026, 04:07

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Readability, Accuracy, and Completeness of Urogynecology Health Information Provided Through Patient-Driven Interaction With an Artificial Intelligence Chatbot

2026 · 0 citations · Obstetrics and Gynecology
Open full text at publisher

0 citations · 3 authors · Year: 2026

Abstract

INTRODUCTION: The readability, accuracy, and completeness of urogynecology health information provided by large language models (LLMs) have been found to be comparable to currently available education materials. However, LLM responses to patient-initiated questions have not yet been examined.

OBJECTIVE: To investigate the accuracy, completeness, and readability of information provided to patients by an LLM in response to patient questions at the time of the first urogynecology consultation.

METHODS: This is a secondary analysis of a randomized trial examining use of an LLM, Chat Generative Pre-trained Transformer (ChatGPT 4o; OpenAI), by patients at their initial urogynecology visit. Participants were recruited if they presented with prolapse, incontinence, or lower urinary tract symptoms (LUTS) and were English- or Spanish-speaking. They were randomized into one of three arms: use of ChatGPT before their appointment (Arm 1), use of ChatGPT after their appointment (Arm 2), or no use of an LLM (Arm 3). Participants in Arms 1 and 2 were provided with ChatGPT on a tablet and instructed to ask the program anything they would like (by typing or speaking) about their most important urogynecology problem, with up to five follow-up questions. English and Spanish ChatGPT transcripts were analyzed by two independent physician reviewers for accuracy, completeness, actionability/understandability [Patient Education Materials Assessment Tool (PEMAT)], and readability [Simple Measure of Gobbledygook (SMOG), Spanish Orthographic Length (SOL), Fry Readability Graph, Flesch-Kincaid (FK) Grade Level, Fernández Huerta (FH) Readability Index]. Significance was declared at P<0.05.

RESULTS: A total of 125 patients were randomized from July to December 2024, and 79 conversation transcripts were collected (41 in Arm 1, 38 in Arm 2). Average participant age was 58.4±13.8 years, and the majority of participants identified as non-Hispanic White (77%). Of the participants, 94% had at least a high school education, and median health literacy measured by the Short Assessment of Health Literacy (SAHL) was 17 (out of 18). Ninety-one percent of participants communicated with the chatbot in English (Table 1). Reviewers found the majority of chatbot responses to be as accurate and complete as information that would be provided by a urogynecology specialist, and PEMAT scores indicated moderate understandability but low actionability (Table 2). Averaged readability scores of the chatbot responses were 12.2±2.3 for Arm 1 and 11.4±2.6 for Arm 2 (Figure 1). Readability scores did not significantly correlate with participants' health literacy (SAHL) (r=0.086, P=0.450). Moderate or good agreement (Cohen's kappa, κ) was noted between the urogynecologist-documented primary diagnosis and the themes patients discussed with the chatbot for Arm 1 (prolapse κ=0.739, P<0.001; incontinence κ=0.564, P<0.001; LUTS κ=0.663, P<0.001) and Arm 2 (prolapse κ=0.713, P<0.001; incontinence κ=0.653, P<0.001; LUTS κ=0.418, P=0.011).

CONCLUSIONS: ChatGPT responses to patient questions were overall as accurate and complete as those provided by urogynecologists and were easy to understand, but showed poor actionability. Responses also had higher-than-recommended readability levels for patient education materials (above the 5th- to 6th-grade level). Unlike printed materials, however, an AI chatbot can simplify its responses on request. Future studies may focus on the variability in accuracy of such simplified information when LLMs are used for patient education.
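The readability and agreement metrics reported in the abstract follow standard published formulas. As a minimal illustrative sketch, not the study's actual scoring pipeline, the Flesch-Kincaid Grade Level, the SMOG grade, and Cohen's kappa can be computed from raw counts as follows (word, sentence, syllable, and polysyllable counts are assumed to be supplied by the caller):

```python
import math

def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (published formula)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade: McLaughlin's formula, normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

def cohens_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two raters' labels: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Example (hypothetical counts): a 100-word, 5-sentence passage with
# 150 syllables scores roughly a 10th-grade Flesch-Kincaid level,
# i.e. above the 5th- to 6th-grade level recommended for patient
# education materials.
print(round(fk_grade(100, 5, 150), 2))  # → 9.91
```

Kappa values near 0.7, as reported for prolapse in both arms, are conventionally read as "good" agreement, while values in the 0.4-0.6 range are "moderate."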


Topics

Artificial Intelligence in Healthcare and Education · AI in Service Interactions · Digital Mental Health Interventions