
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluation of AI-Generated Personalized Patient Education Handouts for Stress Urinary Incontinence: Readability, Quality, and Actionability

2026 · 0 citations · Obstetrics and Gynecology
Open full text at the publisher

Citations: 0
Authors: 3
Year: 2026

Abstract

INTRODUCTION: Stress urinary incontinence (SUI) is a prevalent condition that impacts quality of life and treatment decisions. Patient education is central to informed treatment choices, yet most written educational resources use language above recommended literacy levels, limiting accessibility. Large language models (LLMs) offer a novel opportunity to generate tailored, accessible education. We evaluated the clarity, quality, and clinical utility of artificial intelligence (AI)-generated patient education materials for SUI.

OBJECTIVE: To evaluate whether LLMs can generate patient education handouts, tailored to individual medical histories, that are more understandable and accessible than standard patient education materials.

METHODS: Five standardized SUI patient profiles were entered into five LLMs (ChatGPT-4o, Claude Sonnet 4, Gemini 2.5 Flash, Grok 3, DeepSeek-V3). Each model generated handouts without instruction regarding literacy level (baseline) and with instruction to write at a sixth-grade reading level. Handouts were evaluated using PEMAT (understandability, actionability), DISCERN (information quality), automated readability indices (Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, SMOG Index, Coleman-Liau Index), and structured expert review (accuracy, safety, appropriateness, actionability, effectiveness). Of note, lower scores indicate better readability on all readability indices except Flesch Reading Ease. International Urogynecological Association (IUGA) handouts served as the comparator. Statistical significance was determined with Bonferroni-adjusted post hoc tests in IBM SPSS Statistics version 31.

RESULTS: IUGA handouts had the highest PEMAT understandability (85.0%). Each baseline LLM scored lower than IUGA (ChatGPT 69.23%, DeepSeek 72.31%, Grok 70.77%; all p<0.001), but sixth-grade prompts improved scores: none of the sixth-grade models differed significantly from IUGA. Actionability scored approximately 60% across all groups, including IUGA. DISCERN scores were highest for IUGA (3.69) and significantly lower for all LLMs (2.84–3.24; all p<0.001). Readability improved significantly with sixth-grade prompts across multiple indices compared with IUGA (all p<0.001). Experts rated IUGA highest (24.3/25), with LLMs scoring 19.6–21.2; no significant difference was found between IUGA and Claude-6, DeepSeek-6, or Grok-6. Experts rated all baseline LLMs, as well as ChatGPT-6 and Gemini-6, significantly worse than IUGA (p<0.05). See Table 1.

CONCLUSIONS: Baseline LLMs underperformed IUGA in understandability and information quality, but sixth-grade prompts eliminated the understandability gap and improved readability across multiple indices. Despite comparable readability and safety, AI-generated outputs remained lower in information quality, underscoring the need for human oversight to ensure clinical accuracy and completeness.
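The abstract names five standard readability formulas but does not state which software computed them. As a point of reference, the sketch below is a minimal, self-contained Python implementation of the published formulas; the vowel-group syllable counter is a crude heuristic of our own (validated tools use dictionary-based or rule-based syllable counts), so the numbers it produces will approximate, not reproduce, the study's scores.

```python
import re
from math import sqrt

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic (an assumption, not the study's method)."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:  # drop a common silent final 'e'
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Compute the five indices named in the abstract from raw text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w) for w in words)
    # "Complex" words (3+ syllables) drive Gunning Fog and SMOG.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    wps = n_words / sentences        # words per sentence
    spw = syllables / n_words        # syllables per word
    L = letters / n_words * 100      # letters per 100 words
    S = sentences / n_words * 100    # sentences per 100 words

    return {
        # Higher = easier; the only index where higher is better.
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # The remaining four approximate a US school grade level.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_words / n_words),
        "smog": 1.0430 * sqrt(complex_words * 30 / sentences) + 3.1291,
        "coleman_liau": 0.0588 * L - 0.296 * S - 15.8,
    }

print(readability("The cat sat on the mat. It was happy."))
```

This also makes the abstract's caveat concrete: a sixth-grade prompt that shortens sentences (lower wps) and favors shorter words (lower spw, fewer complex words) lowers all four grade-level indices while raising Flesch Reading Ease.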

Topics

Pelvic floor disorders treatments · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare