This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative Assessment of Large Language Model Outputs and NHS Patient Information in Oral Medicine
Citations: 1
Authors: 4
Year: 2025
Abstract
Background: Artificial intelligence (AI) and large language models (LLMs) offer transformative potential in healthcare communication, with the National Health Service (NHS) Long Term Plan envisioning digital tools to support accessible, patient-centred information. However, whether LLM-generated health materials are sufficiently readable for patient use remains uncertain, particularly in oral medicine, where conditions like xerostomia, oral candidiasis, and sialolithiasis are common.

Objective: This study compared the readability of patient information leaflets generated by three LLMs with the NHS UK patient leaflets on common oral medicine conditions, to assess their suitability for public health communication.

Methods: A cross-sectional analysis was conducted, in which each LLM was prompted to produce patient leaflets for xerostomia, oral candidiasis, and sialolithiasis. Outputs were compared to the NHS UK leaflets on identical topics. Texts were analysed using established readability metrics: Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), and Gunning Fog Index. Results were summarised descriptively without formal statistical testing due to the exploratory study design.

Results: The NHS UK leaflets consistently demonstrated superior readability across all conditions and metrics, with lower FKGL scores (5.9-6.3) and higher FRES scores (70.5-72.4), indicating suitability for readers aged 11-14 years (Key Stage 3). Among LLMs, ChatGPT produced the most readable outputs, with FKGL scores ranging from 6.8 to 7.2. DeepSeek outputs were moderately more complex (FKGL: 8.3-8.7), while Gemini generated the most complex texts (FKGL: 9.7-10.2), often exceeding recommended reading levels for patient materials.

Conclusion: While LLMs, especially ChatGPT, show promise in generating patient information, their outputs remain less readable than professionally authored NHS materials. Given that nearly half of UK adults may struggle with complex health texts, the higher reading levels required for LLM-generated content could impede patient understanding and exacerbate health inequalities. As AI becomes more integrated into healthcare communication, ensuring that AI-generated materials meet established readability standards is essential to support equitable, patient-centred care.
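The three readability metrics named in the Methods are defined by standard published formulas. The sketch below is an illustrative Python implementation, not the authors' actual analysis pipeline; in particular, the syllable counter is a simple vowel-group heuristic (an assumption of this example), so scores will differ slightly from tools that use dictionary-based syllabification.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count runs of vowels, subtracting one for a silent trailing 'e'.
    # This is an approximation; production readability tools use more refined rules.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    # Split into sentences and words with simple regex tokenisation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # "Complex" words (Gunning Fog) = words of three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    return {
        # Flesch-Kincaid Grade Level: US school grade needed to read the text.
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Flesch Reading Ease Score: higher = easier (60-70 ~ plain English).
        "FRES": 206.835 - 1.015 * wps - 84.6 * spw,
        # Gunning Fog Index: years of formal education needed.
        "Fog": 0.4 * (wps + 100 * complex_words / len(words)),
    }
```

For example, a leaflet scoring FKGL 6.0 and FRES 71 (as the NHS texts did) falls in the range generally recommended for patient materials, whereas FKGL scores near 10 (as for the Gemini outputs) correspond to texts requiring secondary-school-level reading ability.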
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,287 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,140 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,534 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,450 citations