This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative Assessment of Large Language Model Outputs and NHS Patient Information in Oral Medicine
Citations: 1
Authors: 4
Year: 2025
Abstract
Background: Artificial intelligence (AI) and large language models (LLMs) offer transformative potential in healthcare communication, with the National Health Service (NHS) Long Term Plan envisioning digital tools to support accessible, patient-centred information. However, whether LLM-generated health materials are sufficiently readable for patient use remains uncertain, particularly in oral medicine, where conditions like xerostomia, oral candidiasis, and sialolithiasis are common.

Objective: This study compared the readability of patient information leaflets generated by three LLMs with the NHS UK patient leaflets on common oral medicine conditions, to assess their suitability for public health communication.

Methods: A cross-sectional analysis was conducted, in which each LLM was prompted to produce patient leaflets for xerostomia, oral candidiasis, and sialolithiasis. Outputs were compared to the NHS UK leaflets on identical topics. Texts were analysed using established readability metrics: Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), and Gunning Fog Index. Results were summarised descriptively without formal statistical testing due to the exploratory study design.

Results: The NHS UK leaflets consistently demonstrated superior readability across all conditions and metrics, with lower FKGL scores (5.9-6.3) and higher FRES scores (70.5-72.4), indicating suitability for readers aged 11-14 years (Key Stage 3). Among LLMs, ChatGPT produced the most readable outputs, with FKGL scores ranging from 6.8 to 7.2. DeepSeek outputs were moderately more complex (FKGL: 8.3-8.7), while Gemini generated the most complex texts (FKGL: 9.7-10.2), often exceeding recommended reading levels for patient materials.

Conclusion: While LLMs, especially ChatGPT, show promise in generating patient information, their outputs remain less readable than professionally authored NHS materials. Given that nearly half of UK adults may struggle with complex health texts, the higher reading levels required for LLM-generated content could impede patient understanding and exacerbate health inequalities. As AI becomes more integrated into healthcare communication, ensuring that AI-generated materials meet established readability standards is essential to support equitable, patient-centred care.
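The three readability metrics named in the Methods are defined by standard published formulas. The sketch below is an illustrative Python implementation, not the authors' actual analysis pipeline; in particular, the syllable counter is a simple vowel-group heuristic (an assumption of this example), so scores will differ slightly from tools that use dictionary-based syllabification.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count runs of vowels, subtracting one for a silent trailing 'e'.
    # This is an approximation; production readability tools use more refined rules.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    # Split into sentences and words with simple regex tokenisation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # "Complex" words (Gunning Fog) = words of three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    return {
        # Flesch-Kincaid Grade Level: US school grade needed to read the text.
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Flesch Reading Ease Score: higher = easier (60-70 ~ plain English).
        "FRES": 206.835 - 1.015 * wps - 84.6 * spw,
        # Gunning Fog Index: years of formal education needed.
        "Fog": 0.4 * (wps + 100 * complex_words / len(words)),
    }
```

For example, a leaflet scoring FKGL 6.0 and FRES 71 (as the NHS texts did) falls in the range generally recommended for patient materials, whereas FKGL scores near 10 (as for the Gemini outputs) correspond to texts requiring secondary-school-level reading ability.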
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,287 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,140 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,534 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,450 citations