OpenAlex · Updated hourly · Last updated: 15.03.2026, 12:34

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Examination of the Quality and Readability of Chatbot Responses to Patient Questions: A Synthesis of Recent Studies (Preprint)

2025 · 0 citations · Open Access
Open full text at publisher

Citations: 0 · Authors: 2 · Year: 2025

Abstract

BACKGROUND: Patient use of chatbots to obtain medical information has been anticipated with both optimism and pessimism. The simplicity of asking questions and receiving immediate answers has prompted investigators to examine the quality and readability of chatbot responses. We sought to review the results at this nascent stage of chatbot development.

OBJECTIVE: To evaluate the current data.

METHODS: We searched multiple databases for studies that evaluated response quality using the DISCERN instrument, which is designed to assess written material intended for patients. From these studies we extracted the DISCERN scores, the number of words in the questions, the number of questions asked, the number of evaluators, and, where reported, the readability of the responses. We also recorded a measure of the rank of the journal in which each study was published. We combined these parameters in a multiple linear regression model to identify potential associations with response quality.

RESULTS: We identified 32 studies that conducted 57 tests across multiple chatbots. The average number of words in chatbot prompts ranged from 6 to 41, and the number of questions ranged from 3 to 119. As response quality increased, readability decreased. Forty-two percent of tests produced average responses rated "good" or higher, and only one test yielded responses below college-level readability. In simple linear regression, a higher DISCERN score was associated with more prompt words and more questions. In the multiple linear regression model, higher DISCERN scores were associated with the number of questions and with having three or more evaluators, inversely associated with journal rank, and not associated with the number of prompt words.

CONCLUSIONS: The variable quality and poor readability of chatbot responses to patient questions reinforce pessimism about their role. However, the principles of prompt engineering (the art of asking questions) have yet to be rigorously applied. Therefore, we remain optimistic that response quality and readability will improve.
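The regression described in the abstract can be sketched as ordinary least squares with the DISCERN score as the outcome and the study-level parameters as predictors. The sketch below is a minimal illustration of that model form; the predictor layout and all data values are synthetic placeholders, not numbers from the reviewed studies.

```python
# Illustrative OLS fit of a model of the form:
# DISCERN score ~ questions asked + (>=3 evaluators indicator) + journal rank.
# All rows and responses below are synthetic placeholders.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares: coefficients minimizing ||X b - y||^2
    via the normal equations (X^T X) b = X^T y."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    return solve(XtX, Xty)

# Synthetic design matrix: [intercept, questions, >=3 evaluators, journal rank]
X = [
    [1, 10, 0, 50],
    [1, 25, 1, 20],
    [1, 40, 1, 10],
    [1, 15, 0, 40],
    [1, 60, 1, 5],
    [1, 30, 0, 30],
]
y = [42.0, 55.0, 61.0, 45.0, 68.0, 50.0]  # hypothetical DISCERN totals

coefs = ols(X, y)
print([round(c, 3) for c in coefs])
```

In practice such a model would be fit with a statistics package (e.g. an OLS routine) that also reports standard errors and p-values; the hand-rolled solver here only recovers the point estimates.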

Similar works

Authors

Topics

Artificial Intelligence in Healthcare and Education · Health Literacy and Information Accessibility · AI in Service Interactions