This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
CLEAR: Pilot Testing of a Tool to Standardize Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
Citations: 2
Authors: 3
Year: 2023
Abstract
Artificial intelligence (AI)-based conversational models, such as ChatGPT, Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool, referred to as “CLEAR”, designed to assess the quality of health information delivered by AI-based models. Tool development involved a literature review on health information quality, followed by the initial drafting of the CLEAR tool, comprising five items that assess the following: completeness of content in response to the prompt, lack of false information, evidence support, appropriateness, and relevance of the generated content. Each item was scored on a 5-point Likert scale from excellent to poor. Content validity was checked through expert review of the initial items. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying quality. Internal consistency was checked using the Cronbach α. Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was then used to assess the quality of health information generated by four different AI-based models on five different yet common health topics. The AI models were ChatGPT-3.5, ChatGPT-4, Bing, and Bard, and the generated content was scored by two independent raters, with Cohen κ used to assess inter-rater agreement. The final five CLEAR items were: (1) Is the content sufficient? (2) Is the content accurate? (3) Is the content evidence-based? (4) Is the content clear, concise, and easy to understand? and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency, with a Cronbach α range of 0.669–0.981.
Use of the final CLEAR tool yielded the following average scores: Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). Inter-rater agreement yielded the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Bing (κ=0.348, P=.037), and Bard (κ=0.749, P<.001). The CLEAR tool is a brief yet helpful instrument that can help standardize testing of the quality of health information generated by AI-based conversational models. Future studies are recommended to validate the utility of the CLEAR tool for assessing the quality of AI-generated health-related content using a larger sample across various complex health topics.
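The abstract reports internal consistency via Cronbach α and inter-rater agreement via Cohen κ. As a minimal sketch of how these two statistics are computed (the data below are hypothetical examples, not the study's actual ratings):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of respondents, each a list of k item scores
    (e.g., the five CLEAR items on a 1-5 Likert scale).
    """
    k = len(scores[0])
    n = len(scores)

    def var(xs):  # sample variance (ddof=1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical ratings.

    r1, r2: equal-length lists of labels from rater 1 and rater 2.
    Assumes chance agreement pe < 1 (otherwise kappa is undefined).
    """
    n = len(r1)
    categories = set(r1) | set(r2)
    # Observed agreement: fraction of items both raters labeled identically.
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)


# Hypothetical illustration: three raters scoring five CLEAR items,
# and two raters assigning categorical quality labels to four outputs.
ratings = [[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3]]
print(cronbach_alpha(ratings))          # perfectly consistent items -> 1.0
print(cohen_kappa([1, 2, 1, 2],
                  [1, 2, 1, 2]))        # perfect agreement -> 1.0
```

Values such as the study's α range of 0.669–0.981 or κ=0.875 would come from applying these formulas to the actual rating matrices.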
Similar Works
Improving the Quality of Web Surveys: The Checklist for Reporting Results of Internet E-Surveys (CHERRIES)
2004 · 6,114 citations
The content validity index: Are you sure you know what's being reported? critique and recommendations
2006 · 6,069 citations
Health literacy and public health: A systematic review and integration of definitions and models
2012 · 5,815 citations
Low Health Literacy and Health Outcomes: An Updated Systematic Review
2011 · 5,205 citations
Health literacy as a public health goal: a challenge for contemporary health education and communication strategies into the 21st century
2000 · 4,930 citations