This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
CLEAR: Pilot Testing of a Tool to Standardize Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
Citations: 2
Authors: 3
Year: 2023
Abstract
Artificial intelligence (AI)-based conversational models, such as ChatGPT, Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool, referred to as “CLEAR”, designed to assess the quality of health information delivered by AI-based models. Tool development involved a literature review on health information quality, followed by the initial drafting of the CLEAR tool, comprising five items that assess the following: completeness of content in response to the prompt, lack of false information, evidence support, appropriateness, and relevance of the generated content. Each item was scored on a 5-point Likert scale from excellent to poor. Content validity was checked through expert review of the initial items. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying quality. Internal consistency was checked using the Cronbach α. Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was then used to assess the quality of health information generated by four different AI-based models on five different yet common health topics. The AI models were ChatGPT-3.5, ChatGPT-4, Bing, and Bard, and the generated content was scored by two independent raters, with Cohen κ used to assess inter-rater agreement. The final five CLEAR items were: (1) Is the content sufficient? (2) Is the content accurate? (3) Is the content evidence-based? (4) Is the content clear, concise, and easy to understand? and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency, with a Cronbach α range of 0.669–0.981.
Use of the final CLEAR tool yielded the following average scores: Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). Inter-rater agreement yielded the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Bing (κ=0.348, P=.037), and Bard (κ=0.749, P<.001). The CLEAR tool is a brief yet helpful instrument that can help standardize testing of the quality of health information generated by AI-based conversational models. Future studies are recommended to validate the utility of the CLEAR tool for assessing the quality of AI-generated health-related content using a larger sample across various complex health topics.
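The abstract reports internal consistency via Cronbach α and inter-rater agreement via Cohen κ. As a minimal sketch of how these two statistics are computed (the data below are hypothetical examples, not the study's actual ratings):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of respondents, each a list of k item scores
    (e.g., the five CLEAR items on a 1-5 Likert scale).
    """
    k = len(scores[0])
    n = len(scores)

    def var(xs):  # sample variance (ddof=1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical ratings.

    r1, r2: equal-length lists of labels from rater 1 and rater 2.
    Assumes chance agreement pe < 1 (otherwise kappa is undefined).
    """
    n = len(r1)
    categories = set(r1) | set(r2)
    # Observed agreement: fraction of items both raters labeled identically.
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)


# Hypothetical illustration: three raters scoring five CLEAR items,
# and two raters assigning categorical quality labels to four outputs.
ratings = [[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3]]
print(cronbach_alpha(ratings))          # perfectly consistent items -> 1.0
print(cohen_kappa([1, 2, 1, 2],
                  [1, 2, 1, 2]))        # perfect agreement -> 1.0
```

Values such as the study's α range of 0.669–0.981 or κ=0.875 would come from applying these formulas to the actual rating matrices.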
Similar Works
Improving the Quality of Web Surveys: The Checklist for Reporting Results of Internet E-Surveys (CHERRIES)
2004 · 6,114 citations
The content validity index: Are you sure you know what's being reported? critique and recommendations
2006 · 6,069 citations
Health literacy and public health: A systematic review and integration of definitions and models
2012 · 5,815 citations
Low Health Literacy and Health Outcomes: An Updated Systematic Review
2011 · 5,205 citations
Health literacy as a public health goal: a challenge for contemporary health education and communication strategies into the 21st century
2000 · 4,930 citations