OpenAlex · Updated hourly · Last updated: 2026-03-20, 01:41

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessing the accuracy and reasoning of using ChatGPT to evaluate the quality of health news

2024 · 0 citations · Open Access
Open full text at publisher

Citations: 0
Authors: 6
Year: 2024

Abstract

Background: With the growing prevalence of health misinformation online, there is an urgent need for tools that can reliably assist the public in evaluating the quality of health information. This study investigates the performance of ChatGPT, a representative large language model (LLM), in rating the quality of health news and providing explanatory reasoning.

Methods: We evaluated ChatGPT's performance using an expert-annotated dataset from HealthNewsReview.org, which assesses the quality of health news across nine criteria. ChatGPT was prompted with standardized queries tailored to each criterion. We measured its rating performance using precision, recall, and F1 scores for binary classification (satisfactory/not satisfactory). Additionally, linguistic complexity, readability, and the quality of ChatGPT's explanatory reasoning were assessed through both quantitative linguistic analysis and manual evaluation of consistency and contextual relevance.

Results: ChatGPT's rating performance varied across criteria, with the highest accuracy for the Cost criterion (F1 = 0.824) and lower accuracy for the Benefit, Conflict, and Quality criteria (F1 < 0.5), underperforming compared to machine learning-based models. Its explanations were clear, with readability suited to late high school or early college levels, and scored highly for consistency (average score: 2.90/3) and contextual relevance (average score: 2.73/3), indicating strong explanatory potential despite rating limitations.

Conclusion: While ChatGPT's rating accuracy requires improvement, its strength in offering comprehensible explanations presents a valuable opportunity to enhance public understanding of health news quality. Future research should aim to refine LLMs' rating accuracy while leveraging their explanatory strengths to better serve the needs of non-expert audiences.
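The Methods section scores each of the nine criteria as a binary classification (satisfactory vs. not satisfactory) and reports precision, recall, and F1. As a minimal illustrative sketch (hypothetical labels, not the study's data or code), these metrics can be computed as:

```python
# Illustrative only: precision, recall, and F1 for a binary
# satisfactory (1) / not-satisfactory (0) rating task.

def binary_metrics(y_true, y_pred):
    """Return (precision, recall, F1), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example with made-up expert labels vs. model ratings for one criterion:
# 3 truly satisfactory items, of which the model catches 2, plus 1 false alarm.
precision, recall, f1 = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

F1 is the harmonic mean of precision and recall, so a per-criterion F1 below 0.5 (as reported for Benefit, Conflict, and Quality) means the model misses true positives, raises false alarms, or both.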

Topics

Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Misinformation and Its Impacts