OpenAlex · Updated hourly · Last updated: 02.05.2026, 09:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Generative artificial intelligence-driven chatbots and medical misinformation: an accuracy, referencing and readability audit

2026 · 1 citation · BMJ Open · Open Access

Citations: 1 · Authors: 7 · Year: 2026

Abstract

OBJECTIVES: Artificial intelligence (AI)-driven chatbots have been rapidly adopted across research, education, business, marketing and medicine. Most interactions, however, come from non-experts who use chatbots like search engines, including for everyday health and medical queries.

DESIGN: We conducted an original study to audit chatbot responses in health and medical fields prone to misinformation.

METHODS: Responses were graded on a scale from not problematic to 'highly problematic' using a coding matrix based on objective, predefined criteria. Citations were scored for accuracy and completeness, and each response was given a Flesch Reading Ease score.

RESULTS: Nearly half (49.6%) of responses were problematic: 30% somewhat problematic and 19.6% highly problematic. Response quality did not differ significantly among chatbots (p=0.566), but Grok generated significantly more highly problematic responses than would be expected under a random distribution (z-score +2.07, p=0.038). Performance was strongest in vaccines (mean z-score -2.57) and cancer (-2.12), and weakest in stem cells (+1.25), athletic performance (+3.74) and nutrition (+4.35). Chatbot outputs were consistently expressed with confidence and certainty; of 250 total questions, there were only two refusals to answer (0.8%), both from Meta AI. Reference quality was poor, with a median completeness score of 40% (Q1-Q3: 20-67%). Chatbot hallucinations and fabricated citations precluded any chatbot from producing a fully accurate reference list. All readability scores were graded as 'Difficult' (30-50), equivalent to college sophomore-senior level.

CONCLUSIONS: The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields. Continued deployment without public education and oversight risks amplifying misinformation.
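The readability metric used in the audit, the Flesch Reading Ease score, is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word); lower scores indicate harder text, with 30-50 conventionally graded 'Difficult'. The sketch below is a minimal illustration, not the authors' implementation: the syllable counter is a crude vowel-group heuristic (published tools typically use pronunciation dictionaries), and the sample sentence is invented.

```python
import re

def count_syllables(word):
    # Naive heuristic: one syllable per contiguous vowel group.
    # Real readability tools use dictionaries or better rules.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    # Split on sentence-ending punctuation; keep non-empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

sample = "The cat sat on the mat."
score = flesch_reading_ease(sample)  # short, simple text scores high (easy)
```

Scores in the 30-50 band reported in the study would correspond to text readable at roughly college sophomore-senior level.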
