This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Multidimensional Evaluation of AI Chatbot Responses to a Standardized Patient Query on MOGAD: A Blinded Expert Analysis (Preprint)
Citations: 0
Authors: 9
Year: 2025
Abstract
BACKGROUND
Large language model-based chatbots are increasingly used by the public to access medical information. While these tools offer considerable potential in terms of accessibility and scalability, their accuracy, transparency, and clarity remain insufficiently evaluated for rare and diagnostically complex conditions such as myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD).

OBJECTIVE
This study aimed to evaluate the quality, comprehensibility, transparency, and readability of responses generated by widely used AI chatbot platforms in response to a standardized, patient-centered question about MOGAD.

METHODS
We conducted a cross-sectional content analysis using the query: “What is MOGAD, and how is MOGAD treated?” Ten widely used chatbot platforms were selected to reflect diversity in architecture, access model, and functional design. Responses were collected on the same day, anonymized, and independently evaluated by seven blinded neurologists. Validated instruments were used, including DISCERN (treatment quality), PEMAT-P (understandability), Web Resource Rating (WRR; citation transparency), and two readability metrics: Flesch–Kincaid Grade Level (FKGL) and Coleman–Liau Index (CLI). Chatbots were also compared by access type (free vs paid) and functional focus (conversation-based vs search-based). Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).

RESULTS
Significant differences were observed across platforms in DISCERN, PEMAT-P, and WRR scores (all p < 0.001). Paid chatbots demonstrated higher treatment quality (p = 0.020) and citation transparency (p = 0.001) compared to free versions. Search-based models produced more understandable responses than conversation-based ones (p = 0.035). However, none of the chatbot responses achieved the recommended readability threshold for public-facing health communication (FKGL < 8). Inter-rater agreement was excellent across all expert-rated measures (ICC ≥ 0.838).

CONCLUSIONS
AI chatbot responses to patient queries about MOGAD vary widely in quality, clarity, and transparency. These findings highlight the need for structured benchmarking, transparent evaluation frameworks, and thoughtful oversight in the use of generative AI tools for digital health communication, particularly in the context of rare and clinically complex diseases.
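The two readability metrics used in the study are simple closed-form formulas over word, sentence, syllable, and letter counts. As a minimal sketch of how such scores can be computed (the study does not specify its tooling, and the regex-based tokenizer and vowel-group syllable counter below are rough assumptions; production tools use dictionaries or dedicated libraries):

```python
import re

def _count_syllables(word: str) -> int:
    # Crude approximation: count runs of vowels; real counters use pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(_count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

def cli(text: str) -> float:
    """Coleman-Liau Index:
    0.0588 * L - 0.296 * S - 15.8,
    where L = letters per 100 words and S = sentences per 100 words."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    return 0.0588 * (100 * letters / len(words)) - 0.296 * (100 * sentences / len(words)) - 15.8
```

Under this scoring, the FKGL < 8 threshold cited above corresponds roughly to a US eighth-grade reading level; dense multisyllabic clinical prose pushes both scores well above it.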
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,635 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,543 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,051 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,844 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations