This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
586. Let's Have a Chat: How Well Does an Artificial Intelligence Chatbot Answer Clinical Infectious Diseases Questions?
Citations: 0 · Authors: 10 · Year: 2025
Abstract
Background
ChatGPT is an artificial intelligence tool used by practitioners to answer clinical questions. It is unknown whether ChatGPT provides quality responses to infectious diseases (ID)-specific questions. This study surveyed ID pharmacist subject matter experts (SMEs) to assess the quality of ChatGPT responses.

Methods
The primary outcome was the percentage of ChatGPT responses considered useful. Secondary outcomes were SMEs' ratings of correctness, completeness, and safety (C/C/S). One hundred questions clinically encountered by ID pharmacists were assembled and internally validated. Questions were entered into ChatGPT version 3.5 and responses were recorded. Definitions for useful and C/C/S were based on prior definitions and literature. A 0-10 rating scale for C/C/S was developed and validated for interrater reliability using a random sample. Questions with ChatGPT responses were sent to five SMEs for evaluation. Ordinal and categorical variables were assessed for interrater reliability using an average-measures intraclass correlation coefficient (ICC) and Fleiss' multirater kappa (FMK), respectively. SMEs' responses were compared using the Kruskal-Wallis and chi-square tests for ordinal and categorical variables, respectively. A post-hoc analysis was performed to locate differences between SME ratings of C/C/S by question difficulty and category using a Mann-Whitney U test with Bonferroni correction.

Results
SMEs considered 41.8% of responses useful. Median ratings for C/C/S were 7 (IQR 4-9), 5 (IQR 3-8), and 8 (IQR 4-10), respectively. The FMK for usefulness was 0.379 (95% CI 0.317-0.441), indicating fair agreement, and the ICCs for C/C/S were 0.820 (95% CI 0.758-0.870), 0.745 (95% CI 0.656-0.816), and 0.833 (95% CI 0.775-0.880), respectively, indicating substantial agreement. No significant difference was observed between SME responses for the percentage of responses considered useful. Neither question category nor difficulty resulted in a difference in SMEs' ratings for C/C/S or the percentage of responses considered useful.

Conclusion
Fewer than half of ChatGPT responses were considered useful by SMEs. However, responses were mostly considered correct and safe, though often deemed incomplete.

Disclosures
Conan MacDougall, PharmD, MAS: Merck (Grant/Research Support). Elias Chahine, Pharm.D.: Seqirus (Advisor/Consultant; Honoraria). Wesley D. Kufel, Pharm.D., BCPS, BCIDP: Merck & Co. (Grant/Research Support); Shionogi, Inc. (Grant/Research Support).
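For readers unfamiliar with the chance-corrected agreement statistic used in the Methods section, the following is a minimal sketch of how Fleiss' kappa is computed. The rating table in the usage example is the standard textbook illustration (10 subjects, 14 raters, 5 categories), not the study's data, and the function name is our own.

```python
from typing import List


def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(ratings)        # number of subjects rated
    n = sum(ratings[0])     # raters per subject
    k = len(ratings[0])     # number of categories

    # Proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]

    # Per-subject agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]

    P_bar = sum(P_i) / N            # mean observed agreement
    P_e = sum(p * p for p in p_j)   # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


if __name__ == "__main__":
    # Classic worked example from the literature; expected kappa ~0.210.
    table = [
        [0, 0, 0, 0, 14],
        [0, 2, 6, 4, 2],
        [0, 0, 3, 5, 6],
        [0, 3, 9, 2, 0],
        [2, 2, 8, 1, 1],
        [7, 7, 0, 0, 0],
        [3, 2, 6, 3, 0],
        [2, 5, 3, 2, 2],
        [6, 5, 2, 1, 0],
        [0, 2, 2, 3, 7],
    ]
    print(round(fleiss_kappa(table), 3))
```

Values near 0.21-0.40, such as the 0.379 reported above for usefulness, are conventionally interpreted as "fair" agreement on the Landis-Koch scale.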
Related works
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
1999 · 5,632 citations
An experiment in linguistic synthesis with a fuzzy logic controller
1975 · 5,552 citations
A FRAMEWORK FOR REPRESENTING KNOWLEDGE
1988 · 4,548 citations
Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
2023 · 3,313 citations