This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Evaluating AI Chatbot Information on Trending Topics in Anesthesiology
Citations: 0
Authors: 6
Year: 2025
Abstract
Background: Artificial intelligence (AI) is increasingly used as an informational resource, with chatbots attracting users through their ability to generate instantaneous responses. This study evaluates the responses of four AI chatbots (Gemini, ChatGPT, Copilot, and Perplexity) to queries on general, local, and regional anesthesia. The assessment covers understandability, actionability, readability, quality of information, and potential misinformation, measured with the Patient Education Material Assessment Tool (PEMAT), the DISCERN instrument, and Flesch-Kincaid reading scores.

Methods: Input prompts for the four chatbots were created from the top Google Trends search terms for general anesthesia, local anesthesia, and regional anesthesia from March 8, 2020 to March 8, 2025. Chatbot outputs were assessed with the following validated tools: PEMAT for understandability and actionability, DISCERN for quality of information, and the Flesch-Kincaid formula for readability. Potential misinformation was evaluated against American Society of Anesthesiologists (ASA) guidelines. Three blinded reviewers (A.K., J.S., R.U.) independently adjudicated the chatbot responses. Statistical analysis used the chi-square test for PEMAT understandability and actionability scores and the Kruskal-Wallis test for DISCERN and Flesch-Kincaid scores, with post-hoc pairwise comparisons by the Mann-Whitney U test with Bonferroni adjustment.

Results: Perplexity (p < 0.001), ChatGPT (p = 0.001), and Gemini (p = 0.001) scored significantly higher for understandability than Copilot, with no significant differences among Perplexity, ChatGPT, and Gemini. No significant differences were found for actionability. Perplexity had a significantly higher DISCERN score than ChatGPT (p < 0.001), Gemini (p < 0.001), and Copilot (p < 0.001). Readability differed significantly between Perplexity and Gemini (p < 0.001) and between ChatGPT and Gemini (p = 0.005).

Conclusions: This study is among the first to evaluate how chatbots handle queries on anesthesiology. As AI continues to evolve, it is poised to become a primary source of scientific information for patients. Reviewing how this information is disseminated is crucial, as it allows us to gauge whether, and how, AI chatbots can be recommended for patient use.
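For orientation, the Flesch-Kincaid Grade Level referenced in the Methods converts sentence and word length into a U.S. school grade: FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59. The sketch below, which is illustrative and not the authors' code, shows how the described statistical comparison could be run in Python with SciPy: a Kruskal-Wallis omnibus test across the four chatbots' DISCERN scores, followed by post-hoc pairwise Mann-Whitney U tests with Bonferroni adjustment. All scores are hypothetical placeholders.

```python
# Illustrative sketch (hypothetical data, not the study's code):
# Kruskal-Wallis omnibus test on DISCERN scores across four chatbots,
# then Mann-Whitney U post-hoc pairwise tests with Bonferroni adjustment.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical DISCERN scores, one value per rated chatbot response
scores = {
    "Perplexity": [68, 72, 70, 65, 74],
    "ChatGPT":    [55, 60, 58, 57, 61],
    "Gemini":     [54, 59, 56, 60, 58],
    "Copilot":    [50, 53, 55, 52, 54],
}

# Omnibus test: do the four groups differ at all?
h_stat, p_omnibus = kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_omnibus:.4f}")

# Post-hoc pairwise comparisons, Bonferroni-adjusted (6 pairs for 4 groups)
pairs = list(combinations(scores, 2))
for a, b in pairs:
    _, p = mannwhitneyu(scores[a], scores[b], alternative="two-sided")
    p_adj = min(p * len(pairs), 1.0)  # Bonferroni: scale by number of comparisons
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}")
```

The Bonferroni adjustment multiplies each raw pairwise p-value by the number of comparisons (six here), capped at 1, which controls the family-wise error rate at the nominal level.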
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,508 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,393 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,864 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,564 citations