This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Evaluating AI Chatbot Information on Trending Topics in Anesthesiology
Citations: 0
Authors: 6
Year: 2025
Abstract
Background: Artificial intelligence (AI) is increasingly used as an informational resource, with chatbots attracting users through their ability to generate instantaneous responses. This study evaluates the responses of four AI chatbots (Gemini, ChatGPT, Copilot, and Perplexity) to queries on general, local, and regional anesthesia. The assessment covers understandability, actionability, readability, quality of information, and potential misinformation, measured with the Patient Education Material Assessment Tool (PEMAT), the DISCERN instrument, and Flesch-Kincaid reading scores.

Methods: Input prompts for the four chatbots were created from the top Google Trends search terms for general anesthesia, local anesthesia, and regional anesthesia from March 8, 2020 to March 8, 2025. Chatbot outputs were assessed with the following validated tools: PEMAT for understandability and actionability, DISCERN for quality of information, and the Flesch-Kincaid formula for readability. Potential misinformation was evaluated against American Society of Anesthesiologists (ASA) guidelines. Three blinded reviewers (A.K., J.S., R.U.) independently adjudicated the chatbot responses. Statistical analysis used the chi-square test for PEMAT understandability and actionability scores and the Kruskal-Wallis test for DISCERN and Flesch-Kincaid scores, with post-hoc pairwise comparisons by the Mann-Whitney U test with Bonferroni adjustment.

Results: Perplexity (p < 0.001), ChatGPT (p = 0.001), and Gemini (p = 0.001) scored significantly higher for understandability than Copilot, with no significant differences among Perplexity, ChatGPT, and Gemini. No significant differences were found for actionability. Perplexity had a significantly higher DISCERN score than ChatGPT (p < 0.001), Gemini (p < 0.001), and Copilot (p < 0.001). Readability differed significantly between Perplexity and Gemini (p < 0.001) and between ChatGPT and Gemini (p = 0.005).

Conclusions: This study is among the first to evaluate how chatbots handle queries on anesthesiology. As AI continues to evolve, it is poised to become a primary source of scientific information for patients. Reviewing how this information is disseminated is crucial, as it allows us to gauge whether, and how, AI chatbots can be recommended for patient use.
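For orientation, the Flesch-Kincaid Grade Level referenced in the Methods converts sentence and word length into a U.S. school grade: FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59. The sketch below, which is illustrative and not the authors' code, shows how the described statistical comparison could be run in Python with SciPy: a Kruskal-Wallis omnibus test across the four chatbots' DISCERN scores, followed by post-hoc pairwise Mann-Whitney U tests with Bonferroni adjustment. All scores are hypothetical placeholders.

```python
# Illustrative sketch (hypothetical data, not the study's code):
# Kruskal-Wallis omnibus test on DISCERN scores across four chatbots,
# then Mann-Whitney U post-hoc pairwise tests with Bonferroni adjustment.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical DISCERN scores, one value per rated chatbot response
scores = {
    "Perplexity": [68, 72, 70, 65, 74],
    "ChatGPT":    [55, 60, 58, 57, 61],
    "Gemini":     [54, 59, 56, 60, 58],
    "Copilot":    [50, 53, 55, 52, 54],
}

# Omnibus test: do the four groups differ at all?
h_stat, p_omnibus = kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_omnibus:.4f}")

# Post-hoc pairwise comparisons, Bonferroni-adjusted (6 pairs for 4 groups)
pairs = list(combinations(scores, 2))
for a, b in pairs:
    _, p = mannwhitneyu(scores[a], scores[b], alternative="two-sided")
    p_adj = min(p * len(pairs), 1.0)  # Bonferroni: scale by number of comparisons
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}")
```

The Bonferroni adjustment multiplies each raw pairwise p-value by the number of comparisons (six here), capped at 1, which controls the family-wise error rate at the nominal level.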
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,508 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,393 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,864 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,564 citations