This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Can ChatGPT, Bing, and Bard save lives? Evaluation of correctness and reliability of chatbots in teaching bystanders to help victims
0
Citations
5
Authors
2024
Year
Abstract
Background: Timely recognition and initiation of basic life support (BLS) before emergency medical services (EMS) arrive significantly improve survival rates and neurological outcomes. As health information-seeking has shifted toward online sources, chatbots powered by generative artificial intelligence (AI) are emerging as potential tools for immediate health-related guidance. This study investigates the reliability of four AI chatbots (GPT-3.5, GPT-4, Bard, and Bing) in responding to BLS scenarios.
Methods: A cross-sectional study was conducted using six scenarios adapted from the BLS Objective Structured Clinical Examination (OSCE) by United Medical Education. The scenarios covered adult, pediatric, and infant emergencies and were presented to each chatbot on two occasions, one week apart. Responses were evaluated by a board-certified emergency medicine professor from Tehran University of Medical Sciences, using a checklist based on BLS-OSCE standards. Correctness was assessed, and reliability was measured using Cohen's kappa coefficient.
Results: GPT-4 demonstrated the highest correctness in adult scenarios (85% correct responses), while Bard showed 60% correctness. GPT-3.5 and Bing performed poorly across all scenarios. Bard had a correctness rate of 52.17% in pediatric scenarios, but all chatbots scored below 44% in infant scenarios. Cohen's kappa indicated substantial reliability for GPT-4 (κ = 0.649) and GPT-3.5 (κ = 0.645), moderate reliability for Bing (κ = 0.503), and fair reliability for Bard (κ = 0.357).
Conclusion: GPT-4 showed acceptable performance and substantial reliability in adult BLS scenarios. However, the overall limited correctness and reliability of all chatbots across scenarios indicate that current AI chatbots are unsuitable for providing life-saving instructions in critical medical emergencies.
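The abstract reports reliability as Cohen's kappa between two rounds of chatbot responses. As a rough illustration of how such a value could be computed, the sketch below assumes each checklist item is coded 1 (correct) or 0 (incorrect) in both rounds; that coding, the function names, and the sample data are hypothetical and not taken from the paper. The verbal labels follow the commonly used Landis and Koch bands, which are consistent with the labels in the abstract, although the abstract does not say which scale the authors applied.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    # Observed agreement: share of items rated identically in both rounds.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement expected from the marginal frequencies of each round.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_e == 1:
        return 1.0  # both rounds used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical checklist results for one chatbot, one week apart.
    round_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
    round_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
    kappa = cohen_kappa(round_1, round_2)
    print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")
    # The values reported in the abstract map onto the same bands.
    for name, k in [("GPT-4", 0.649), ("GPT-3.5", 0.645),
                    ("Bing", 0.503), ("Bard", 0.357)]:
        print(name, landis_koch(k))
```

Kappa corrects raw agreement for agreement expected by chance, so two rounds that match on 80% of items can still yield only a moderate value when the marginal rates of correct answers are similar in both rounds.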
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,100 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,466 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations