This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The accuracy of AI-assisted chatbots on the annual assessment test for emergency medicine residents
Citations: 13
Authors: 5
Year: 2024
Abstract
The incorporation of natural language processing (NLP) models into medical education has accelerated with the introduction of ChatGPT. This study aimed to evaluate and compare the performance of ChatGPT-3.5, Bard, and residents on the annual assessment test for emergency medicine (EM) residents. A total of 90 questions covering 18 distinct topics within the field of EM were administered to residents. The same questions were directed to ChatGPT-3.5 and Bard, without excluding those with images. The percentage of correct answers was calculated and represented through a histogram showing the distribution of test scores across bins. Questions were categorized as medical knowledge or clinical reasoning to further assess the chatbots' performance by question type. ChatGPT-3.5 demonstrated 60% accuracy, securing the 10th position, while Bard achieved an accuracy of 55.5%, placing 21st in the rankings among 46 residents. ChatGPT-3.5 performed better in 16 of the subtopics. Bard outperformed ChatGPT-3.5 only on cardiovascular and pulmonary emergencies, accounting for 24.4% of questions. Analysis of question types revealed ChatGPT-3.5's higher accuracy in medical knowledge (66%) compared to Bard (56%), while Bard performed better in clinical reasoning (55%) than ChatGPT-3.5 (52.5%). Despite lacking access to ECG images, both models answered several questions by providing text-based ECG interpretations. ChatGPT-3.5 and Bard demonstrated impressive performance on the task of medical question answering. On the other hand, the issues related to ECG interpretation raise questions about the reliability of these models. Our findings highlight the importance of verifying the outputs generated by these models.
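The scoring described above (overall percent correct, plus per-category accuracy for medical knowledge vs. clinical reasoning questions) can be sketched as follows. This is a minimal illustration with hypothetical data, not the study's actual scoring code; the record structure and values are assumptions.

```python
# Sketch: score answers as in the study -- overall percent correct and
# accuracy per question type (medical knowledge vs clinical reasoning).
# The data below are illustrative only, not the study's actual results.
from collections import defaultdict

# Each record: (question_type, answered_correctly)
answers = [
    ("medical_knowledge", True),
    ("medical_knowledge", False),
    ("clinical_reasoning", True),
    ("clinical_reasoning", True),
]

def accuracy(records):
    """Percentage of correct answers, rounded to one decimal place."""
    return round(100 * sum(ok for _, ok in records) / len(records), 1)

# Group records by question type, then score each group.
by_type = defaultdict(list)
for qtype, ok in answers:
    by_type[qtype].append((qtype, ok))

overall = accuracy(answers)                          # 75.0 for this sample
per_type = {t: accuracy(r) for t, r in by_type.items()}
```

With real data, `per_type` would yield the kind of per-category figures reported in the abstract (e.g., 66% medical knowledge accuracy for ChatGPT-3.5).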
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations