Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Evaluation of a novel large language model (LLM)-powered chatbot for oral boards scenarios
4
Zitationen
10
Autoren
2024
Jahr
Abstract
Abstract Purpose While previous studies have demonstrated that generative artificial intelligence (AI) can pass medical licensing exams, AI’s role as an examiner in complex, interactive assessments remains unknown. AI-powered chatbots could serve as educational tools to simulate oral examination dialogues. Here, we present initial validity evidence for an AI-powered chatbot designed for general surgery residents to prepare for the American Board of Surgery (ABS) Certifying Exam (CE). Methods We developed a chatbot using GPT-4 to simulate oral board scenarios. Scenarios were completed by general surgery residents from six different institutions. Two experienced surgeons evaluated the chatbot across five domains: inappropriate content, missing content, likelihood of harm, extent of harm, and hallucinations. We measured inter-rater reliability to determine evaluation consistency. Results Seventeen residents completed a total of 20 scenarios. Commonly tested topics included small bowel obstruction (30%), diverticulitis (20%), and breast disease (15%). Based on two independent reviewers, evaluation revealed 11–25% of chatbot simulations had no errors and an additional 11%–35% contained errors of minimal clinical significance. The chatbot limitations included incorrect management advice and critical omissions of information. Conclusions This study demonstrates the potential of an AI-powered chatbot in enhancing surgical education through oral board simulations. Despite challenges in accuracy and safety, the chatbot offers a novel approach to medical education, underscoring the need for further refinement and standardized evaluation frameworks. Incorporating domain-specific knowledge and expert insights is crucial for improving the efficacy of AI tools in medical education.
Ähnliche Arbeiten
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.776 Zit.
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
1999 · 5.632 Zit.
An experiment in linguistic synthesis with a fuzzy logic controller
1975 · 5.550 Zit.
A FRAMEWORK FOR REPRESENTING KNOWLEDGE
1988 · 4.548 Zit.
Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
2023 · 3.310 Zit.
Autoren
Institutionen
- Columbia University Irving Medical Center(US)
- Texas Health Dallas(US)
- Campbell Collaboration(NO)
- Brigham and Women's Hospital(US)
- Harvard University(US)
- The University of Texas Medical Branch at Galveston(US)
- SUNY Downstate Health Sciences University(US)
- University of California, San Francisco(US)
- University at Buffalo, State University of New York(US)
- University of Minnesota(US)
- Mayo Clinic in Arizona(US)