Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Evaluation of a novel large language model (LLM)-powered chatbot for oral boards scenarios

2024·4 Zitationen·Global Surgical Education - Journal of the Association for Surgical EducationOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2024

Jahr

Abstract

Abstract Purpose While previous studies have demonstrated that generative artificial intelligence (AI) can pass medical licensing exams, AI’s role as an examiner in complex, interactive assessments remains unknown. AI-powered chatbots could serve as educational tools to simulate oral examination dialogues. Here, we present initial validity evidence for an AI-powered chatbot designed for general surgery residents to prepare for the American Board of Surgery (ABS) Certifying Exam (CE). Methods We developed a chatbot using GPT-4 to simulate oral board scenarios. Scenarios were completed by general surgery residents from six different institutions. Two experienced surgeons evaluated the chatbot across five domains: inappropriate content, missing content, likelihood of harm, extent of harm, and hallucinations. We measured inter-rater reliability to determine evaluation consistency. Results Seventeen residents completed a total of 20 scenarios. Commonly tested topics included small bowel obstruction (30%), diverticulitis (20%), and breast disease (15%). Based on two independent reviewers, evaluation revealed 11–25% of chatbot simulations had no errors and an additional 11%–35% contained errors of minimal clinical significance. The chatbot limitations included incorrect management advice and critical omissions of information. Conclusions This study demonstrates the potential of an AI-powered chatbot in enhancing surgical education through oral board simulations. Despite challenges in accuracy and safety, the chatbot offers a novel approach to medical education, underscoring the need for further refinement and standardized evaluation frameworks. Incorporating domain-specific knowledge and expert insights is crucial for improving the efficacy of AI tools in medical education.

Autoren

Institutionen

Themen

AI in Service InteractionsArtificial Intelligence in Healthcare and EducationSocial Robot Interaction and HRI

Volltext beim Verlag öffnen

Evaluation of a novel large language model (LLM)-powered chatbot for oral boards scenarios

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen