Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions
23
Zitationen
12
Autoren
2024
Jahr
Abstract
ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, there remains a substantial knowledge gap in comprehensively understanding the chances and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After excluding 352 image-based questions, a total of 2,377 text-based questions were further categorized and entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed based on question difficulty, category, and content with regards to specific signal words and phrases. ChatGPT achieved an overall accuracy rate of 55.8% in a total number of n = 2,377 USMLE Step 1 preparation questions obtained from the Amboss online question bank. It demonstrated a significant inverse correlation between question difficulty and performance with r<sub>s</sub> = -0.306; p < 0.001, maintaining comparable accuracy to the human user peer group across different levels of question difficulty. Notably, ChatGPT outperformed in serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). ChatGPT achieved statistically significant worse performances in pathophysiology-related question stems. (Signal phrase = "what is the most likely/probable cause"). ChatGPT performed consistent across various question categories and difficulty levels. These findings emphasize the need for further investigations to explore the potential and limitations of ChatGPT in medical examination and education.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.197 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.047 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.410 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.776 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.410 Zit.
Autoren
Institutionen
- Freie Universität Berlin(DE)
- Charité - Universitätsmedizin Berlin(DE)
- Humboldt-Universität zu Berlin(DE)
- Klinikum rechts der Isar(DE)
- Harvard University(US)
- Technical University of Munich(DE)
- Brigham and Women's Hospital(US)
- University Hospital Regensburg(DE)
- Erasmus University Rotterdam(NL)
- Queen Mary University of London(GB)
- Guangdong Provincial People's Hospital(CN)
- Emory University(US)
- Munich Business School(DE)
- Ludwig-Maximilians-Universität München(DE)
- Instituto Ivo Pitanguy(BR)