This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Analyzing the Accuracy of Large Language Models in United States Medical Licensing Exam Social-Science Preparation Question Banks
0
Citations
5
Authors
2025
Year
Abstract
Introduction: The emergence of artificial intelligence (AI) has changed the medical education landscape. The United States Medical Licensing Exams increasingly include questions in the “Social Sciences (Ethics/Legal/Professional)” domain, which often require reasoning through complex scenarios. This study evaluates three AI platforms on such preparation questions. Methods: Social-science questions for Steps 1 and 2 were collected from UWorld and Amboss. Multiple-choice questions were entered into ChatGPT, Gemini, and Perplexity, and each response was scored as correct or incorrect. Percentages of correct responses were compared between platforms and against student averages. Analysis of variance and t-tests were used to determine statistical significance, with a threshold of p = 0.05. Because this study focused on quantitative performance, outcomes were limited to accuracy on multiple-choice items rather than direct measures of empathetic reasoning. Results: For Step 1, 109 UWorld and 63 Amboss questions were available; for Step 2, 189 UWorld and 185 Amboss questions were available. Google Gemini had the highest accuracy for Step 1 (86.6%), while ChatGPT had the highest accuracy for Step 2 (83.6%). All platforms outperformed the student average for Step 1 (68.5%) and Step 2 (68.9%); the difference was significant for ChatGPT and Gemini on Step 1 (p < 0.01) and for ChatGPT on Step 2 (p < 0.01). Conclusion: It is critical to understand whether empathetic thinking can be replicated by technology and whether such technology can help prepare students. This study highlights the ability of AI to solve ethical dilemmas that students may struggle with.
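The arithmetic behind the reported figures can be checked with a minimal Python sketch. It uses only numbers stated in the abstract; the dictionary layout and the function names `total_questions` and `gap_over_students` are illustrative assumptions, not part of the study. It does not reproduce the authors' analysis of variance or t-tests, which would require the per-question data.

```python
# Figures copied from the abstract; the structure and names are illustrative only.
QUESTION_COUNTS = {
    "Step 1": {"UWorld": 109, "Amboss": 63},
    "Step 2": {"UWorld": 189, "Amboss": 185},
}

# Best-performing platform per step and its reported accuracy (percent).
BEST_PLATFORM = {
    "Step 1": ("Gemini", 86.6),
    "Step 2": ("ChatGPT", 83.6),
}

# Reported student averages (percent).
STUDENT_AVERAGE = {"Step 1": 68.5, "Step 2": 68.9}


def total_questions(step: str) -> int:
    """Sum the question counts from both banks for one exam step."""
    return sum(QUESTION_COUNTS[step].values())


def gap_over_students(step: str) -> float:
    """Percentage-point gap between the best platform and the student average."""
    _, accuracy = BEST_PLATFORM[step]
    return round(accuracy - STUDENT_AVERAGE[step], 1)


if __name__ == "__main__":
    for step in QUESTION_COUNTS:
        platform, accuracy = BEST_PLATFORM[step]
        print(
            f"{step}: {total_questions(step)} questions; "
            f"{platform} at {accuracy}% is {gap_over_students(step)} "
            f"points above the student average of {STUDENT_AVERAGE[step]}%"
        )
```

The sketch only totals the question pools and expresses the headline accuracies as percentage-point gaps; whether a gap is statistically significant depends on the tests reported in the abstract, not on this calculation.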