This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial Intelligence Versus Human Performance in Psychiatry Education: Comparing AI Models With First-Year Residents on Theoretical Examinations and Clinical Skills Assessment: Cross-Sectional Study (Preprint)
Citations: 0
Authors: 9
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> Artificial intelligence (AI) is transforming healthcare education, offering new approaches to knowledge assessment and training methodologies. Large language models (LLMs) such as ChatGPT, Gemini Advanced, and Claude Sonnet have demonstrated remarkable capabilities across various domains, with recent studies showing these models achieving passing scores on standardized medical examinations. However, fundamental questions persist about AI's role and limitations in psychiatry, where contextual understanding and human interaction are particularly crucial. </sec> <sec> <title>OBJECTIVE</title> This study aimed to compare the performance of AI models (ChatGPT-3.5, Gemini Advanced, Claude Sonnet) with that of first-year psychiatry residents on theoretical examinations and objective structured clinical examinations (OSCEs) covering Basic Neurosciences and Psychology, Sociology & Anthropology at a major Indian psychiatric institute. </sec> <sec> <title>METHODS</title> This cross-sectional study compared the performance of three large language models with first-year psychiatry residents (N=25) at the National Institute of Mental Health and Neurosciences (NIMHANS), an Institute of National Importance in India. Standardized theoretical examinations and OSCEs were used, with AI and resident responses blindly scored by faculty against established rubrics. Four faculty members with ≥8 years of experience independently evaluated each AI model's theoretical examination responses using the same standardized rubrics applied to resident assessment. </sec> <sec> <title>RESULTS</title> AI models consistently surpassed residents in theoretical examinations, with Gemini Advanced scoring +5.14 standard deviations above the resident mean in Neurosciences (71.25 ± 3.86 vs 58.0 ± 2.58) and Claude Sonnet scoring +8.77 standard deviations above it in Psychology (72.88 ± 3.77 vs 50.96 ± 2.49). 
In OSCEs, performance was comparable for Neurosciences (AI models: 13.0 vs residents: 13.16 ± 1.49), but varied for Psychology, where Gemini Advanced (18.0 ± 0.00) and Claude Sonnet (20.0 ± 1.41) exceeded the resident mean score (16.6 ± 1.55). Specific errors in AI responses included incorrect recall of standardized test details and misattribution of neuropsychological test functions. </sec> <sec> <title>CONCLUSIONS</title> AI models demonstrated superior theoretical knowledge but variable clinical reasoning performance in psychiatric education assessments. While AI achieved exceptional scores in theoretical examinations, OSCE performance was inconsistent with notable factual errors absent in human responses. These findings indicate AI's potential as a supplementary tool for theoretical knowledge delivery and assessment, while confirming the necessity of human expertise for clinical skills evaluation. </sec>
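The "+standard deviations above the resident mean" figures in the results can be read as simple z-scores of each AI model's mean against the resident distribution. A minimal sketch of that arithmetic, assuming this interpretation and using only the means and SDs reported in the abstract:

```python
# Reproduce the "+SD above resident mean" figures from the abstract,
# assuming they are plain z-scores: (AI mean - resident mean) / resident SD.

def z_score(ai_mean: float, resident_mean: float, resident_sd: float) -> float:
    """Distance of the AI mean from the resident mean, in resident SDs."""
    return (ai_mean - resident_mean) / resident_sd

# Neurosciences theory: Gemini Advanced 71.25 vs residents 58.0 +/- 2.58
print(round(z_score(71.25, 58.0, 2.58), 2))   # ~ +5.14, matching the abstract

# Psychology theory: Claude Sonnet 72.88 vs residents 50.96 +/- 2.49
print(round(z_score(72.88, 50.96, 2.49), 2))  # ~ +8.80; the abstract reports
                                              # +8.77, likely from unrounded inputs
```

The small discrepancy in the Psychology figure is consistent with the authors computing from unrounded raw scores rather than the two-decimal values quoted in the abstract.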
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,349 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,219 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,631 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,480 citations