This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial Intelligence Versus Human Performance in Psychiatry Education: Comparing AI Models With First-Year Residents on Theoretical Examinations and Clinical Skills Assessment: Cross-Sectional Study (Preprint)
Citations: 0
Authors: 9
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> Artificial intelligence (AI) is transforming healthcare education, offering new approaches to knowledge assessment and training methodologies. Large language models (LLMs) such as ChatGPT, Gemini Advanced, and Claude Sonnet have demonstrated remarkable capabilities across various domains, with recent studies showing these models achieving passing scores on standardized medical examinations. However, fundamental questions persist about AI's role and limitations in psychiatry, where contextual understanding and human interaction are particularly crucial. </sec> <sec> <title>OBJECTIVE</title> This study aimed to compare the performance of AI models (ChatGPT-3.5, Gemini Advanced, Claude Sonnet) with that of first-year psychiatry residents on theoretical examinations and objective structured clinical examinations (OSCEs) covering Basic Neurosciences and Psychology, Sociology & Anthropology at a major Indian psychiatric institute. </sec> <sec> <title>METHODS</title> This cross-sectional study compared the performance of three large language models with first-year psychiatry residents (N=25) at the National Institute of Mental Health and Neurosciences (NIMHANS), an Institute of National Importance in India. Standardized theoretical examinations and OSCEs were used, with AI and resident responses blindly scored by faculty against established rubrics. Four faculty members with ≥8 years of experience independently evaluated each AI model's theoretical examination responses using the same standardized rubrics applied to resident assessment. </sec> <sec> <title>RESULTS</title> AI models consistently surpassed residents in theoretical examinations, with Gemini Advanced scoring +5.14 standard deviations above the resident mean in Neurosciences (71.25 ± 3.86 vs 58.0 ± 2.58) and Claude Sonnet scoring +8.77 standard deviations above it in Psychology (72.88 ± 3.77 vs 50.96 ± 2.49). 
In OSCEs, performance was comparable for Neurosciences (AI models: 13.0 vs residents: 13.16 ± 1.49), but varied for Psychology, where Gemini Advanced (18.0 ± 0.00) and Claude Sonnet (20.0 ± 1.41) exceeded the resident mean score (16.6 ± 1.55). Specific errors in AI responses included incorrect recall of standardized test details and misattribution of neuropsychological test functions. </sec> <sec> <title>CONCLUSIONS</title> AI models demonstrated superior theoretical knowledge but variable clinical reasoning performance in psychiatric education assessments. While AI achieved exceptional scores in theoretical examinations, OSCE performance was inconsistent with notable factual errors absent in human responses. These findings indicate AI's potential as a supplementary tool for theoretical knowledge delivery and assessment, while confirming the necessity of human expertise for clinical skills evaluation. </sec>
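The "+standard deviations above the resident mean" figures in the results can be read as simple z-scores of each AI model's mean against the resident distribution. A minimal sketch of that arithmetic, assuming this interpretation and using only the means and SDs reported in the abstract:

```python
# Reproduce the "+SD above resident mean" figures from the abstract,
# assuming they are plain z-scores: (AI mean - resident mean) / resident SD.

def z_score(ai_mean: float, resident_mean: float, resident_sd: float) -> float:
    """Distance of the AI mean from the resident mean, in resident SDs."""
    return (ai_mean - resident_mean) / resident_sd

# Neurosciences theory: Gemini Advanced 71.25 vs residents 58.0 +/- 2.58
print(round(z_score(71.25, 58.0, 2.58), 2))   # ~ +5.14, matching the abstract

# Psychology theory: Claude Sonnet 72.88 vs residents 50.96 +/- 2.49
print(round(z_score(72.88, 50.96, 2.49), 2))  # ~ +8.80; the abstract reports
                                              # +8.77, likely from unrounded inputs
```

The small discrepancy in the Psychology figure is consistent with the authors computing from unrounded raw scores rather than the two-decimal values quoted in the abstract.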
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,349 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,219 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,631 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,480 citations