OpenAlex · Updated hourly · Last updated: 20.03.2026, 06:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Faculty versus artificial intelligence chatbot: a comparative analysis of multiple-choice question quality in physiology

2025 · 1 citation · AJP Advances in Physiology Education · Open Access

Citations: 1 · Authors: 9 · Year: 2025

Abstract

Multiple-choice questions (MCQs) are widely used for assessment in medical education. While human-generated MCQs benefit from pedagogical insight, creating high-quality items is time intensive. With the advent of artificial intelligence (AI), tools like DeepSeek R1 offer potential for automated MCQ generation, though their educational validity remains uncertain. Against this background, this study compared the psychometric quality of Physiology MCQs generated by faculty and by an AI chatbot. A total of 200 MCQs were developed following the standard syllabus and question design guidelines: 100 by the Physiology faculty and 100 by the AI chatbot DeepSeek R1. Fifty questions from each group were randomly selected and administered to undergraduate medical students within 2 hours. Item analysis was conducted postassessment using the difficulty index (DIFI), discrimination index (DI), and nonfunctional distractors (NFDs). Statistical comparisons were made using t tests or nonparametric equivalents, with significance at <i>P</i> < 0.05. Chatbot-generated MCQs had a significantly higher DIFI (0.64 ± 0.22) than faculty MCQs (0.47 ± 0.19; <i>P</i> < 0.0001). No significant difference in DI was found between the groups (<i>P</i> = 0.17). Faculty MCQs had significantly fewer NFDs (median 0) compared to chatbot MCQs (median 1; <i>P</i> = 0.0063). AI-generated MCQs demonstrated comparable discrimination ability but were generally easier and contained more ineffective distractors. While chatbots show promise in MCQ generation, further refinement is needed to improve distractor quality and item difficulty. AI can complement but not yet replace human expertise in assessment design.

<b>NEW & NOTEWORTHY</b> This study contributes to the growing research on artificial intelligence (AI)- versus faculty-generated multiple-choice questions in Physiology. Psychometric analysis showed that AI-generated items were generally easier but had comparable discrimination ability to faculty-authored questions, while containing more nonfunctional distractors. By focusing on Physiology, this work offers discipline-specific insights and underscores both the potential and current limitations of AI in assessment development.
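The item-analysis metrics named in the abstract have standard textbook definitions. A minimal sketch of how they are typically computed, assuming the common 27% upper/lower-group split and the usual "chosen by fewer than 5% of examinees" criterion for a nonfunctional distractor (the study's exact cutoffs are not stated here):

```python
def item_analysis(responses, correct_option, total_scores,
                  options=('A', 'B', 'C', 'D'), top_frac=0.27):
    """Compute DIFI, DI, and NFD count for one MCQ item.

    responses: chosen option per student (e.g. 'A'..'D'), aligned with
    total_scores (each student's overall test score).
    Conventions (27% split, 5% NFD threshold) are assumptions, not
    taken from the paper.
    """
    n = len(responses)
    k = max(1, round(n * top_frac))
    # Rank students by total test score; take upper and lower groups
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    h = sum(responses[i] == correct_option for i in upper)
    l = sum(responses[i] == correct_option for i in lower)
    difi = (h + l) / (2 * k)   # difficulty index: proportion correct in both groups
    di = (h - l) / k           # discrimination index: upper-lower gap
    # A distractor picked by <5% of all examinees counts as nonfunctional
    distractors = [o for o in options if o != correct_option]
    nfd = sum(responses.count(o) / n < 0.05 for o in distractors)
    return difi, di, nfd
```

For example, with 10 students ranked by score, three of the top three answering correctly and one of the bottom three, DIFI comes out around 0.67 and DI around 0.67; a distractor no one selects is counted as nonfunctional.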
