This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Faculty versus artificial intelligence chatbot: a comparative analysis of multiple-choice question quality in physiology
Citations: 1
Authors: 9
Year: 2025
Abstract
Multiple-choice questions (MCQs) are widely used for assessment in medical education. While human-generated MCQs benefit from pedagogical insight, creating high-quality items is time intensive. With the advent of artificial intelligence (AI), tools like DeepSeek R1 offer potential for automated MCQ generation, though their educational validity remains uncertain. Against this background, this study compared the psychometric quality of Physiology MCQs generated by faculty and by an AI chatbot. A total of 200 MCQs were developed following the standard syllabus and question design guidelines: 100 by the Physiology faculty and 100 by the AI chatbot DeepSeek R1. Fifty questions from each group were randomly selected and administered to undergraduate medical students in a 2-hour assessment. Item analysis was conducted postassessment using the difficulty index (DIFI), discrimination index (DI), and nonfunctional distractors (NFDs). Statistical comparisons were made using t tests or nonparametric equivalents, with significance at <i>P</i> < 0.05. Chatbot-generated MCQs had a significantly higher DIFI (0.64 ± 0.22) than faculty MCQs (0.47 ± 0.19; <i>P</i> < 0.0001). No significant difference in DI was found between the groups (<i>P</i> = 0.17). Faculty MCQs had significantly fewer NFDs (median 0) compared to chatbot MCQs (median 1; <i>P</i> = 0.0063). AI-generated MCQs demonstrated comparable discrimination ability but were generally easier and contained more ineffective distractors. While chatbots show promise in MCQ generation, further refinement is needed to improve distractor quality and item difficulty. AI can complement but not yet replace human expertise in assessment design.

<b>NEW & NOTEWORTHY</b> This study contributes to the growing research on artificial intelligence (AI)- versus faculty-generated multiple-choice questions in Physiology.
Psychometric analysis showed that AI-generated items were generally easier but had comparable discrimination ability to faculty-authored questions, while containing more nonfunctional distractors. By focusing on Physiology, this work offers discipline-specific insights and underscores both the potential and current limitations of AI in assessment development.
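The item-analysis metrics named in the abstract follow standard psychometric definitions: the difficulty index is the proportion of examinees answering correctly, the discrimination index contrasts upper- and lower-scoring groups, and a distractor is typically deemed nonfunctional when chosen by fewer than 5% of examinees. A minimal sketch of these textbook formulas (not the authors' actual code; the example numbers and the 27% group split are illustrative assumptions):

```python
def difficulty_index(correct: int, total: int) -> float:
    """DIFI: proportion of examinees answering the item correctly."""
    return correct / total

def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """DI via the classic upper/lower group split (often top and
    bottom 27% of scorers): (upper correct - lower correct) / group size."""
    return (upper_correct - lower_correct) / group_size

def nonfunctional_distractors(choice_counts: dict, key: str,
                              total: int, cutoff: float = 0.05) -> int:
    """Count distractors selected by fewer than `cutoff` of examinees."""
    return sum(1 for option, n in choice_counts.items()
               if option != key and n / total < cutoff)

# Illustrative item: 100 examinees, 64 correct -> DIFI 0.64 (a fairly easy item)
print(difficulty_index(64, 100))         # 0.64
print(discrimination_index(22, 12, 27))  # ~0.37, acceptable discrimination
# Option A is the key; D drew only 3% of responses -> 1 nonfunctional distractor
print(nonfunctional_distractors({"A": 64, "B": 20, "C": 13, "D": 3}, "A", 100))
```

Under these conventions, a DIFI of 0.64 versus 0.47 means chatbot items were answered correctly more often (easier), while a median of one NFD per chatbot item indicates implausible answer options that attract almost no responses.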
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations