This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Alexa, write my exam: ChatGPT for MCQ creation
5 citations · 4 authors · 2024
Abstract
Writing high-quality exam questions requires substantial faculty development and, more importantly, diverts time from other significant educational responsibilities. Recent research has demonstrated the efficiency of ChatGPT in generating multiple-choice questions (MCQs) and its ability to pass all three United States Medical Licensing Exams.1 Given the potential of new artificial intelligence systems like ChatGPT, this study aims to explore their use in streamlining item writing without compromising the desirable psychometric properties of assessments.

ChatGPT 3.5 was prompted to ‘write 25 MCQs with clinical vignette in USMLE Step 1 style on the pharmacology of antibiotics, antivirals and antiparasitic drugs addressing their indications, mechanism of action, adverse effects and contraindications’. Faculty reviewed all questions for accuracy and made minor modifications. For questions that did not align with the courses' learning objectives, ChatGPT was prompted to generate alternatives, such as ‘another question on the Pharmacology of HIV drugs’. Additionally, 25 MCQs were created without the help of ChatGPT. ChatGPT-assisted question writing took approximately 1 hour (with adjustments and corrections) compared with 9 hours without the help of ChatGPT. Seventy-one second-year Pharmacy students were assessed in Spring 2023 with a 50-item exam consisting of 25 ChatGPT-constructed and 25 faculty-generated MCQs. We compared the difficulty and psychometric characteristics of the ChatGPT-assisted and non-assisted questions using descriptive statistics, Student's t-tests and the Mann–Whitney test.

Students' performance on MCQs generated by ChatGPT was not significantly different from that on faculty-generated items for the average scores (76.44%, SD = 16.71 for ChatGPT vs. 82.52%, SD = 10.90 for faculty), the discrimination index (0.29, SD = 0.15 for ChatGPT vs. 0.25, SD = 0.17 for faculty) and the point-biserial correlation (0.31, SD = 0.13 for ChatGPT vs. 0.28, SD = 0.15 for faculty). Students took longer on average to answer ChatGPT-generated questions than faculty-generated questions (71 seconds, SD = 22 for ChatGPT vs. 58 seconds, SD = 25 for faculty, p < 0.05), likely due to the prevalence of ‘window dressing’. This question flaw was identified in 40% of the ChatGPT-generated questions, which may explain the additional time required.

We learned that while ChatGPT can effectively generate high-quality MCQs, saving time in the process, careful review by content experts is necessary to ensure question quality, particularly to identify and correct ‘window dressing’ flaws commonly found in ChatGPT-generated items. We will present these data at upcoming faculty development sessions to promote the adoption of ChatGPT for generating exam questions. By presenting robust data demonstrating ChatGPT's efficacy, we believe that more faculty will integrate this tool into their question-writing processes. Faculty will also be alerted to potential question flaws and prepared to address them. Additionally, recognising that students often desire more practice questions, we discovered that they are generally unfamiliar with this method. We plan to empower students to use ChatGPT to assist with their studies, while concurrently training faculty to become more adept at using ChatGPT to generate both practice and test items.

The authors declare that they have no conflict of interest. The data that support the findings of this study are available from the corresponding author upon reasonable request.
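The item statistics compared in the abstract (item difficulty as the proportion correct, the upper-minus-lower discrimination index, and the point-biserial correlation between item score and total score) can be illustrated with a short sketch. This is not the authors' analysis code; the 0/1 response matrix below is invented toy data, and the 27% upper/lower split is one common convention for the discrimination index.

```python
# Illustrative item analysis on a toy 0/1 response matrix
# (rows = students, columns = items; 1 = correct). Invented data.
from statistics import mean, pstdev

responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]

def difficulty(item):
    """Proportion of students answering the item correctly."""
    return mean(row[item] for row in responses)

def discrimination(item, frac=0.27):
    """Upper-minus-lower discrimination index: proportion correct in the
    top score group minus that in the bottom score group."""
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    k = max(1, round(frac * len(responses)))
    low = mean(responses[i][item] for i in order[:k])
    high = mean(responses[i][item] for i in order[-k:])
    return high - low

def point_biserial(item):
    """Pearson correlation between the 0/1 item score and total score."""
    x = [row[item] for row in responses]
    y = [sum(row) for row in responses]
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))
```

With many students and two item sets (ChatGPT-generated vs. faculty-generated), per-item statistics like these could then be compared with a t-test or Mann–Whitney test, as the study did.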
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations