This is an overview page with metadata for this scientific work. The full article is available from the publisher.
GPT-4 versus human authors in clinically complex MCQ creation: a blinded analysis of item quality
Citations: 4
Authors: 6
Year: 2024
Abstract
MCQs are a popular assessment format in medical education. Creating clinically complex MCQs can be a time-consuming task for subject matter experts. Large language models such as GPT-4, a type of generative artificial intelligence (AI), are a potential tool for MCQ design. Clinically complex human-generated MCQs, at both novice and expert level, were compared with AI MCQs. A generic prompt for GPT-4 was engineered, which included item-writing guidance, example MCQs, and key learning points. A standardised scoring system was developed for a consensus panel to objectively evaluate each item, blinded to the author, on categories including content validity, scope, item anatomy, cognitive skill level, item-writing flaws (IWFs), feedback comprehensiveness, veracity, adequacy of clinical reasoning, and global impression of fitness for use. Analysis showed that all groups (novice, expert, and AI) were able to generate items within scope. Expert items performed better than novice items in all categories. Expert items performed better than AI items in content validity, feedback veracity, and clinical reasoning. They also tended to test higher-order cognitive skills. There was no difference in the global impressions of expert and AI items, which suggests they may be comparable overall. With adequate prompt engineering, GPT-4 can produce MCQs testing clinically complex concepts for medical assessment. The quality of AI outputs is comparable to that of experts; however, human validation is necessary to ensure content validity. The AI-generated explanatory feedback was adequate in veracity and clinical reasoning, and may serve as an educational tool for learners.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations