This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Evaluating ChatGPT for Grading Programming Assignments: Effectiveness, Fairness, and Student Perceptions
Citations: 0
Authors: 5
Year: 2026
Abstract
This study investigates ChatGPT as an automated grading tool for programming assignments in higher education. Three datasets comprising Python, C++, and Java assignments were graded three times by ChatGPT and compared with faculty evaluations. Results show that ChatGPT achieves high grading accuracy, closely aligning with faculty scores and demonstrating statistically significant correlations. Statistical analyses using the Kolmogorov–Smirnov test, paired t-test, and Wilcoxon signed-rank test confirm overall agreement, although ChatGPT tends to apply stricter grading criteria. High intraclass correlation coefficients further indicate strong reliability and consistency across repeated grading attempts. The study highlights the critical role of well-defined rubrics in improving grading alignment and proposes an Instructor–AI Collaborative Rubric Development framework to support effective AI integration in assessment. A survey of 158 students indicates increased satisfaction and trust following disclosure of AI-assisted grading, although some still prefer human evaluation. Overall, the findings provide strong evidence that ChatGPT is a reliable and consistent grading tool, demonstrating close alignment with faculty evaluations and high reproducibility across attempts. However, its effectiveness is critically dependent on well-defined rubrics and requires human oversight to mitigate strictness, ensure fairness, and account for contextual nuances. These results strongly support a hybrid AI–human grading approach, grounded in transparent rubric design and reinforced by appropriate ethical safeguards.
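The abstract names a paired t-test and intraclass correlation coefficients but gives no formulas. As a rough illustration only, the sketch below computes a paired t statistic (AI score minus faculty score) and a two-way random-effects, single-measure ICC in the Shrout-Fleiss ICC(2,1) form. The function names, the choice of ICC(2,1), and all scores are assumptions made for this example; the abstract does not say which ICC variant or software the authors used.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(ai_scores, faculty_scores):
    """t statistic of the paired differences (AI minus faculty).

    Hypothetical helper: a negative value means the AI scored lower
    (stricter) on average. Needs at least two pairs whose differences
    are not all identical.
    """
    if len(ai_scores) != len(faculty_scores):
        raise ValueError("score lists must have the same length")
    diffs = [a - f for a, f in zip(ai_scores, faculty_scores)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1) from a two-way ANOVA decomposition.

    ratings: one row per submission, one column per grading attempt.
    Assumed variant; the study may have used a different ICC form.
    """
    n = len(ratings)        # number of submissions
    k = len(ratings[0])     # number of grading attempts
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Made-up scores for illustration; not data from the study.
    ai = [78, 85, 90, 66]
    faculty = [80, 88, 91, 70]
    attempts = [[78, 79, 77], [85, 84, 86], [90, 91, 90], [66, 65, 67]]
    print("paired t:", paired_t_statistic(ai, faculty))
    print("ICC(2,1):", icc_2_1(attempts))
```

An ICC close to 1 indicates that repeated grading attempts rank and score submissions almost identically. The sign of the t statistic shows the direction of any systematic difference between the two graders.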