This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
EVALUATING THE EFFECTIVENESS OF CHATGPT IN MARKING STUDENT WRITING
Citations: 0
Authors: 1
Year: 2026
Abstract
This study investigated the reliability of the AI tool ChatGPT in evaluating Level 2 students' writing papers in the Foundation Program at the University of Technology and Applied Sciences (UTAS)–AlMussanah, in comparison with human markers. The research aimed to examine the extent to which ChatGPT can accurately assess student writing using a predefined rubric and to identify differences between AI-generated and human-generated scores at both the overall and criterion levels. Two writing tasks were analyzed: Task 1 (Guided Writing) and Task 2 (Free Writing), with assessment criteria including task achievement, organization, vocabulary, and grammar. The findings revealed moderate to strong agreement between ChatGPT and human markers in overall scores, particularly for the guided writing task. However, agreement was lower at the criterion level, and ChatGPT demonstrated weaker consistency in assessing free writing tasks. Additionally, ChatGPT tended to assign lower scores for grammar, vocabulary, and organization, while awarding slightly higher scores for task achievement. Statistically, ChatGPT scored student writing lower than human markers by an average of 1.53 points. While the results suggest that ChatGPT can support writing assessment with reasonable alignment to human judgment, the findings highlight the necessity of human oversight, especially for open-ended tasks and nuanced evaluation criteria. The study concludes that ChatGPT should be used as a supportive assessment tool rather than a replacement for human marking, to ensure fairness and accuracy in student grading.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,611 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,504 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,025 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations