This is an overview page with metadata for this scientific work. The full article is available from the publisher.
ChatGPT for Writing Evaluation: Examining the Accuracy and Reliability of AI-Generated Scores Compared to Human Raters
Citations: 12
Authors: 7
Year: 2024
Abstract
ChatGPT has proven beneficial in a variety of educational contexts, yet its effectiveness in scoring integrated second-language writing tasks remains uncertain. This study therefore explores the accuracy and reliability of ChatGPT-generated scores versus human ratings under two prompting conditions (with or without the writing prompt and source texts) and examines the reasons behind rating discrepancies using a mixed methods approach. ChatGPT rated 74 argumentative essays from the Iowa State University English Placement Test Corpus of Learner Writing under the different prompting conditions; its ratings were then compared with those of human raters. In both prompting conditions, ChatGPT’s reliability relative to human raters was moderate to low. In addition, a qualitative analysis of ChatGPT’s scoring rationales suggested that, unlike human raters, ChatGPT was limited in detecting content-related issues and integrating source-text information. The findings suggest that a more rigorous training process may be required before ChatGPT can rate similarly to human raters.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations