OpenAlex · Updated hourly · Last updated: 30.03.2026, 15:48

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT for Writing Evaluation: Examining the Accuracy and Reliability of AI-Generated Scores Compared to Human Raters

2024 · 12 citations · 7 authors

Open full text at publisher

Abstract

ChatGPT has proven beneficial in a variety of educational contexts, yet its effectiveness in scoring integrated second-language writing tasks remains uncertain. This study, therefore, explores the accuracy and reliability of ChatGPT-generated scores versus human ratings under two prompting conditions (with or without the writing prompt and source texts) and examines the reasons behind rating discrepancies using a mixed methods approach. ChatGPT rated 74 argumentative essays from the Iowa State University English Placement Test Corpus of Learner Writing under the different prompting conditions; its ratings were then compared with those of human raters. Compared to human raters, ChatGPT’s reliability was moderate to low in both prompting conditions. In addition, a qualitative analysis of ChatGPT’s scoring rationales suggested that, unlike human raters, ChatGPT was limited in detecting content-related issues and integrating source text information. The findings of the study suggest that a more rigorous process may be required to train ChatGPT to rate similarly to human raters.

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)