OpenAlex · Updated hourly · Last updated: 15.03.2026, 01:33

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

The Reliability of using ChatGPT in Rating EFL Writings

2024 · 2 citations · Shanlax International Journal of Education · Open Access

Citations: 2 · Authors: 1 · Year: 2024

Abstract

This paper examines the reliability of ChatGPT in evaluating EFL writing by assessing its intra- and inter-rater reliability. Eighty-two compositions were randomly sampled from the Written English Corpus of Chinese Learners. These compositions were rated by three experienced raters on ‘language’, ‘content’, and ‘organization’. ChatGPT also rated the writing samples twice over some time, and the average scores were calculated. An independent-samples t-test was conducted to compare the average scores given by ChatGPT and by the human raters. Pearson correlation analyses were conducted between the two sets of overall scores given by ChatGPT to estimate intra-rater reliability, and between the average scores given by ChatGPT and by the human raters to estimate inter-rater reliability. The comparative analysis shows that ChatGPT may be used for evaluating EFL essays, as its scores are similar to those provided by reliable human raters. However, the correlation analyses show that the intra-rater reliability of ChatGPT is not high enough to be acceptable (r = 0.575, p < 0.01), and that its inter-rater reliability is only moderate (r = 0.508, p < 0.01). Moreover, there is no significant relationship between the average scores of ChatGPT and the human raters on the ‘organization’ of the writings (r = 0.181, p > 0.05). It can therefore be concluded that ChatGPT is not a reliable tool for rating and scoring EFL writing with the prompt used in this study. One possible reason for this unreliability appears to be its scoring of the ‘organization’ of the essays. These findings imply that while ChatGPT has potential as an evaluative tool, its current limitations, particularly in assessing organization, must be addressed before it can be reliably used in educational settings.
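The analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the score lists are invented placeholders, the function names `pearson_r` and `pooled_t` are hypothetical, and the t-test is assumed to be the classic equal-variance (pooled) form, which the abstract does not specify. Only the test statistic and degrees of freedom are computed; obtaining a p-value would additionally require the t distribution (for example from a statistics package).

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pooled_t(x, y):
    """Independent-samples t statistic and degrees of freedom.

    Assumption: equal variances in both groups (pooled variance).
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, df


# Invented placeholder scores, NOT data from the study.
gpt_round1 = [12, 10, 14, 9, 11, 13]               # first ChatGPT rating
gpt_round2 = [11, 12, 13, 10, 10, 14]              # second ChatGPT rating
human_avg = [11.0, 10.3, 13.7, 9.7, 12.0, 12.3]    # mean of three human raters

# Average of the two ChatGPT ratings per composition.
gpt_avg = [(a + b) / 2 for a, b in zip(gpt_round1, gpt_round2)]

intra_r = pearson_r(gpt_round1, gpt_round2)   # intra-rater reliability
inter_r = pearson_r(gpt_avg, human_avg)       # inter-rater reliability
t_stat, df = pooled_t(gpt_avg, human_avg)     # comparison of mean scores

print(f"intra-rater r = {intra_r:.3f}")
print(f"inter-rater r = {inter_r:.3f}")
print(f"t({df}) = {t_stat:.3f}")
```

Read this way, the study's two findings are compatible: a non-significant t-test only says the mean scores are similar, while the moderate correlations (r = 0.575 and r = 0.508) say that individual compositions are not ranked consistently.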


Topics

Artificial Intelligence in Healthcare and Education