This is an overview page with metadata for this scientific work. The full article is available from the publisher.
ChatGPT for Writing Evaluation: Examining the Accuracy and Reliability of AI-Generated Scores Compared to Human Raters
Citations: 12
Authors: 7
Year: 2024
Abstract
ChatGPT has proven beneficial in a variety of educational contexts, yet its effectiveness in scoring integrated second-language writing tasks remains uncertain. This study therefore explores the accuracy and reliability of ChatGPT-generated scores versus human ratings under two prompting conditions (with or without the writing prompt and source texts) and examines the reasons behind rating discrepancies using a mixed methods approach. ChatGPT rated 74 argumentative essays from the Iowa State University English Placement Test Corpus of Learner Writing under the different prompting conditions; its ratings were then compared with those of human raters. In both prompting conditions, ChatGPT’s reliability relative to human raters was moderate to low. In addition, a qualitative analysis of ChatGPT’s scoring rationales suggested that, unlike human raters, ChatGPT was limited in detecting content-related issues and integrating source-text information. The findings suggest that a more rigorous training process may be required before ChatGPT can rate similarly to human raters.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations