This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2
Citations: 12
Authors: 1
Year: 2024
Abstract
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability measures such as Cohen's weighted kappa and intraclass correlation. The results revealed a high agreement in means and substantial reliability between the two grading methods on the level of the majority of texts. However, individual discrepancies and outliers were also identified, underscoring the nuanced nature of grading. While ChatGPT demonstrated efficiency and general alignment with human grading, the study concludes that it should not replace human judgment, particularly due to these observed inconsistencies. The findings contribute valuable insights into the potential and limitations of AI in educational grading and emphasize the importance of a comprehensive quantitative evaluation.
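To make the reliability measure named in the abstract concrete, here is a minimal sketch of Cohen's weighted kappa for two raters. It is not the study's code: the function name `weighted_kappa`, the choice of quadratic weights, the half-band category grid, and the example scores are all assumptions made for illustration, since the abstract does not state the weighting scheme or publish the data.

```python
from collections import Counter
import math


def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    """Cohen's weighted kappa for two raters over ordered categories.

    `categories` lists every possible score in ascending order.
    `weights` is "quadratic" or "linear" (assumed options, not from the study).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty score lists")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    if weights not in ("quadratic", "linear"):
        raise ValueError("weights must be 'quadratic' or 'linear'")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Observed joint counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def disagreement(i, j):
        # Normalised distance between two category positions, in [0, 1].
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d

    observed_cost = sum(
        disagreement(i, j) * count for (i, j), count in observed.items()
    ) / n
    expected_cost = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if expected_cost == 0:
        # Both raters used one identical category throughout: kappa is undefined.
        return math.nan
    return 1.0 - observed_cost / expected_cost


# Hypothetical band scores on a 0.0-9.0 half-band grid (not data from the study).
bands = [b / 2 for b in range(19)]
human = [6.0, 6.5, 7.0, 5.5, 8.0, 6.0]
model = [6.0, 7.0, 7.0, 5.0, 7.5, 6.5]
print(round(weighted_kappa(human, model, bands), 3))
```

Quadratic weights penalise a two-band disagreement four times as heavily as a one-band disagreement, which is one common choice for ordinal scales; linear weights would treat the penalty as proportional to the distance.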
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations