Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring

2025·0 Zitationen·ArXiv.orgOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

This study explored the utilities of rationales generated by GPT-4.1 and GPT-5 in automated scoring using Prompt 6 essays from the 2012 Kaggle ASAP data. Essay-based scoring was compared with rationale-based scoring. The study found in general essay-based scoring performed better than rationale-based scoring with higher Quadratic Weighted Kappa (QWK). However, rationale-based scoring led to higher scoring accuracy in terms of F1 scores for score 0 which had less representation due to class imbalance issues. The ensemble modeling of essay-based scoring models increased the scoring accuracy at both specific score levels and across all score levels. The ensemble modeling of essay-based scoring and each of the rationale-based scoring performed about the same. Further ensemble of essay-based scoring and both rationale-based scoring yielded the best scoring accuracy with QWK of 0.870 compared with 0.848 reported in literature.

Autoren

Themen

Artificial Intelligence in Healthcare and EducationPsychometric Methodologies and TestingTopic Modeling

Volltext beim Verlag öffnen

Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring

Abstract

Ähnliche Arbeiten

Autoren

Themen