This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
GPT-3.5 for Data Augmentation in Automatic Essay Scoring: A Preliminary Analysis
Citations: 0
Authors: 7
Year: 2025
Abstract
Machine learning models are sensitive to the datasets used to train them. Dealing with limited or imbalanced datasets is challenging, and a commonly adopted approach to mitigate this limitation is data augmentation. For example, expanding the training set in a computer vision problem may involve rotating and resizing images; however, this task is more complex when dealing with textual data. This work investigates the use of GPT-3.5 for data augmentation in a dataset of argumentative essay texts from the National High School Exam (ENEM), which is used as a selection criterion for entry into public universities in Brazil. More specifically, we adopted traditional Natural Language Processing (NLP) techniques for essay scoring and compared the results with and without data augmentation. Our results show that the long argumentative essays generated by GPT in the data augmentation process did not improve the performance of the NLP models. Moreover, GPT could not adequately classify its own synthetic data, suggesting that the generated data is of poor quality, and it did not outperform the NLP models in classifying real data.