This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Large Language Model-Powered Automated Assessment: A Systematic Review
Citations: 24
Authors: 1
Year: 2025
Abstract
This systematic review investigates 49 peer-reviewed studies on Large Language Model-Powered Automated Assessment (LLMPAA) published between 2018 and 2024. Following PRISMA guidelines, studies were selected from Web of Science, Scopus, IEEE, ACM Digital Library, and PubMed databases. The analysis shows that LLMPAA has been widely applied in reading comprehension, language education, and computer science, primarily using essay and short-answer formats. While models such as GPT-4 and fine-tuned BERT often exhibit high agreement with human raters (e.g., QWK = 0.99, r = 0.95), other studies report lower agreement (e.g., ICC = 0.45, r = 0.38). LLMPAA offers benefits like efficiency, scalability, and personalized feedback. However, significant challenges remain, including bias, inconsistency, hallucination, limited explainability, dataset quality, and privacy concerns. These findings indicate that while LLMPAA technologies hold promise, their effectiveness varies by context. Human oversight is essential to ensure fair and reliable assessment outcomes.
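The abstract reports agreement between LLM and human scores using quadratic weighted kappa (QWK), among other metrics. As a minimal sketch of how QWK is computed from two raters' integer scores (the function and example data below are illustrative, not taken from the reviewed studies):

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """QWK between two raters' integer scores on a fixed ordinal scale."""
    n = max_rating - min_rating + 1
    # Observed agreement matrix: counts of (score_a, score_b) pairs
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    total = len(rater_a)
    # Marginal score distributions, used for the chance-expected matrix
    hist_a = Counter(a - min_rating for a in rater_a)
    hist_b = Counter(b - min_rating for b in rater_b)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2      # quadratic disagreement weight
            expected = hist_a[i] * hist_b[j] / total  # expected count under independence
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den

# Illustrative scores on a 1-5 scale (hypothetical data)
human = [1, 2, 3, 4, 5, 3, 2]
model = [1, 2, 3, 4, 5, 3, 2]
print(quadratic_weighted_kappa(human, model, 1, 5))  # 1.0 (perfect agreement)
```

A QWK of 1.0 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement; the 0.99 reported for some models thus means near-perfect alignment with human raters.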