This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Corpus-Based Evaluation Models for Quality Assurance of AI-Generated ESL Learning Materials
0 citations · 2 authors · 2022
Abstract
This study addresses the problem that AI-generated ESL learning materials can appear fluent yet vary in accuracy, level appropriateness, and coherence, weakening quality assurance for large-scale cloud and enterprise deployment. The purpose was to develop and validate a corpus-based evaluation model that links corpus indicators to stakeholder quality judgments. Using a quantitative cross-sectional, case-based design, N = 120 evaluators assessed 80 AI-generated texts across four categories (reading passages, dialogues, grammar explanations, and practice prompts) using a five-point Likert instrument. Key dependent variables were overall QA and subscales for accuracy, clarity, coherence, level appropriateness, and pedagogical usefulness; key independent variables were readability control index, lexical appropriacy score, cohesion score, lexical diversity (HD-D), and grammar error rate (errors per 100 words). Analyses used descriptive statistics, Cronbach’s alpha, Pearson correlations, and multiple regression with text-type stability checks. Overall perceived quality was acceptable (overall QA M = 3.84, SD = 0.53), with clarity highest (M = 3.96) and accuracy lowest (M = 3.72). Reliability was strong (overall α = .91). Corpus-to-human alignment was substantial: readability control correlated with level appropriateness (r = .61), cohesion with coherence (r = .58), lexical appropriacy with clarity (r = .52) and usefulness (r = .49), and grammar error rate with accuracy (r = −.67), all p < .001. A five-predictor regression model predicted overall QA (F(5, 74) = 21.64, p < .001; R² = .59; Adj. R² = .56), with grammar error rate the strongest predictor (β = −.41), followed by readability (β = .29), cohesion (β = .24), and lexical appropriacy (β = .21); performance remained stable across text types (R² = .52–.61).
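The reported regression amounts to a standardized linear predictor. A minimal sketch in Python, assuming z-scored indicators and using only the four beta weights reported in the abstract (the fifth predictor, lexical diversity, has no reported coefficient and is therefore omitted here):

```python
def predicted_qa_z(grammar_error_rate_z, readability_z, cohesion_z,
                   lexical_appropriacy_z):
    """Standardized prediction of overall QA from the reported beta weights.

    Inputs are z-scored corpus indicators; the return value is a z-score on
    the overall QA scale. Lexical diversity (HD-D) was in the five-predictor
    model, but its beta was not reported, so it is left out of this sketch.
    """
    return (-0.41 * grammar_error_rate_z   # strongest predictor, negative
            + 0.29 * readability_z
            + 0.24 * cohesion_z
            + 0.21 * lexical_appropriacy_z)
```

Read this way, a text at the sample mean on every indicator gets a predicted QA z-score of 0, and each additional standard deviation of grammar error rate lowers predicted quality by 0.41 SD.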
Implications are that organizations can operationalize QA as automated gates for error density, readability bands, cohesion thresholds, and vocabulary profile alignment, then reserve human review for borderline cases to improve safety, consistency, and turnaround time in enterprise content workflows. Average indicators were overall readability 0.64, lexical appropriacy 0.71, cohesion 0.59, lexical diversity 0.82, and grammar error rate 2.40 per 100 words.
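The automated-gate idea in the implications can be sketched as simple threshold checks that route borderline texts to human review. The thresholds below are illustrative placeholders anchored loosely to the reported sample averages (readability 0.64, lexical appropriacy 0.71, cohesion 0.59, grammar error rate 2.40 per 100 words); the study does not prescribe cutoff values:

```python
def qa_gate(readability, lexical_appropriacy, cohesion, errors_per_100_words):
    """Route a generated text: 'pass', 'human_review', or 'reject'.

    All thresholds are illustrative, not values from the study.
    Returns the decision and the list of tripped gates.
    """
    flags = []
    if errors_per_100_words > 3.0:          # error-density gate
        flags.append("error_density")
    if not (0.5 <= readability <= 0.8):     # readability band
        flags.append("readability_band")
    if cohesion < 0.5:                      # cohesion threshold
        flags.append("cohesion")
    if lexical_appropriacy < 0.6:           # vocabulary profile alignment
        flags.append("lexical_appropriacy")

    if not flags:
        return "pass", flags
    if len(flags) == 1:                     # borderline: one gate tripped
        return "human_review", flags
    return "reject", flags
```

With the sample-average indicators, a text passes all four gates; tripping exactly one gate sends it to human review, which is the "reserve human review for borderline cases" workflow the abstract describes.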
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations