This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the Validity of AI-Generated School Tests: A Case Study in Mathematics and Biology
Citations: 0
Authors: 2
Year: 2025
Abstract
This paper evaluates the validity and classroom applicability of tests generated by large language models (LLMs) in mathematics and biology at the lower secondary level. Four AI-generated tests (two in mathematics, two in biology) were administered to 103 students across grades 5, 6, and 8. Item-level analysis (difficulty, discrimination index) and correlations with students’ final grades were used to examine psychometric quality. Results show that AI-generated tests, after minimal teacher corrections, reached validity potentially comparable to traditionally created assessments, with correlations ranging from moderate (r = 0.55) to strong (r = 0.83). Biology tests demonstrated acceptable discrimination indices but contained several items classified as "too easy." Comparisons between GPT-3.5, GPT-4o, and GPT-5 further illustrate the rapid improvement of multimodal and randomization capabilities. Findings suggest that AI can substantially reduce teachers’ workload while providing reliable test drafts, though expert review and difficulty calibration remain essential.
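The item-level measures named in the abstract (difficulty, discrimination index, correlation with final grades) can be illustrated with a minimal sketch. The abstract does not state which formulas the authors used, so this assumes the common classical-test-theory definitions: difficulty as the proportion of correct answers, discrimination as the upper-lower index with 27% groups, and Pearson's r for the correlation. The function names and the `scores` matrix are hypothetical toy values, not data from the study.

```python
from statistics import mean

def item_difficulty(scores):
    """Proportion of students who answered each item correctly (0/1 scoring)."""
    n_items = len(scores[0])
    return [mean(row[i] for row in scores) for i in range(n_items)]

def discrimination_index(scores, fraction=0.27):
    """Upper-lower index per item: p(upper group) - p(lower group).

    Assumption: groups are the top and bottom `fraction` of students
    ranked by total score; the paper may have used a different split.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(scores[0])
    return [
        mean(row[i] for row in upper) - mean(row[i] for row in lower)
        for i in range(n_items)
    ]

def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy data (hypothetical): 6 students x 3 items, 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

difficulty = item_difficulty(scores)
discrimination = discrimination_index(scores)
totals = [sum(row) for row in scores]
```

In this toy matrix every student answers the first item correctly, so its difficulty value is 1 and its discrimination index is 0. That is the pattern behind a "too easy" classification: an item nearly everyone solves cannot separate stronger from weaker students. To relate test results to final grades, `pearson_r(totals, grades)` could be applied to a list of grades in the same student order. Where lower grades are better, the sign of r flips.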