This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating the Validity of AI-Generated School Tests: A Case Study in Mathematics and Biology

2025 · 0 citations

Abstract

This paper evaluates the validity and classroom applicability of tests generated by large language models (LLMs) in mathematics and biology at the lower secondary level. Four AI-generated tests (two in mathematics, two in biology) were administered to 103 students across grades 5, 6, and 8. Item-level analysis (difficulty, discrimination index) and correlations with students’ final grades were used to examine psychometric quality. Results show that AI-generated tests, after minimal teacher corrections, reached validity potentially comparable to traditionally created assessments, with correlations ranging from moderate (r = 0.55) to strong (r = 0.83). Biology tests demonstrated acceptable discrimination indices but contained several items classified as "too easy." Comparisons between GPT-3.5, GPT-4o, and GPT-5 further illustrate the rapid improvement of multimodal and randomization capabilities. Findings suggest that AI can substantially reduce teachers’ workload while providing reliable test drafts, though expert review and difficulty calibration remain essential.
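The item-level measures named in the abstract can be illustrated with a minimal sketch. It assumes dichotomous (0/1) item scoring, the classical upper-lower group method with 27% groups for the discrimination index, and a plain Pearson correlation. The abstract does not state which exact formulas the authors used, so these choices, the function names, and the toy response matrix are illustrative assumptions, not data or methods from the study.

```python
from math import sqrt

def item_difficulty(item_scores):
    """Difficulty index p: share of students who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower index D = p(upper group) - p(lower group).

    Groups are the top and bottom `fraction` of students by total score
    (27% is a common convention, assumed here).
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented toy data: 10 students (rows) x 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
items = [list(col) for col in zip(*responses)]  # one score list per item

difficulties = [item_difficulty(col) for col in items]
discriminations = [discrimination_index(col, totals) for col in items]
```

In this toy matrix the first item is answered correctly by almost everyone, which is the pattern a "too easy" label would describe. Correlating test totals with final grades would use `pearson_r(totals, grades)` with a hypothetical `grades` list. If grades follow a scale where lower numbers are better, the sign of r flips and only its magnitude is comparable.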

Institutions

University of Trnava (SK)

Topics

Artificial Intelligence in Healthcare and Education
Intelligent Tutoring Systems and Adaptive Learning
Explainable Artificial Intelligence (XAI)
