This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating Generative AI and Human Performance in Question-Answer Validation Tasks
Citations: 0
Authors: 3
Year: 2025
Abstract
The development of educational applications, particularly within virtual reality (VR) environments, depends on the creation of high-quality question–answer pairs to support effective learning and assessment. Ensuring the accuracy and relevance of these items is essential, making systematic validation a necessary part of the design process. This study examines the effectiveness of generative artificial intelligence (AI) models, specifically ChatGPT and DeepSeek, in validating such content by comparing their performance with that of human evaluators. The goal is to identify the most consistent and reliable approach to supporting question generation in educational contexts. To measure inter-rater reliability, Krippendorff's Alpha was used to assess consistency among multiple evaluators while accounting for agreement occurring by chance. Cohen's Kappa was also applied to analyze pairwise agreement and to evaluate the extent to which human and AI assessments aligned. These metrics were calculated across several dimensions, including correctness, clarity, relevance, and educational value, to facilitate a comparative analysis of the evaluators' performance.
Similar Works
A spreading-activation theory of semantic processing.
1975 · 8,019 cit.
Cognitive Load During Problem Solving: Effects on Learning
1988 · 7,673 cit.
International Conference on Learning Representations (ICLR 2013)
2013 · 6,255 cit.
Learning from delayed rewards
1989 · 5,452 cit.
Comprehension: A Paradigm for Cognition
1998 · 4,771 cit.