This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Certamen Artificialis Intelligentia: Evaluating AI in Solving AI-generated Programming Exercises
Citations: 0 · Authors: 11 · Year: 2025
Abstract
Large language models (LLMs) are transforming programming education by enabling automated generation and evaluation of coding exercises. While previous studies have evaluated LLMs’ capabilities in one of these tasks, none have explored their effectiveness in solving programming exercises generated by other LLMs. This paper fills that gap by examining how state-of-the-art LLMs—ChatGPT, DeepSeek, Qwen, and Gemini—perform when solving exercises generated by different LLMs. Our study introduces a novel evaluation methodology featuring a structured prompt engineering strategy for generating and executing programming exercises in three widely used programming languages: Python, Java, and JavaScript. The results have both practical and theoretical value. Practically, they help identify which models are more effective at generating and solving exercises produced by LLMs. Theoretically, the study contributes to understanding the role of LLMs as collaborators in creating educational programming content.
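The abstract outlines a generate-then-solve pipeline: one model produces an exercise, a second model attempts it, and the candidate solution is executed to judge success. The paper's actual prompts and harness are not reproduced on this page, so the following is a minimal Python sketch of such a cross-model loop under stated assumptions; the prompt texts, the PASS-marker convention, and the cross_evaluate helper are illustrative inventions, not the authors' implementation.

    import subprocess
    import tempfile
    from pathlib import Path
    from typing import Callable

    # Hypothetical stand-in for an LLM API call: a function that maps a
    # prompt string to the model's text response.
    LLMClient = Callable[[str], str]

    # Assumed prompt templates; the paper's structured prompt engineering
    # strategy is not public, so these are placeholders.
    GEN_PROMPT = (
        "Write a self-contained {lang} programming exercise whose solution, "
        "when run, prints PASS if it is correct."
    )
    SOLVE_PROMPT = "Solve the following exercise. Return only code:\n\n{exercise}"

    # Interpreters/launchers assumed to be on PATH (Java 11+ can run
    # single .java source files directly).
    RUNNERS = {"python": ["python3"], "javascript": ["node"], "java": ["java"]}
    SUFFIX = {"python": ".py", "javascript": ".js", "java": ".java"}

    def cross_evaluate(generator: LLMClient, solver: LLMClient, lang: str) -> bool:
        """Generate an exercise with one model, solve it with another,
        then execute the solution and check for the PASS marker."""
        exercise = generator(GEN_PROMPT.format(lang=lang))
        solution = solver(SOLVE_PROMPT.format(exercise=exercise))

        # Write the candidate solution to a temp file and execute it.
        with tempfile.NamedTemporaryFile("w", suffix=SUFFIX[lang], delete=False) as f:
            f.write(solution)
            path = Path(f.name)

        result = subprocess.run(
            RUNNERS[lang] + [str(path)], capture_output=True, text=True, timeout=30
        )
        return "PASS" in result.stdout

    if __name__ == "__main__":
        # Stub "models" so the sketch runs without any API access: the
        # generator emits a fixed exercise, the solver a fixed solution.
        gen = lambda prompt: "Print PASS if 2 + 2 == 4."
        solve = lambda prompt: 'print("PASS" if 2 + 2 == 4 else "FAIL")'
        print(cross_evaluate(gen, solve, "python"))

In a real run, generator and solver would be wired to API clients for ChatGPT, DeepSeek, Qwen, or Gemini, and each (generator, solver, language) triple would be evaluated over many exercises rather than a single pass/fail check.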
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations