OpenAlex · Updated hourly · Last updated: 21.03.2026, 09:47

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Certamen Artificialis Intelligentia: Evaluating AI in Solving AI-generated Programming Exercises

2025 · 0 citations · Proceedings of the International Conference on Information Systems Development · Open Access

Citations: 0
Authors: 11
Year: 2025

Abstract

Large language models (LLMs) are transforming programming education by enabling automated generation and evaluation of coding exercises. While previous studies have evaluated LLMs’ capabilities in one of these tasks, none have explored their effectiveness in solving programming exercises generated by other LLMs. This paper fills that gap by examining how state-of-the-art LLMs—ChatGPT, DeepSeek, Qwen, and Gemini—perform when solving exercises generated by different LLMs. Our study introduces a novel evaluation methodology featuring a structured prompt engineering strategy for generating and executing programming exercises in three widely used programming languages: Python, Java, and JavaScript. The results have both practical and theoretical value. Practically, they help identify which models are more effective at generating and solving exercises produced by LLMs. Theoretically, the study contributes to understanding the role of LLMs as collaborators in creating educational programming content.
