OpenAlex · Updated hourly · Last updated: May 1, 2026, 03:22

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Machine Learning-Assisted Code Generation: Exploring The Role Of Large Language Models In Automating Software Development Tasks

2026 · 0 citations · International Journal of Advances in Signal and Image Sciences · Open Access
Open the full text at the publisher

Citations: 0
Authors: 6
Year: 2026

Abstract

This paper explores the capabilities of large language models (LLMs) to automate software development tasks, combining quantitative benchmarks with human-in-the-loop survey evidence. Background: LLMs hold the promise of translating natural-language intent into code, but concerns remain over reliability, maintainability, and trust in real-world workflows. Procedure: We performed a mixed-methods evaluation of GPT-4, Code Llama, and StarCoder on MBPP (Python) and a curated C++ dataset, comparing pass@k accuracy, execution success, and unit-test coverage, and administered a 21-item survey (n=100) on adoption, perceived productivity, trust, and experience effects. Results: GPT-4 achieved the highest pass@5 and execution success on both datasets; Code Llama placed in the middle; StarCoder lagged behind. Adoption remains uneven (51% had never used AI coding tools). Perceived productivity gains were modest (median 3/5; mean 3.2/5), trust was concentrated at "Medium" (~38%), and respondents with 1-3 years of experience reported the largest benefits (mean approx. 3.58), suggesting that LLMs help junior developers most on routine work while offering less to senior developers on complex design work. Recommendation: LLMs work best as a supplement within hybrid, test-gated workflows; training should focus on problem formulation, troubleshooting, and critiquing AI output. Limitations include function-level benchmarks only, a Python/C++ scope, and self-reported survey data; future research should address industrial-scale repositories, additional languages, longitudinal user studies, iterative human-in-the-loop debugging, and improved explainability and provenance controls.
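For context on the pass@k metric reported above: the abstract does not specify the estimator used, but a common choice is the unbiased estimator computed from n generated samples per problem, of which c pass all unit tests. The Python sketch below assumes that estimator; the sample counts are hypothetical and not taken from the paper.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of P(at least one correct) when drawing k of the
    # n generated samples for a problem, c of which pass all unit tests.
    if n - c < k:
        return 1.0  # every size-k draw necessarily contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 samples per problem, per-problem correct counts.
correct_counts = [7, 0, 3, 10, 1]
mean_pass_at_5 = sum(pass_at_k(10, c, 5) for c in correct_counts) / len(correct_counts)
print(f"mean pass@5 = {mean_pass_at_5:.3f}")

Averaging the per-problem estimates, as done here, is what yields the dataset-level pass@5 figures of the kind the abstract compares across models.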
