This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
LLM-as-a-Judge for Software Engineering: Literature Review, Vision, and the Road Ahead
Citations: 0
Authors: 8
Year: 2026
Abstract
The rapid integration of Large Language Models (LLMs) into software engineering (SE) has revolutionized tasks from code generation to program repair, producing a massive volume of software artifacts. This surge in automated creation has exposed a critical bottleneck: the lack of scalable and reliable methods to evaluate the quality of these outputs. Human evaluation, while effective, is costly and time-consuming. Traditional automated metrics like BLEU rely on high-quality references and struggle to capture nuanced aspects of software quality, such as readability and usefulness. In response, the LLM-as-a-Judge paradigm has emerged: it leverages the advanced reasoning and coding capabilities of LLMs themselves to perform automated evaluations, offering a compelling path toward combining the nuance of human insight with the scalability of automated systems. Nevertheless, LLM-as-a-Judge research in the SE community is still in its early stages, with many breakthroughs still needed. This forward-looking SE 2030 paper aims to steer the research community toward advancing LLM-as-a-Judge for evaluating LLM-generated software artifacts, and shares potential research paths toward this goal. We provide a literature review of existing SE studies on LLM-as-a-Judge and envision these frameworks as reliable, robust, and scalable human surrogates capable of evaluating software artifacts with consistent, multi-faceted assessments by 2030 and beyond. To realize this vision, we analyze the limitations of current studies, identify key research gaps, and outline a detailed roadmap for the future development of LLM-as-a-Judge in software engineering. While not intended as a definitive guide, our work aims to foster further research on and adoption of LLM-as-a-Judge frameworks within the SE community, ultimately improving the effectiveness and scalability of software artifact evaluation.