OpenAlex · Updated hourly · Last updated: 12.03.2026, 05:41

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Quality Evaluation of LLM-based Generated Test Case: A Literature Review

2025 · 0 citations
Open full text at publisher

Citations: 0
Authors: 2
Year: 2025

Abstract

Large Language Models (LLMs) are increasingly applied in automated test case generation. Unlike traditional testing, quality evaluation of their outputs requires novel approaches due to the non-deterministic nature of LLMs and the risks of bias and hallucination. This study conducts a systematic literature review of 25 publications (2020–2025) to examine quality criteria, evaluation metrics, and heuristics in LLM-based test case generation. Findings show that correctness/accuracy and coverage metrics remain dominant, yet there is a growing focus on bias and hallucination detection. Effective heuristics include human-in-the-loop evaluation, gold-standard comparison, and multi-agent frameworks. The review highlights emerging frameworks integrating ethical and safety criteria, underscoring the need for comprehensive evaluation to ensure reliability and fairness in automated test generation.

Similar works

Authors

Institutions

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI