OpenAlex · Updated hourly · Last updated: 21 Mar 2026, 22:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset

2026 · 0 citations · ACM Transactions on Software Engineering and Methodology
Open full text at publisher

0 citations · 7 authors · Year: 2026

Abstract

Large Language Models (LLMs) have demonstrated great potential in code-related tasks. However, most research focuses on improving the output quality of LLMs (e.g., correctness), and less attention has been paid to the LLM input (e.g., the quality of the training code). Given that code smells are widespread in practice and can negatively impact software maintainability and readability, this study presents the first systematic research on assessing and improving dataset quality with respect to code smells. We first conduct a preliminary study to explore the presence of code smells in a popular benchmark dataset (i.e., CodeSearchNet-Python) and evaluate the output of several popular LLMs (i.e., DeepSeek-Coder, CodeLlama, and MagiCoder), revealing that code smell issues extensively exist in both the LLMs' input (e.g., the benchmark dataset) and output (e.g., generated code). We also perform a user study to investigate developers' perspectives on LLM-generated code with and without smells, which indicates a strong preference for smell-free code and a willingness to leverage LLMs for intricate code smell removal. We then conduct our systematic research in three main steps. First, we propose an LLM-based code smell cleaning tool, named SmellCC (Smell Code Cleaner), which automatically refactors code to remove smells. To evaluate the correctness of the refactoring, we construct a test set of 50 repositories sourced from the CodeSearchNet-Python benchmark for functional testing. Second, we apply our curated smell-cleaned dataset to fine-tune two LLMs (i.e., DeepSeek-V2 and Qwen-Coder) to explore their potential for generating high-quality code. Third, we investigate the impact of code smells on two downstream tasks: code completion and code search. Furthermore, to assess the generalizability of SmellCC, we conduct a cross-project evaluation.
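The abstract describes a detect-refactor-verify loop: find smells in a sample, have an LLM refactor it, and keep the result only if functional tests still pass. The sketch below illustrates that control flow only; `detect_smells` (a toy line-count heuristic), `llm_refactor` (a stub that strips blank lines), and `passes_tests` are hypothetical stand-ins, not SmellCC's actual components.

```python
# Illustrative sketch of a SmellCC-style cleaning loop. All three helper
# functions are simplified stand-ins for the paper's real detector, LLM
# refactorer, and per-repository test verification.

def detect_smells(code: str, max_lines: int = 20) -> list[str]:
    """Toy detector: flags only a 'long method' smell via a line count."""
    smells = []
    if len(code.splitlines()) > max_lines:
        smells.append("long_method")
    return smells

def llm_refactor(code: str, smells: list[str]) -> str:
    """Stand-in for the LLM refactoring call; here it just drops blank lines."""
    return "\n".join(line for line in code.splitlines() if line.strip())

def passes_tests(original: str, refactored: str) -> bool:
    """Stand-in functional check; the paper runs each repository's tests."""
    return bool(refactored)

def clean_sample(code: str) -> str:
    """Detect smells, refactor, and accept only behavior-preserving results."""
    smells = detect_smells(code)
    if not smells:
        return code  # already smell-free, keep the sample as-is
    candidate = llm_refactor(code, smells)
    # Reject the refactoring (keep the original) if verification fails.
    return candidate if passes_tests(code, candidate) else code
```

The key design point reflected here is that refactorings are gated by test verification, which is how the paper can report both a smell-removal rate and a correctness rate.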
Lastly, we derive several actionable implications for software engineering researchers and industry practitioners from our findings. The experimental results show that SmellCC eliminates 91.6% of code smells across the entire CodeSearchNet-Python corpus, yielding a smell-cleaned benchmark dataset. On the curated 50-repository subset, SmellCC achieves 96.8% smell removal while maintaining 91.3% correctness under test verification. The LLMs fine-tuned on this smell-cleaned dataset reduce code smells in generated code by 79.6% (DeepSeek-V2) and 83.1% (Qwen-Coder). Moreover, applying the smell-cleaned dataset to code completion and code search yields significant improvements across all models (DeepSeek-V1/V2, Qwen-Coder), with Qwen-Coder achieving peak gains of 12.2% in completion and 4.3% in search performance. Finally, SmellCC also demonstrates strong generalization in a cross-project setting.
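The percentages above are most naturally read as removal rates over counted smell instances before and after cleaning. A minimal sketch of that metric, with purely hypothetical counts (the abstract does not report the raw totals):

```python
def smell_removal_rate(before: int, after: int) -> float:
    """Fraction of smell instances eliminated by cleaning.

    `before` and `after` are counts of detected smell instances in the
    dataset before and after refactoring. Assumed reading of the metric,
    not a formula given in the abstract.
    """
    if before == 0:
        return 0.0  # nothing to remove
    return (before - after) / before

# Hypothetical example: 1000 detected smells reduced to 84 would give
# a 91.6% removal rate, matching the corpus-level figure reported above.
```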

Topics

Software Engineering Research · Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management