OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 23.03.2026, 20:28

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

BadInterpreter: Backdoor Attack on LLM-based Interpretable Recommendation

2025·0 ZitationenOpen Access
Volltext beim Verlag öffnen

0

Zitationen

3

Autoren

2025

Jahr

Abstract

<title>Abstract</title> Large Language Models (LLMs) has promoted miscellaneous models and downstream applications, driving the progress of LLM agents by enhancing their ability to comprehend and generate interpretable reasoning. Recently, the security of LLM agents has become an increasingly popular research topic, where backdoor attacks show potential devastation by injecting a covert backdoor to manipulate the output. Our findings show that LLM agents fine-tuned for recommendation tasks are particularly vulnerable to the embedding of imperceptible backdoors, even when recommendation explanations are required. We introduce BadInterpreter, a simple yet effective backdoor attack for LLM-based interpretable recommendation systems, enabling attackers to manipulate product recommendations and explanations without altering ground-truth labels. In interpretable recommendation, LLM agents are asked to provide explanations for product recommendations to meet user needs. We propose a novel LLM-based pipeline to construct poisoned fine-tuning data, where the agent is expected to recommend the target product with rational recommendation explanations. Attacked by our BadInterpreter, LLM agents prioritize recommending the target products whose information contains attacker-designed triggers in a dynamic interactive environment, along with convincing explanations. Our attack consistently achieves robust attack success rates exceeding 94% on two benchmark e-shopping datasets with four distinct LLMs. While backdoor attacks represent a well-explored threat in natural language processing models, their application and impact within the specific context of LLM-based interpretable recommendation systems remain largely uncharted territory. To our knowledge, this study pioneers the investigation of such vulnerabilities in this critical domain. Our work reveals that constructing LLM-based recommendation systems on untrusted LLMs poses a severe threat.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Explainable Artificial Intelligence (XAI)Adversarial Robustness in Machine LearningArtificial Intelligence in Healthcare and Education
Volltext beim Verlag öffnen