This is an overview page with metadata for this scientific work. The full article is available from the publisher.
From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation
0
Citations
3
Authors
2025
Year
Abstract
Complex social behaviors, such as empathy and strategic politeness, are widely assumed to resist the directional decomposition that makes activation steering effective for coarse attributes like sentiment or toxicity. We present STAR: Steering via Attribution and Representation, which tests this assumption by using attribution patching to identify the layer-token positions where each behavioral trait causally originates, then injecting contrastive activation vectors at precisely those locations. Evaluated on emotional dialogue and negotiation in both single- and multi-turn settings, localized injection consistently outperforms global steering and instruction priming; human evaluation confirms that gains reflect genuine improvements in perceived quality rather than lexical surface change. Our results suggest that complex interpersonal behaviors are encoded as localized, approximately linear directions in LLM activation space, and that behavioral alignment is fundamentally a localization problem.
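The two steps named in the abstract (locating layer-token positions by attribution patching, then injecting a contrastive vector only there) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the array shapes (layers x tokens x hidden), the use of a single mean-difference vector shared across sites, the top-k selection rule, and the scale `alpha` are all assumptions made for illustration, and random arrays stand in for real model activations and gradients.

```python
import numpy as np

def contrastive_vector(pos_acts, neg_acts):
    """Mean difference between activations from trait-positive and
    trait-negative examples; both inputs have shape (n_examples, hidden)."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def attribution_scores(clean_acts, corrupt_acts, grads):
    """First-order attribution-patching estimate per (layer, token):
    (clean - corrupt) dotted with d(metric)/d(activation) over the hidden axis.
    All inputs have shape (layers, tokens, hidden)."""
    return np.sum((clean_acts - corrupt_acts) * grads, axis=-1)

def top_k_sites(scores, k):
    """Return the k (layer, token) pairs with the largest absolute score."""
    flat = np.argsort(np.abs(scores), axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(f, scores.shape)) for f in flat]

def inject(acts, vector, sites, alpha=1.0):
    """Add alpha * vector at the selected (layer, token) sites only;
    every other position is left unchanged. Returns a new array."""
    steered = acts.copy()
    for layer, token in sites:
        steered[layer, token] += alpha * vector
    return steered

# Toy usage with random stand-ins for activations and gradients (hypothetical sizes).
rng = np.random.default_rng(0)
n_layers, n_tokens, hidden = 4, 6, 8
clean = rng.normal(size=(n_layers, n_tokens, hidden))
corrupt = rng.normal(size=(n_layers, n_tokens, hidden))
grads = rng.normal(size=(n_layers, n_tokens, hidden))

sites = top_k_sites(attribution_scores(clean, corrupt, grads), k=2)
direction = contrastive_vector(rng.normal(size=(10, hidden)) + 1.0,
                               rng.normal(size=(10, hidden)))
steered = inject(clean, direction, sites, alpha=2.0)
```

In this framing, global steering would correspond to passing every (layer, token) pair as `sites`, so the contrast with localized injection comes down to which positions receive the vector.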
Similar works
The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods
2009 · 5,678 citations
The Stress Process
1981 · 4,447 citations
Mental health problems and social media exposure during COVID-19 outbreak
2020 · 2,789 citations
Psychological Aspects of Natural Language Use: Our Words, Our Selves
2002 · 2,545 citations
Emotion: A Psychoevolutionary Synthesis
1980 · 2,523 citations