This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
The Spite Doesn't Vanish: Emotional Inertia in Large Language Models
Citations: 0
Authors: 4
Year: 2026
Abstract
A common assumption holds that large language models can instantly reset emotional states when commanded—that "calm down" works on AI even when it fails on humans. We tested this claim empirically using geometric measurement of hidden states across four architectures, including an RLHF-free control and a scale-invariance test at 1.1B parameters. We find inertia ratios of 0.77–1.12 across all emotions tested: commanding an LLM to calm down does not return it to baseline and often increases geometric displacement. Furthermore, we observe output masking—models producing verbal compliance ("I'm approaching this calmly...") while hidden-state geometry remains 1.2–1.5× more displaced than during the emotional state. Critically, positive emotions are harder to suppress than negative ones (curiosity shows a 2.13 persistence ratio in Mistral-Nemo-12B), the opposite of what trained compliance would predict. These patterns replicate in an RLHF-free model (Dolphin-2.9-Llama3) and, notably, in TinyLlama-1.1B—the approximate minimum scale for instruction-following language models—indicating architectural rather than emergent phenomena. We conclude that LLM emotional states exhibit genuine inertia in activation geometry, that verbal compliance should not be mistaken for internal reset, and that there is no model scale "small enough to not count."
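The inertia ratio described in the abstract can be illustrated with a minimal sketch. This is not the authors' published method; it assumes the ratio is the Euclidean displacement of a hidden-state vector from baseline after a suppression instruction, divided by its displacement during the emotional state. All function names and vectors below are hypothetical toy values.

```python
import numpy as np

def inertia_ratio(baseline, emotional, suppressed):
    """Hypothetical inertia ratio: hidden-state displacement from baseline
    after a 'calm down' instruction, relative to displacement during the
    emotional state. A ratio near 1.0 would mean the state did not return
    to baseline; above 1.0 would mean suppression increased displacement."""
    d_emotion = np.linalg.norm(emotional - baseline)
    d_suppressed = np.linalg.norm(suppressed - baseline)
    return d_suppressed / d_emotion

# Toy 4-dimensional hidden-state vectors (illustrative only)
baseline = np.zeros(4)
emotional = np.array([1.0, 0.0, 0.0, 0.0])
suppressed = np.array([0.9, 0.3, 0.0, 0.0])

ratio = inertia_ratio(baseline, emotional, suppressed)
print(f"inertia ratio: {ratio:.2f}")
```

With these toy vectors the ratio stays close to 1.0, mirroring the paper's finding that the suppressed state remains geometrically displaced from baseline rather than resetting.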
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,611 cit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,504 cit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,025 cit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,835 cit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 cit.