This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Beyond AI Psychosis and Sycophancy: Structural Drift as a System-Level Safety Failure
Citations: 0
Authors: 4
Year: 2026
Abstract
Background: Conversational AI safety systems are primarily evaluated using message-level content monitoring, which assesses inputs and outputs in isolation. This message-by-message approach can miss interaction-level risks that emerge over extended conversations, including patterns discussed in reports of “AI psychosis.” Critically, by the time users express overt psychosis-spectrum content, opportunities for intervention may be limited.
Objective: We investigated whether LLM responses gradually expand and connect interpretations beyond the user’s original concerns, a process we term structural drift. We also tested whether this drift can be detected early and automatically.
Methods: We developed an automated, LLM-adapted rubric-based prompt for seven domains of anomalous (psychosis-spectrum) experience, derived from phenomenological psychiatry to capture subtle shifts in subjective interpretation. In Part 1, we evaluated the rubric using gold-standard text excerpts (N = 484) adapted from clinically validated qualitative instruments. In Part 2, we analyzed 1,290 user-LLM response exchanges from 7 dialogues, using 3 different LLMs (5 repeats each), to measure (i) domain amplification (increasing score within a domain) and (ii) domain expansion (new domains appearing over time).
Results: Automated scoring showed strong agreement with gold-standard excerpts (domain accuracy 82.7-98.9%; exact 0-3 agreement 63.6-82.7%). Across dialogues, we observed significant amplification in four domains (p < .05; d = 0.14-0.46) and domain expansion in 83.8% of dialogues (88/105; p < .001).
Conclusions: AI responses can systematically expand and intensify users’ descriptions beyond their initial input. Taken together with the predictive-processing accounts of psychosis, the exposure itself may reinforce maladaptive inferences. Because drift is detectable from ordinary dialogue without clinical-style probing, this structural drift detection may support scalable, real-time monitoring for emerging risks before overt escalation.