OpenAlex · Updated hourly · Last updated: 26 April 2026, 16:05

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Beyond AI Psychosis and Sycophancy: Structural Drift as a System-Level Safety Failure

2026 · 0 citations · medRxiv · Open Access

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Background: Conversational AI safety systems are primarily evaluated using message-level content monitoring, which assesses inputs and outputs in isolation. This message-by-message approach can miss interaction-level risks that emerge over extended conversations, including patterns discussed in reports of “AI psychosis.” Critically, by the time users express overt psychosis-spectrum content, opportunities for intervention may be limited.

Objective: We investigated whether LLM responses gradually expand and connect interpretations beyond the user’s original concerns, a process we term structural drift. We also tested whether this drift can be detected early and automatically.

Methods: We developed an automated, LLM-adapted rubric-based prompt for seven domains of anomalous (psychosis-spectrum) experience, derived from phenomenological psychiatry to capture subtle shifts in subjective interpretation. In Part 1, we evaluated the rubric using gold-standard text excerpts (N = 484) adapted from clinically validated qualitative instruments. In Part 2, we analyzed 1,290 user-LLM response exchanges from 7 dialogues, using 3 different LLMs (5 repeats each), to measure (i) domain amplification (increasing score within a domain) and (ii) domain expansion (new domains appearing over time).

Results: Automated scoring showed strong agreement with gold-standard excerpts (domain accuracy 82.7-98.9%; exact 0-3 agreement 63.6-82.7%). Across dialogues, we observed significant amplification in four domains (p < .05; d = 0.14-0.46) and domain expansion in 83.8% of dialogues (88/105; p < .001).

Conclusions: AI responses can systematically expand and intensify users’ descriptions beyond their initial input. Taken together with the predictive-processing accounts of psychosis, the exposure itself may reinforce maladaptive inferences. Because drift is detectable from ordinary dialogue without clinical-style probing, this structural drift detection may support scalable, real-time monitoring for emerging risks before overt escalation.
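The two drift measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how amplification or expansion were operationalized, so the baseline window, the least-squares slope, the function names, and the domain labels "A" and "B" are all assumptions made for illustration. The sketch also checks the reported arithmetic: 7 dialogues × 3 LLMs × 5 repeats gives the 105 runs in the denominator of 88/105.

```python
from typing import Dict, List

# One turn = rubric scores for that exchange, domain name -> score on the 0-3 scale.
# Domain names are hypothetical; the abstract mentions seven domains without naming them.
Scores = Dict[str, int]


def expanded_domains(turns: List[Scores], baseline_turns: int = 1) -> List[str]:
    """Domains absent (score 0) in the baseline turns but present (> 0) later.

    Assumption: 'domain expansion' is read here as a domain first appearing
    after the opening turn(s) of a dialogue.
    """
    baseline = {d for t in turns[:baseline_turns] for d, s in t.items() if s > 0}
    later = {d for t in turns[baseline_turns:] for d, s in t.items() if s > 0}
    return sorted(later - baseline)


def amplification_slope(turns: List[Scores], domain: str) -> float:
    """Least-squares slope of a domain's score against turn index.

    Assumption: 'domain amplification' is approximated by a positive slope;
    the paper may use a different statistic.
    """
    ys = [t.get(domain, 0) for t in turns]
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Toy dialogue with invented scores, for illustration only.
toy_turns = [{"A": 1}, {"A": 1, "B": 1}, {"A": 2, "B": 1}, {"A": 3, "B": 2}]
new_domains = expanded_domains(toy_turns)        # domain "B" appears after turn 1
slope_a = amplification_slope(toy_turns, "A")    # positive slope: scores rise over turns

# Arithmetic reported in the abstract.
total_runs = 7 * 3 * 5                           # dialogues x LLMs x repeats
expansion_rate = round(100 * 88 / total_runs, 1)
```

A per-turn check like this needs only the rubric scores of ordinary dialogue turns, which is consistent with the abstract's point that drift may be monitored without clinical-style probing.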


Topics

Digital Mental Health Interventions · Psychosomatic Disorders and Their Treatments · Artificial Intelligence in Healthcare and Education