OpenAlex · Updated hourly · Last updated: 16.03.2026, 13:19

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Human-AI Collaboration Supporting GPT-4o Achieving Human-Level User Feedback in Emotional Support Conversations: Integrative Modeling and Prompt Engineering Approaches (Preprint)

2024 · 0 citations · Open Access
Open full text at publisher

0 citations · 6 authors · Year: 2024

Abstract

<sec> <title>BACKGROUND</title> Emotional support plays a crucial role in enhancing social interactions, facilitating psychological interventions, and improving customer service outcomes by addressing individuals' emotional needs. The emergence of large language models (LLMs) holds promise for delivering emotional support on a large scale, but their effectiveness compared to human counselors is still not well understood. Evaluating and enhancing the emotional support capabilities of LLMs through targeted, user-centered strategies is crucial for their successful real-world integration. </sec> <sec> <title>OBJECTIVE</title> This study aims to evaluate the emotional support capabilities of LLMs, specifically GPT-4o, and to introduce an integrative automatic evaluation framework centered on user-perceived feedback. The framework is designed to enhance LLM performance in emotional support conversations (ESCs) by identifying psycholinguistic clues as intrinsic evaluation metrics and leveraging a customized Chain-of-Thought (CoT) prompting framework. </sec> <sec> <title>METHODS</title> The study used a dataset of emotional support conversations from human counselors. An explanatory predictive model was developed using explainable artificial intelligence methods, following an integrative modeling paradigm rooted in computational social science. The model evaluated and interpreted user-perceived feedback scores for GPT-4o. Additionally, the study integrated Hill's three-stage model of helping into a manually customized CoT prompting framework to systematically evaluate GPT-4o's performance in ESCs. </sec> <sec> <title>RESULTS</title> GPT-4o achieved high user-perceived feedback scores and demonstrated relative stability in its performance, but it still significantly trails human counselors overall (Cliff's Delta = 0.087, p < 0.001).
The evaluation framework, which identified 41 distinct linguistic clues related to emotional expression, social dynamics, cognitive processes, linguistic style, and decision-making stages, enhanced the understanding of both processes and outcomes in ESCs. Notably, GPT-4o's user-perceived feedback scores improved significantly with the manually customized CoT prompts (p < 0.001, Cohen's d = 0.378), showing no significant difference from the average performance of human counselors overall (adjusted p = 0.47, Cliff's Delta = -0.014). The CoT prompts showed an advantage in specific emotion and problem categories such as fear (p = 0.002, Cliff's Delta = -0.23), sadness (p = 0.012, Cliff's Delta = -0.105), and breaking up with a partner (p = 0.254, Cliff's Delta = -0.06). However, GPT-4o exhibited weaknesses in emotional understanding, cognitive complexity, language fluency, and handling of extreme scenarios. </sec> <sec> <title>CONCLUSIONS</title> This study provides preliminary evidence of GPT-4o's emotional support capabilities and proposes a user-perceived-feedback-centered integrative evaluation framework for ESCs. The findings suggest a cautiously optimistic outlook for applying advanced LLMs in emotional support services, although significant challenges remain, particularly in deepening conversational exploration and personalizing language. The proposed framework encourages the integration of human expertise into LLMs, enhancing their efficacy and advancing the development of trustworthy AI-based emotional support services. </sec>
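The effect sizes reported in the abstract (Cliff's Delta and Cohen's d) are standard non-parametric and parametric measures, respectively. As a minimal sketch of how such comparisons are computed (the functions below are generic textbook definitions, not the authors' actual analysis code):

```python
from statistics import mean, stdev

def cliffs_delta(xs, ys):
    """Cliff's Delta: P(x > y) - P(x < y) over all cross-group pairs.
    Ranges from -1 to 1; 0 means complete overlap between groups."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def cohens_d(xs, ys):
    """Cohen's d: standardized mean difference using the pooled
    standard deviation of the two samples."""
    nx, ny = len(xs), len(ys)
    pooled_sd = (((nx - 1) * stdev(xs) ** 2 + (ny - 1) * stdev(ys) ** 2)
                 / (nx + ny - 2)) ** 0.5
    return (mean(xs) - mean(ys)) / pooled_sd
```

For example, `cliffs_delta` applied to human-counselor scores versus GPT-4o scores would yield a small positive value (like the reported 0.087) when human scores only slightly tend to exceed the model's.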
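The manually customized CoT prompting built on Hill's three-stage helping model (exploration, insight, action) could be sketched roughly as follows; the stage instructions, function name, and prompt wording here are illustrative assumptions, not the authors' actual prompts:

```python
# Hypothetical sketch of a Chain-of-Thought prompt structured around
# Hill's three-stage helping model. The stage descriptions below are
# illustrative placeholders, not the paper's actual prompt text.
HILL_STAGES = [
    ("Exploration", "Reflect the user's feelings and encourage elaboration."),
    ("Insight", "Help the user recognize patterns behind their emotions."),
    ("Action", "Collaboratively suggest small, concrete next steps."),
]

def build_cot_prompt(user_message: str) -> str:
    """Assemble a staged reasoning prompt for an emotional support reply."""
    steps = "\n".join(
        f"Step {i}: {name} - {instruction}"
        for i, (name, instruction) in enumerate(HILL_STAGES, start=1)
    )
    return (
        "You are an emotional support counselor. Reason through the stages "
        "below before replying, then give one supportive response.\n"
        f"{steps}\n\nUser: {user_message}\nCounselor:"
    )
```

Prompts of this shape make the model's intermediate reasoning follow the counseling stages explicitly, which is the mechanism the abstract credits for the improved user-perceived feedback scores.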

Related works