OpenAlex · Updated hourly · Last updated: 04.04.2026, 04:55

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Simulating Team Dysfunction with ChatGPT: A Study of Generative Dialogue and Supervised Classification

2025 · 0 citations
Open full text at the publisher

Citations: 0 · Authors: 3 · Year: 2025

Abstract

During the COVID pandemic, students relied exclusively on digital communication for group work, which appeared to lead to more frequent team dysfunctions. Such dysfunctions usually manifest themselves in the textual traces of social media exchanges. There are several potential explanations for the increase in team dysfunctions; our work explores the extent to which this increase is due to the inadequacy of social media application functionalities for supporting teamwork. This exploration led us to pursue several research objectives, including 1) seeking a precise characterization of team dysfunctions, and 2) developing a machine learning-based classification model that can detect specific dysfunctions in the textual traces of team social media conversations. For the first objective, we relied on Lencioni's work, which identified five team performance dimensions (trust, constructive conflict, commitment, accountability, attention to results) and linked dysfunctions to poor performance along these dimensions. For the second objective, we needed a dataset of textual conversations between members of dysfunctional teams, ideally but not necessarily labeled with the appropriate dysfunctions. We could find no such dataset, labeled or not. Can we use ChatGPT to generate such a dataset, to be used later to train a classifier? The answer is a qualified but promising yes. We used a properly primed ChatGPT 3.5, with carefully designed prompts, to generate 1000 dialogues, 200 for each performance dimension (e.g., trust), broken down about evenly between high performance (67), medium level (66), and poor performance (67).
We evaluated the resulting dialogues in two complementary ways: 1) having human subjects read them, assign them a performance label, and compare it to the label of the prompt used to generate them, and 2) for each performance dimension, using the 200 generated dialogues to train a three-class classifier, with one class per performance level, and applying cross-validation to see whether the classifiers could distinguish between the three levels. Regarding the first validation, despite experimental design flaws discovered after the fact, we found that ChatGPT performed well on the high and poor performance dialogues, but poorly on the medium-level dialogues, which is consistent with existing research on "median values". The second validation showed that dialogues generated for (lack of) accountability performed best, with an F-score of 88% achieved with SVM, and (lack of) trust performed worst, with an F-score of 75%, achieved with Random Forest. These results show the potential of LLMs to generate dialogues exhibiting different team dynamics.
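The second validation described above can be sketched in a few lines. This is a minimal illustration only, not the authors' actual pipeline: the abstract does not specify the text representation, so TF-IDF features are an assumption, and the toy dialogues and labels below are invented stand-ins for the 200 generated dialogues per dimension.

```python
# Sketch of the second validation: train three-class classifiers
# (high / medium / poor performance) and score them with
# cross-validated macro F1, comparing SVM and Random Forest.
# TF-IDF features and the toy data are assumptions for illustration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in data; the paper used 200 generated
# dialogues per performance dimension.
dialogues = [
    "I trust you to handle this part, go ahead.",
    "Fine, whatever the group decides is okay, I guess.",
    "Why didn't you finish your part again? This keeps happening.",
] * 10
labels = ["high", "medium", "poor"] * 10

for name, clf in [
    ("SVM", LinearSVC()),
    ("Random Forest", RandomForestClassifier(random_state=0)),
]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, dialogues, labels,
                             cv=5, scoring="f1_macro")
    print(f"{name}: mean macro F1 = {scores.mean():.2f}")
```

On real generated dialogues, per-dimension F-scores in the 75–88% range (as the abstract reports) would come from running this kind of evaluation once per performance dimension.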


Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Intelligent Tutoring Systems and Adaptive Learning