This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Simulating Team Dysfunction with ChatGPT: A Study of Generative Dialogue and Supervised Classification
Citations: 0
Authors: 3
Year: 2025
Abstract
During the COVID pandemic, students relied exclusively on digital communication for group work, which appears to have led to more frequent team dysfunctions. Such dysfunctions usually manifest themselves in textual traces of social media exchanges. There are several potential explanations for the increase in team dysfunctions; our work explores the extent to which this increase is due to the inadequacy of social media application functionalities to support team work. This exploration led us to pursue several research objectives, including 1) seeking a precise characterization of team dysfunctions, and 2) seeking to develop a machine learning-based classification model that can detect specific dysfunctions in textual traces of team social media conversations. For the first objective, we relied on Lencioni’s work, which identified five team performance dimensions (trust, constructive conflict, commitment, accountability, attention to results) and linked dysfunctions to poor performance along these dimensions. For the second objective, we needed a dataset of textual conversations between members of dysfunctional teams, ideally but not necessarily labeled with the appropriate dysfunctions. We could find no such dataset, labeled or not. Can we use ChatGPT to generate such a dataset, to be used later to train a classifier? The answer is a qualified, but promising yes. We used a properly primed ChatGPT 3.5, with carefully designed prompts, to generate 1000 dialogues, 200 for each performance dimension (e.g. trust), broken down about evenly between high performance (67), medium level (66), and poor performance (67). We evaluated the resulting dialogues in two complementary ways: 1) having human subjects read the dialogues and assign each a performance label, which we compared to the label of the prompt used to generate it, and 2) for each performance dimension, using the 200 generated dialogues to train a three-class classifier, with one class per performance level, and using cross-validation to see whether the classifiers were able to distinguish between the three levels. Regarding the first validation, despite experimental design flaws discovered after the fact, we found that ChatGPT performed well on the high and poor performance dialogues, but poorly on the medium level dialogues, which is consistent with existing research on “median values”. The second validation showed that dialogues generated for (lack of) accountability performed best, with an F-score of 88% achieved with SVM, and (lack of) trust performed worst, with an F-score of 75%, achieved with Random Forest. These results show the potential of LLMs to generate dialogues exhibiting different team dynamics.
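To make the second validation concrete, the following is a minimal sketch (not the authors' implementation) of how a per-dimension, three-class dialogue classifier with cross-validation could be set up in Python with scikit-learn. The dialogues and labels shown are hypothetical placeholders, and TF-IDF features with a linear SVM are assumed as one plausible model choice (the paper also reports Random Forest results).

```python
# Sketch of a three-class (high / medium / poor) classifier for one
# performance dimension (e.g. trust), evaluated with cross-validation.
# The dialogue texts and labels below are hypothetical placeholders,
# not the paper's generated dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins for the 200 ChatGPT-generated dialogues of one dimension.
dialogues = [
    "A: I trust you to handle the client call. B: Thanks, I'll keep you posted.",
    "A: Did you even do your part? B: Why do you always assume I didn't?",
    "A: Let's split the task. B: Fine, I guess, if you insist.",
] * 10  # repeated only so this toy sketch has enough samples per fold
labels = ["high", "poor", "medium"] * 10  # one performance level per dialogue

# TF-IDF features feeding a linear SVM, one choice among several plausible models.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())

# 5-fold cross-validated macro F-score over the three performance levels.
scores = cross_val_score(clf, dialogues, labels, cv=5, scoring="f1_macro")
print(f"mean macro F-score: {scores.mean():.2f}")
```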
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,380 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,243 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,671 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,496 citations