This is an overview page with metadata for this scientific work. The full article is available from the publisher.
How accurately do large language models interpret sport safeguarding principles: an evaluation using the International Olympic Committee framework
0
Citations
7
Authors
2026
Year
Abstract
Safeguarding in sport, defined as the prevention of abuse, harassment, and exploitation, has become a core ethical responsibility for sport organizations. With the growing influence of artificial intelligence, large language models (LLMs) are increasingly being used to interpret, summarize, and operationalize policy documents. However, their reasoning fidelity in value-laden contexts such as athlete protection remains uncertain. This document-based comparative study evaluated two advanced large language models, ChatGPT (OpenAI) and NotebookLM (Google), against twenty-five decision points derived from the 2024 International Olympic Committee Consensus on Interpersonal Violence and Safeguarding in Sport. Responses were independently scored by two evaluators across four dimensions: accuracy, ethical reasoning, applicability, and responsibility awareness. Quantitative and qualitative analyses assessed concordance with IOC recommendations, inter-rater reliability, and reasoning patterns. Structured PICO-style prompts were used to ensure standardized and comparable model interrogation across all decision points. ChatGPT achieved a higher mean composite score (3.84 ± 0.41) than NotebookLM (3.32 ± 0.77) (Wilcoxon Z = 3.41, p = 0.001, r = 0.68; n = 25 paired decision points), with full concordance in 21/25 (84.0%) versus 14/25 (56.0%) decision points. Inter-rater agreement was excellent (Cohen's kappa = 0.89). ChatGPT demonstrated stronger procedural reasoning and clearer alignment with IOC-defined responsibilities, while NotebookLM emphasized empathy and cultural nuance. Both models underperformed in survivor-centred reasoning and outcome evaluation metrics. LLMs can partially reproduce the ethical and procedural logic embedded in sport safeguarding frameworks, but their reasoning remains incomplete without human oversight.
Their integration into governance, education, and policy translation should follow clear ethical guardrails to prevent misinterpretation or amplification of harm. These systems may support policy drafting, education, and communication, yet they should function strictly as augmentative tools rather than independent decision-makers. To our knowledge, this is the first study to systematically benchmark LLM reasoning against an official international safeguarding framework in sport. Future research should evaluate interactive and multilingual applications to enhance contextual accuracy and equitable safeguarding outcomes.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,324 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,470 citations