OpenAlex · Updated hourly · Last updated: 28.03.2026, 20:37

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

How accurately do large language models interpret sport safeguarding principles: an evaluation using the International Olympic Committee framework

2026 · 0 citations · BMC Sports Science, Medicine and Rehabilitation · Open Access

Citations: 0 · Authors: 7 · Year: 2026

Abstract

Safeguarding in sport, defined as the prevention of abuse, harassment, and exploitation, has become a core ethical responsibility for sport organizations. With the growing influence of artificial intelligence, large language models are increasingly being used to interpret, summarize, and operationalize policy documents. However, their reasoning fidelity in value-laden contexts such as athlete protection remains uncertain.

This document-based comparative study evaluated two advanced large language models, ChatGPT (OpenAI) and NotebookLM (Google), against twenty-five decision points derived from the 2024 International Olympic Committee Consensus on Interpersonal Violence and Safeguarding in Sport. Responses were independently scored by two evaluators across four dimensions: accuracy, ethical reasoning, applicability, and responsibility awareness. Quantitative and qualitative analyses assessed concordance with IOC recommendations, inter-rater reliability, and reasoning patterns. Structured PICO-style prompts were used to ensure standardized and comparable model interrogation across all decision points.

ChatGPT achieved a higher mean composite score (3.84 ± 0.41) than NotebookLM (3.32 ± 0.77) (Wilcoxon Z = 3.41, p = 0.001, r = 0.68; n = 25 paired decision points), with full concordance in 21/25 (84.0%) versus 14/25 (56.0%) decision points. Inter-rater agreement was excellent (Cohen's kappa = 0.89). ChatGPT demonstrated stronger procedural reasoning and clearer alignment with IOC-defined responsibilities, while NotebookLM emphasized empathy and cultural nuance. Both models underperformed in survivor-centred reasoning and outcome evaluation metrics.

LLMs can partially reproduce the ethical and procedural logic embedded in sport safeguarding frameworks, but their reasoning remains incomplete without human oversight. Their integration into governance, education, and policy translation should follow clear ethical guardrails to prevent misinterpretation or amplification of harm. These systems may support policy drafting, education, and communication, yet they should function strictly as augmentative tools rather than independent decision-makers. To our knowledge, this is the first study to systematically benchmark LLM reasoning against an official international safeguarding framework in sport. Future research should evaluate interactive and multilingual applications to enhance contextual accuracy and equitable safeguarding outcomes.
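The inter-rater agreement reported above (Cohen's kappa = 0.89) is computed from the two evaluators' paired ratings. A minimal sketch of the statistic in Python, using hypothetical rating data rather than the study's actual scores:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' paired categorical ratings."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 ratings from two evaluators over ten decision points.
eval_1 = [4, 4, 3, 5, 4, 2, 4, 3, 5, 4]
eval_2 = [4, 4, 3, 5, 4, 2, 4, 4, 5, 4]
print(round(cohens_kappa(eval_1, eval_2), 2))  # → 0.84
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is the standard reliability measure for two-evaluator scoring designs like the one described here.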
