OpenAlex · Updated hourly · Last updated: 23.04.2026, 06:28

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluation Strategies for LLM-Based Models in Exercise and Health Coaching: A Scoping Review (Preprint)

2025 · 0 citations
Open full text at publisher

0 citations · 6 authors · Year: 2025

Abstract

<sec> <title>BACKGROUND</title> Large language model (LLM)-based AI coaches show promise for personalized exercise and health interventions. Their complex capabilities (communication, planning, movement analysis, monitoring) necessitate rigorous, multidimensional evaluation, but standardized frameworks are lacking. </sec> <sec> <title>OBJECTIVE</title> This scoping review systematically maps current evaluation strategies for LLM-based AI coaches in exercise and health, identifies their strengths and limitations, and proposes future directions for robust, standardized validation. </sec> <sec> <title>METHODS</title> Following PRISMA-ScR guidelines, we systematically searched six databases using keywords for LLMs, exercise/health coaching, and evaluation. Studies describing LLM-based coaching systems with reported performance evaluation methods were included. Data on models, applications, evaluation strategies, and outcomes were charted. </sec> <sec> <title>RESULTS</title> Seventeen studies published between March 2023 and March 2025 met the inclusion criteria. Most utilized proprietary models (e.g., GPT-4), while some used open-source or custom models. Six studies incorporated multimodal inputs (video, sensor data). Evaluation strategies were highly heterogeneous, including quantitative metrics (accuracy, F1 score, mean absolute error), empirical methods (user studies, expert comparisons), and expert/user-centered feedback (expert scores [Kappa ≈ 0.79–0.82], user surveys [MITI, SASSI]). However, evaluations often lacked real-world testing, longitudinal assessment, and standardized benchmarks. </sec> <sec> <title>CONCLUSIONS</title> Evaluating LLM-based exercise and health coaches requires multifaceted strategies: quantitative metrics for objective tasks, empirical validation for user interaction, and expert assessment for personalization and safety. Current evaluations are fragmented, lacking standardization, ecological validity, and longitudinal assessment. 
Future progress demands robust, multidimensional frameworks emphasizing real-world validation, integrating retrieval-augmented generation (RAG) for accuracy, and developing specialized, efficient multimodal models or agents for reliable and scalable AI coaching. </sec>

Related works

Authors

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Delphi Technique in Research