Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
A Study on Machine Learning-Based Cost Estimation Models for AI Training Data Construction
0
Zitationen
2
Autoren
2026
Jahr
Abstract
This study proposes an explainable machine learning framework for estimating the total project cost (TPC) of AI training-data construction, where cost information is difficult to structure due to heterogeneous workflows and quality requirements. Using 386 public AI training-data projects conducted between 2020 and 2022, we derive 24 numerical predictors from standardized final reports and construct three input tracks: a baseline feature set, a principal component analysis (PCA)-enhanced set, and a factor analysis (FA)–enhanced set capturing latent cost structures. Four regression models (Ridge, Random Forest, XGBoost, and LightGBM) are evaluated using nested cross-validation. XGBoost achieves the best overall performance across all three tracks (Baseline, PCA-enhanced, and FA-enhanced). Among them, PCA-enhanced XGBoost attains the highest predictive accuracy (R2 = 0.868; RMSE = 1084.9; MAE = 746.9; MAPE = 0.358; pooled out-of-fold), while Baseline XGBoost yields the lowest MAE (731.4; R2 = 0.863). To support transparent decision-making, Shapley Additive exPlanations (SHAP)-based attribution and scenario-based sensitivity analyses are conducted. Results show that project scale and process-level unit costs are dominant cost-drivers, while cloud usage, expert participation, and de-identification requirements exhibit secondary effects. The proposed framework provides an interpretable, data-driven approach to cost information management and decision support for data-intensive AI projects.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.391 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.257 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.685 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.501 Zit.