OpenAlex · Updated hourly · Last updated: 5 April 2026, 02:00

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Study on Machine Learning-Based Cost Estimation Models for AI Training Data Construction

2026 · 0 citations · Applied Sciences · Open Access

Citations: 0 · Authors: 2 · Year: 2026

Abstract

This study proposes an explainable machine learning framework for estimating the total project cost (TPC) of AI training-data construction, where cost information is difficult to structure due to heterogeneous workflows and quality requirements. Using 386 public AI training-data projects conducted between 2020 and 2022, we derive 24 numerical predictors from standardized final reports and construct three input tracks: a baseline feature set, a principal component analysis (PCA)-enhanced set, and a factor analysis (FA)-enhanced set capturing latent cost structures. Four regression models (Ridge, Random Forest, XGBoost, and LightGBM) are evaluated using nested cross-validation. XGBoost achieves the best overall performance across all three tracks (Baseline, PCA-enhanced, and FA-enhanced). Among them, PCA-enhanced XGBoost attains the highest predictive accuracy (R² = 0.868; RMSE = 1084.9; MAE = 746.9; MAPE = 0.358; pooled out-of-fold), while Baseline XGBoost yields the lowest MAE (731.4; R² = 0.863). To support transparent decision-making, Shapley Additive exPlanations (SHAP)-based attribution and scenario-based sensitivity analyses are conducted. Results show that project scale and process-level unit costs are the dominant cost drivers, while cloud usage, expert participation, and de-identification requirements exhibit secondary effects. The proposed framework provides an interpretable, data-driven approach to cost information management and decision support for data-intensive AI projects.
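The "pooled out-of-fold" metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions of R², RMSE, MAE, and MAPE (with MAPE as a fraction, which is consistent with the reported 0.358) and assumes that pooling means concatenating the held-out predictions from every outer fold before scoring once. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (as a fraction) for paired values.

    Assumes the targets are non-zero and not all identical; otherwise
    MAPE or R^2 would divide by zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


def pooled_out_of_fold(folds):
    """Concatenate per-fold (y_true, y_pred) pairs, then score them once."""
    y_true = [t for fold_true, _ in folds for t in fold_true]
    y_pred = [p for _, fold_pred in folds for p in fold_pred]
    return regression_metrics(y_true, y_pred)


# Hypothetical held-out targets and predictions from two outer folds.
folds = [
    ([100.0, 200.0], [110.0, 190.0]),
    ([400.0, 500.0], [380.0, 520.0]),
]
metrics = pooled_out_of_fold(folds)
print(metrics)
```

Scoring the pooled predictions once, rather than averaging per-fold scores, gives every project equal weight regardless of fold size. Whether the authors computed their figures exactly this way is not stated on this page.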

Topics

Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Big Data and Business Intelligence