This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Big Data Versus Big GPU: Evolving Requirements and Governance Dynamics of AI Training Data
Citations: 2
Authors: 3
Year: 2025
Abstract
Pre-trained large language models (LLMs), epitomized by ChatGPT, have leveraged a cornucopia of “big data” to attain substantial leaps in artificial intelligence (AI). Whereas the diminishing returns from pre-training and the depletion of available training data have become evident, the post-training scaling law bolstered by “big GPU” has surfaced as an overriding strategy. Since 2024, post-trained models exemplified by o1 and DeepSeek-R1 have been widely acclaimed as successes in logic-intensive fields like advanced scientific problem-solving, serving as a bellwether for artificial general intelligence (AGI). Driven by the two cardinal elements of computing power and task-specific datasets, the data training processes of post-trained models exhibit more erratic and uncontrollable tendencies, which may be a menace to core societal domains and precipitate systemic friction vis-à-vis the existing data governance derived from pre-trained models. At this watershed moment, this article aims to conduct a comprehensive comparison of training data paradigms between pre-trained and post-trained models and to further develop cogent and favorable governance responses to mitigate emerging risks. Consequently, data security must be established as a prerequisite for AI development, and a lifecycle-based governance framework for AI training data in blended models can be introduced in the metamorphosis toward “bigger AI models”.
Similar works
The global landscape of AI ethics guidelines
2019 · 4,634 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,876 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,448 citations
Fairness through awareness
2012 · 3,294 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,184 citations