This is an overview page with metadata for this research paper. The full article is available from the publisher.
Automated Data Quality Gates for AI Training Pipelines
Citations: 0
Authors: 1
Year: 2025
Abstract
Aim: This study designs and evaluates automated data quality gates integrated into AI training pipelines in order to improve data validation efficiency and model accuracy. A further objective is to automate data validation, which has traditionally relied on manual tests or on slow, error-prone rule-checking applications that are difficult to extend.
Methods: The proposed automated data quality gate system was implemented and tested on multiple open-source datasets. Its performance was evaluated by measuring model accuracy, error rates, and the time required for data validation within AI training pipelines.
Results: The findings show that automated data quality gates reduce the likelihood of errors introduced during training and improve model accuracy by 10 to 30%. The system also reduces model error rates by 10 to 20% and accelerates data validation, cutting validation time from approximately 5 hours to less than 1 hour. Overall, the system improved model performance by 15 to 25% compared to regular training methods while ensuring that only high-quality data flows into AI pipelines. These improvements enhance the scalability and accuracy of AI systems across application domains.
Conclusion: Automated data quality gates are effective in improving data validation efficiency and model accuracy. By automating data validation, the proposed approach overcomes limitations of manual and rule-based validation methods, leading to more reliable and scalable AI training pipelines suitable for deployment in various industries.
Recommendations: The study recommends wider adoption of automated data quality gates across AI applications, particularly in critical sectors such as healthcare and finance.
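The abstract does not describe how the gates are implemented, so the following is only a minimal sketch of the general idea, not the author's system. It assumes a gate is a set of row-level checks, each with a tolerated failure rate, and that a batch is blocked from training when any check exceeds its tolerance. All names (`Check`, `run_gate`, `enforce_gate`, `GateFailed`) and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Check:
    """One hypothetical row-level rule plus the failure rate it tolerates."""
    name: str
    predicate: Callable[[Row], bool]  # returns True when the row is valid
    max_failure_rate: float           # tolerated fraction of invalid rows


class GateFailed(Exception):
    """Raised when a batch must not be passed on to training."""


def run_gate(rows: List[Row], checks: Iterable[Check]) -> Dict[str, float]:
    """Return the failure rate of every check that exceeds its tolerance."""
    if not rows:
        raise ValueError("cannot validate an empty batch")
    violations: Dict[str, float] = {}
    for check in checks:
        failed = sum(1 for row in rows if not check.predicate(row))
        rate = failed / len(rows)
        if rate > check.max_failure_rate:
            violations[check.name] = rate
    return violations


def enforce_gate(rows: List[Row], checks: Iterable[Check]) -> List[Row]:
    """Return the batch unchanged if it passes; otherwise raise GateFailed."""
    violations = run_gate(rows, checks)
    if violations:
        raise GateFailed(f"batch rejected: {violations}")
    return rows


# Illustrative checks; the column names and thresholds are assumptions.
CHECKS = [
    Check("label_present", lambda r: r.get("label") is not None, 0.0),
    Check(
        "age_in_range",
        lambda r: isinstance(r.get("age"), (int, float)) and 0 <= r["age"] <= 120,
        0.5,
    ),
]

if __name__ == "__main__":
    batch = [
        {"age": 34, "label": 1},
        {"age": 51, "label": 0},
        {"age": -3, "label": 1},
        {"age": 29, "label": None},
    ]
    try:
        enforce_gate(batch, CHECKS)
    except GateFailed as err:
        print(err)
```

In a pipeline, a call such as `enforce_gate(batch, CHECKS)` would sit between data loading and the training step, so that a rejected batch stops the run instead of silently degrading the model.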
Similar works
The REDCap consortium: Building an international community of software platform partners
2019 · 22,717 citations
The FAIR Guiding Principles for scientific data management and stewardship
2016 · 16,858 citations
Bayesian Data Analysis
1995 · 13,689 citations
k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
2002 · 8,396 citations
Business Intelligence and Analytics: From Big Data to Big Impact
2012 · 5,892 citations