This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study
22 citations · 6 authors · 2024
Abstract
Pre-trained Machine Learning (ML) models help to create ML-intensive systems without having to spend conspicuous resources on training a new model from the ground up. However, the lack of transparency for such models could lead to undesired consequences in terms of bias, fairness, trustworthiness of the underlying data, and, potentially even legal implications. Taking as a case study the transformer models hosted by Hugging Face, a popular hub for pre-trained ML models, this paper empirically investigates the transparency of pre-trained transformer models. We look at the extent to which model descriptions (i) specify the datasets being used for their pre-training, (ii) discuss their possible training bias, (iii) declare their license, and whether projects using such models take these licenses into account. Results indicate that pre-trained models still have a limited exposure of their training datasets, possible biases, and adopted licenses. Also, we found several cases of possible licensing violations by client projects. Our findings motivate further research to improve the transparency of ML models, which may result in the definition, generation, and adoption of Artificial Intelligence Bills of Materials.
Related Works
Eigenfaces for Recognition
1991 · 13,700 citations
Rectified Linear Units Improve Restricted Boltzmann Machines
2010 · 13,199 citations
Eigenfaces vs. Fisherfaces: recognition using class specific linear projection
1997 · 11,702 citations
FaceNet: A unified embedding for face recognition and clustering
2015 · 10,851 citations
The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception
1997 · 7,865 citations