This is an overview page with metadata for this research paper. The full article is available from the publisher.

Don’t Just Clean It, Proxy Clean It: Mitigating Bias by Proxy in Pre-Trained Models

2022 · 2 citations · Open Access



Abstract

Transformer-based pre-trained models are known to encode societal biases not only in their contextual representations but also in their downstream predictions when fine-tuned on task-specific data. We present D-Bias, an approach that selectively eliminates stereotypical associations (e.g., co-occurrence statistics) during fine-tuning, so that the model does not learn to rely excessively on those signals. D-Bias attenuates biases from both identity words and frequently co-occurring proxies, which we select using pointwise mutual information. We apply D-Bias to (a) occupation classification and (b) toxicity classification, and find that our approach substantially reduces downstream biases (e.g., by more than 60% in toxicity classification for the identities most frequently flagged as toxic on online platforms). In addition, we show that D-Bias dramatically improves upon scrubbing, i.e., removing only the identity words in question. We also demonstrate that D-Bias easily extends to multiple identities and achieves performance competitive with two recently proposed debiasing approaches: R-LACE and INLP.
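The abstract states that proxies are selected using pointwise mutual information, PMI(i, w) = log[p(i, w) / (p(i) p(w))], where i is the identity and w a candidate word. The Python sketch below shows one way to compute that score and to contrast plain scrubbing with scrubbing that also drops the selected proxies. It is not the authors' implementation. It assumes document-level co-occurrence, naive whitespace tokenization, a minimum co-occurrence count to damp PMI's bias toward rare words, and hypothetical placeholder tokens (`id_a`, `festival_x`). It also does not reproduce how D-Bias itself attenuates these signals during fine-tuning.

```python
import math
from collections import Counter


def doc_level_pmi(docs, identity_words, min_joint=1):
    """Return {word: PMI(identity, word)} from document-level co-occurrence.

    Assumption: a document "mentions the identity" if it contains any of
    identity_words; probabilities are document frequencies divided by len(docs).
    Words co-occurring with the identity in fewer than min_joint documents
    are skipped (this also skips words with PMI = -inf).
    """
    if not docs:
        return {}
    identity_words = {w.lower() for w in identity_words}
    n_docs = len(docs)
    word_df = Counter()   # number of documents containing each word
    joint_df = Counter()  # documents containing the word AND an identity word
    n_identity = 0        # documents mentioning the identity
    for doc in docs:
        tokens = set(doc.lower().split())  # naive whitespace tokenization (assumption)
        has_identity = bool(tokens & identity_words)
        n_identity += has_identity
        for tok in tokens - identity_words:
            word_df[tok] += 1
            if has_identity:
                joint_df[tok] += 1
    p_identity = n_identity / n_docs
    scores = {}
    for word, joint in joint_df.items():
        if joint < min_joint:
            continue
        p_joint = joint / n_docs
        p_word = word_df[word] / n_docs
        scores[word] = math.log(p_joint / (p_identity * p_word))
    return scores


def top_proxies(docs, identity_words, k=10, min_joint=2):
    """Hypothetical helper: the k words with the highest positive PMI."""
    scores = doc_level_pmi(docs, identity_words, min_joint=min_joint)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, s in ranked[:k] if s > 0]


def scrub(text, words_to_remove):
    """Drop every whitespace token found in words_to_remove (case-insensitive)."""
    remove = {w.lower() for w in words_to_remove}
    return " ".join(t for t in text.split() if t.lower() not in remove)


if __name__ == "__main__":
    docs = [
        "id_a members attend festival_x",
        "id_a festival_x parade today",
        "weather today is sunny",
        "members vote today",
    ]
    proxies = top_proxies(docs, {"id_a"}, k=5, min_joint=2)
    print(proxies)  # ['festival_x']
    text = "id_a festival_x parade today"
    print(scrub(text, {"id_a"}))                  # festival_x parade today
    print(scrub(text, {"id_a"} | set(proxies)))   # parade today
```

In this toy corpus, scrubbing only the identity word leaves `festival_x` behind, and that word still signals the identity. This is the gap the abstract describes when it contrasts D-Bias with plain scrubbing. Raising `min_joint` trades recall of proxies for robustness against spurious, rare co-occurrences.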


Institutions

Oracle (United States)

Topics

Adversarial Robustness in Machine Learning
Artificial Intelligence in Healthcare and Education
Explainable Artificial Intelligence (XAI)
