This repository contains the code and data for replicating results from
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
- Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah.
- In ACL 2020
- Download or generate the aligned embeddings from fastText
- Generate bias-reduced EN embeddings (ENDEB) using Hard-Debias
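For readers unfamiliar with Hard-Debias, the following is a minimal sketch of its neutralize step only, written with NumPy on toy vectors. The function names (`gender_direction`, `neutralize`), the toy embedding values, and the choice of definitional pairs are all assumptions made for illustration; they are not taken from this repository, and the equalize step of the full method is omitted.

```python
import numpy as np

def gender_direction(emb, pairs):
    """Estimate a gender direction as the top principal direction
    of the centered definitional pairs (hypothetical helper)."""
    diffs = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2.0
        diffs.append(emb[a] - center)
        diffs.append(emb[b] - center)
    diffs = np.stack(diffs)
    # The first right-singular vector spans the direction of largest variance.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    return direction / np.linalg.norm(direction)

def neutralize(vec, direction):
    """Remove the component of vec along direction, then re-normalize."""
    projected = vec - np.dot(vec, direction) * direction
    return projected / np.linalg.norm(projected)

# Toy 3-dimensional embeddings; real fastText vectors have 300 dimensions.
emb = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "man": np.array([0.9, 0.0, 0.3]),
    "woman": np.array([-0.9, 0.0, 0.3]),
    "doctor": np.array([0.4, 0.8, 0.1]),
}

g = gender_direction(emb, [("he", "she"), ("man", "woman")])
debiased_doctor = neutralize(emb["doctor"], g)
```

After neutralizing, the vector for a gender-neutral word such as "doctor" has no component along the estimated gender direction.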
We include all the occupations as well as the gender seed words for each language under the intrinsic folder.
To evaluate intrinsic bias in each language, refer to inBias.ipynb for bias analysis and results.
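As a rough illustration of what an intrinsic bias measurement can look like, the sketch below compares how far the masculine and feminine forms of an occupation sit from their respective gender seed words, using cosine distance. This is an assumption about one plausible measure, not a description of what inBias.ipynb computes; the function names and toy vectors are hypothetical.

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus the cosine similarity of u and v."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def mean_distance(word_vec, seed_vecs):
    """Average cosine distance from one word to a set of seed words."""
    return float(np.mean([cosine_distance(word_vec, s) for s in seed_vecs]))

def occupation_gap(masc_occ, fem_occ, masc_seeds, fem_seeds):
    """Absolute difference between the masculine form's distance to the
    masculine seeds and the feminine form's distance to the feminine seeds."""
    return abs(mean_distance(masc_occ, masc_seeds)
               - mean_distance(fem_occ, fem_seeds))

# Toy 2-dimensional vectors (illustrative values only).
masc_seeds = [np.array([1.0, 0.0])]
fem_seeds = [np.array([0.0, 1.0])]
masc_occ = np.array([1.0, 0.0])   # sits exactly on the masculine seed
fem_occ = np.array([1.0, 1.0])    # halfway between the two seeds

gap = occupation_gap(masc_occ, fem_occ, masc_seeds, fem_seeds)
```

A gap of zero would mean both forms are equally close to their own gender seeds; averaging the gap over all occupations gives a single score per language.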
To replicate the MLBs dataset, please refer to the replicateMLBs folder. For the EN dataset, please refer to biosbias.
The code for the downstream task is in the bios_codes folder.
If you use this code or use the EN MLB dataset, please also cite Bias in Bios: A Case Study of Semantic Representation Bias in a High Stakes Setting
@inproceedings{de2019bias,
title={Bias in bios: A case study of semantic representation bias in a high-stakes setting},
author={De-Arteaga, Maria and Romanov, Alexey and Wallach, Hanna and Chayes, Jennifer and Borgs, Christian and Chouldechova, Alexandra and Geyik, Sahin and Kenthapadi, Krishnaram and Kalai, Adam Tauman},
booktitle={Proceedings of the Conference on Fairness, Accountability, and Transparency},
pages={120--128},
year={2019}
}
@inproceedings{zhao-etal-2020-gender,
title = "Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer",
author = "Zhao, Jieyu and
Mukherjee, Subhabrata and
Hosseini, Saghar and
Chang, Kai-Wei and
Hassan Awadallah, Ahmed",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
year = "2020",
publisher = "Association for Computational Linguistics",
pages = "2896--2907",
}