This project was realised as part of the INF8225 AI course. We aim to reduce gender bias in pronoun resolution by building a coreference resolver that performs well on a gender-balanced pronoun dataset, the Gendered Ambiguous Pronouns (GAP) dataset. We leverage BERT's pre-training on large unlabeled corpora and transfer its contextual representations to a fine-tuning stage, trained in a SWAG-like manner on the supervised GAP dataset.
We have submitted our best performing model to the Gendered Pronoun Resolution Kaggle competition.
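For intuition, SWAG-style fine-tuning frames each example as a multiple-choice question: the pronoun refers to candidate A, candidate B, or neither, and BERT scores each option. The following is a minimal sketch of such a multiple-choice head in plain PyTorch; the class, names, and shapes are hypothetical illustrations, not this repository's actual implementation.

import torch
import torch.nn as nn

class MultipleChoiceHead(nn.Module):
    """SWAG-style head: one scalar score per candidate answer.

    Hypothetical sketch -- illustrative only, not the code in run_GAP.py.
    """
    def __init__(self, hidden_size=768, num_choices=3):
        super().__init__()
        # Maps BERT's pooled [CLS] vector to a single score.
        self.classifier = nn.Linear(hidden_size, 1)
        self.num_choices = num_choices

    def forward(self, pooled_output):
        # pooled_output: (batch_size * num_choices, hidden_size),
        # i.e. one BERT forward pass per (text, candidate) pair.
        logits = self.classifier(pooled_output)
        # Regroup so each row holds the scores for A, B, and NEITHER.
        return logits.view(-1, self.num_choices)

# Toy usage: a batch of 2 examples, 3 candidates each.
head = MultipleChoiceHead()
scores = head(torch.randn(2 * 3, 768))
print(scores.shape)  # torch.Size([2, 3])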
git clone --recursive git@github.com:isabellebouchard/BERT_for_GAP-coreference.git
Make sure the submodules are properly initialized.
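If you cloned without the --recursive flag, the standard git command below will initialize them:

git submodule update --init --recursive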
To run the code, first install Docker so you can build and run a container with all the required dependencies:
docker build -t IMAGE_NAME .
nvidia-docker run --rm -it -v /path/to/your/code/:/project IMAGE_NAME
If you don't have access to a GPU, replace nvidia-docker with docker (see the CPU-only variant below). It is nonetheless highly recommended to run the training on one or more GPUs.
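For reference, the CPU-only equivalent of the run command above is:

docker run --rm -it -v /path/to/your/code/:/project IMAGE_NAME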
Once inside the container, you should be able to run the training script:
python run_GAP.py --data_dir gap-coreference \
--bert_model bert-base-cased \
--output_dir results
This will run the training script and save checkpoints of the best model in the output directory.
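As a quick sanity check, you can inspect a saved checkpoint with plain PyTorch. The filename below is an assumption (pytorch_pretrained_bert conventionally saves weights as pytorch_model.bin); check the contents of results for the actual name.

import torch

# Hypothetical checkpoint name -- verify against the files actually written to results/.
state_dict = torch.load("results/pytorch_model.bin", map_location="cpu")
print(len(state_dict), "parameter tensors saved")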