BERT_for_GAP-coreference

This project was carried out as part of the INF8225 AI course. We aim to reduce gender bias in pronoun resolution by building a coreference resolver that performs well on a gender-balanced pronoun dataset, the Gendered Ambiguous Pronouns (GAP) dataset. We leverage the contextual representations BERT learns from its pre-training tasks on large unlabeled corpora and transfer them to a fine-tuning stage, trained in a SWAG-like (multiple-choice) manner on the supervised GAP dataset.
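
To illustrate the SWAG-like setup, the sketch below scores candidate antecedents for a pronoun as a multiple-choice problem. It uses the Hugging Face transformers library and a made-up example sentence; the actual run_GAP.py may encode inputs differently (and GAP also allows a "neither" answer, which a real implementation would handle as an extra choice), so treat this as a minimal sketch rather than the project's exact pipeline.

import torch
from transformers import BertTokenizer, BertForMultipleChoice

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
# The multiple-choice head starts untrained; fine-tuning it on GAP is
# what run_GAP.py does.
model = BertForMultipleChoice.from_pretrained("bert-base-cased")
model.eval()

# Toy GAP-style example: resolve "she" to one of the candidate antecedents.
context = "Kathleen called Mary before she left for Boston."
candidates = ["Kathleen", "Mary"]

# Pair the context with each candidate, SWAG-style: one (context, choice)
# sequence per candidate, stacked into shape (batch=1, num_choices, seq_len).
encoding = tokenizer([context] * len(candidates), candidates,
                     return_tensors="pt", padding=True)
inputs = {name: tensor.unsqueeze(0) for name, tensor in encoding.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_choices)

print(candidates[logits.argmax(dim=-1).item()])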

We submitted our best-performing model to the Gendered Pronoun Resolution Kaggle competition.

Setting up

git clone --recursive git@github.com:isabellebouchard/BERT_for_GAP-coreference.git

Make sure the submodules are properly initialized; the gap-coreference submodule provides the GAP dataset used for training. If you cloned without --recursive, initialize the submodules manually:

git submodule update --init --recursive

First steps

To run the code, first install Docker so you can build and run a container with all the required dependencies installed:

docker build -t IMAGE_NAME .
nvidia-docker run --rm -it -v /path/to/your/code/:/project IMAGE_NAME

If you don't have access to a GPU, replace nvidia-docker with docker. Running the training on one or (preferably) multiple GPUs is highly recommended.

Once inside the container, you should be able to run the training script:

python run_GAP.py --data_dir gap-coreference \
                  --bert_model bert-base-cased \
                  --output_dir results

This trains the model and saves checkpoints of the best model to the output directory (results in the example above).
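
To reuse a saved checkpoint afterwards, something along the following lines should work, assuming run_GAP.py writes checkpoints in the standard Hugging Face format (config plus weights) to the output directory; if it saves a raw state dict instead, load it with torch.load and model.load_state_dict, as sketched in the comments.

import torch
from transformers import BertForMultipleChoice

# Assumption: the "results" directory contains a Hugging Face-style
# checkpoint (config.json + model weights).
model = BertForMultipleChoice.from_pretrained("results")
model.eval()

# Alternative, if only a raw state dict (e.g. pytorch_model.bin) was saved:
# model = BertForMultipleChoice.from_pretrained("bert-base-cased")
# state = torch.load("results/pytorch_model.bin", map_location="cpu")
# model.load_state_dict(state)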
