This repository contains the code for reproducing the experiments in the paper "Improving Generalization in Coreference Resolution via Adversarial Training" by Sanjay Subramanian and Dan Roth, published at *SEM 2019.
This code was tested using Python 2.7 and Ubuntu 16.04. The requirements.txt co
ntains the packages and corresponding versions of the Python environment used fo
r running this code. Please follow the Getting Started
instructions in https://github.com/kentonl/e2e-coref
to download necessary files (e.g. word embeddings). You will also need to download the chec
kpoint for the Lee et al. 2018 model and insert the corresponding path in the lee2018_log_root field in experiments_adv.conf
. git-lfs was used to store the adv_checkpoint.zip
file, so you may need git-lfs to clone the repository.
Make sure to set the paths in experiments_adv.conf
and replace_data.py
to be correct for your system. The allCountries.txt and countryInfo.txt files can be downloaded from geonames.org, and the last_names.txt file contains the last names from the 1990 census, which can be downloaded from https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last#.
First, unzip the adv_checkpoint.zip
file to yield the adv_checkpoint directory. To reproduce the results in the paper, please run prepare_data.sh
and subsequently run run_experiments.sh
when the repository is the working directory. Please note that by default the prepare_data.sh script loads the state of the random number generator that we used to generate replacement names to enable exact reproducibility of our results. If you would like generate replacement names at random, you need only comment out the relevant line in generate_noleakage.py
.
The results should match those in the paper: http://cogcomp.org/papers/SubramanianRo19.pdf .
Much of the code in this repository is from Kenton Lee's repository https://github.com/kentonl/e2e-coref or is adapted from code in that repository. That code was distributed under an Apache 2.0 license. The firstname-gender-score.txt
gazetteer was provided by Sihao Chen.
If you use this work in your research, please cite our paper:
@inproceedings{SubramanianRo19,
author = {Sanjay Subramanian and Dan Roth},
title = {{Improving Generalization in Coreference Resolution via Adversarial Training}},
booktitle = {Proc. of the Joint Conference on Lexical and Computational Sematics},
month = {6},
year = {2019},
url = "http://cogcomp.org/papers/SubramanianRo19.pdf",
}