This is a model for semantic dependency parsing based on the syntax-agnostic model of Marcheggiani et al. (2017). See https://arxiv.org/abs/1701.02593 and https://github.com/diegma/neural-dep-srl. This model achieves state-of-the-art results by (a) using a better-regularized LSTM with highway layers, as in He et al. (2017), and (b) incorporating supertags extracted from dependency trees.
- Python 2.7
- TensorFlow 1.0
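The code was written against these versions, so if you don't already have a matching environment, one way to set it up is with pip (just a sketch; the exact package name and available versions depend on your platform, e.g. `tensorflow-gpu` for GPU support):

```
pip install 'tensorflow>=1.0,<1.1'
```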
You need your own copy of the CoNLL-2009 dataset to train the model. Here are the steps to prepare the data for a given language (here, English):
- From the `tag_srl` directory, create the directory `data/eng/conll09/` ("eng" for English).
- Add the CoNLL text files to that directory and name them `train.txt`, `dev.txt`, `test.txt`, and `ood.txt` if there is an out-of-domain dataset.
- Run `./scripts/preprocess.sh eng` to extract supertags from the data and generate vocab files.
- You also need a copy of the pre-trained word embeddings used in the paper. Download the `sskip.100.vectors` file from https://github.com/diegma/neural-dep-srl and put it in `data/eng/embeddings/sskip.100.vectors`. (Instructions for other languages forthcoming.)
- To replicate our results, you also need predicted predicates and supertags, placed in `data/eng/conll09/pred/` with names `dev_predicates.txt`, `dev_stags_model1.txt`, etc.
  a. We used mate-tools to get predicted predicates. You need to download the SRL pipeline (`srl-4.31.tgz`) and the English model file (linked from here).
  b. Contact me to get predicted supertags.
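After these steps, the data directory should look roughly like this (the exact set of files under `pred/` depends on which predicted predicate and supertag files you have):

```
data/eng/
├── conll09/
│   ├── train.txt
│   ├── dev.txt
│   ├── test.txt
│   ├── ood.txt
│   └── pred/
│       ├── dev_predicates.txt
│       ├── dev_stags_model1.txt
│       └── ...
└── embeddings/
    └── sskip.100.vectors
```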
Once you have all of the data in place, you can train a model by running (from the root directory):

```
python model/train.py
```

See `python model/train.py --help` for all of the possible options. The default arguments are the same as the best hyperparameters from Marcheggiani et al. (2017), but I found that the model performs even better if you add dropout between the layers of the bidirectional LSTM (`python model/train.py --dropout 0.5`).
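For example, the dropout configuration mentioned above is run as:

```
python model/train.py --dropout 0.5
```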
Training on one Nvidia Tesla K80 GPU, with a batch size of 100, the model took around 13 minutes per epoch, and our best models converged after 3-6 hours of training.
Once you've trained a model, it will be saved in a directory under `output/models/`. To test the model, run

```
python model/test.py output/models/model_name test_split
```

where `test_split` should be `dev`, `test`, `ood` (out-of-domain), or `train`.
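For example, assuming a trained model was saved as `output/models/my_model` (the directory name here is just illustrative), you would evaluate it on the development set with:

```
python model/test.py output/models/my_model dev
```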
The code for this project is organized as follows:
- `model/train.py` parses command line arguments, resolves filepaths, and then calls `utils/vocab.py` to get vocabulary objects for words, parts of speech, etc. A `Vocab` object has methods for mapping words to integer ids and back (see the sketch after this list).
- Then `model/train.py` calls `model/srl.py` to initialize the `SRL_Model` object. `SRL_Model` builds a neural network using generic neural network functions from `model/layers.py` and `model/lstm.py`.
- `model/srl.py` uses functions from `util/conll_io.py` and `util/data_loader.py` to load CoNLL sentences into the model.
  a. `util/conll_io.py` defines methods for reading and writing CoNLL-09 format sentences and storing them in a convenient format, along with predicted predicates and supertags.
  b. `util/data_loader.py` converts lists of CoNLL sentences into batches of integer ids that can be fed to the neural network.
- The `SRL_Model` also has methods for running a training or testing epoch. In a testing epoch, the model writes its predictions to a file (usually `output/predictions/{model_name}.txt`) in CoNLL format, so it can be evaluated with the CoNLL evaluation script.
- After each epoch, `model/train.py` calls the CoNLL-provided Perl evaluation script (from a Python wrapper in `eval/eval.py`) and decides whether or not to stop early.
- `model/test.py` just rebuilds a trained model and runs a testing epoch with the specified data.
- The predicate disambiguation directory, `model/disamb/`, is organized in the same way, with one file, `model/disamb/disamb.py`, to build the network and run training and testing epochs, and one file each for training and testing the model.
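As an illustration of the `Vocab` interface mentioned in the first bullet above, here is a minimal sketch. The method names and UNK handling are assumptions for illustration; the actual class in `utils/vocab.py` may differ.

```python
# Hypothetical sketch of a Vocab object: maps tokens (words, POS tags,
# supertags, ...) to integer ids and back. Not the actual utils/vocab.py code.
class Vocab(object):
    def __init__(self, tokens, unk_token='<UNK>'):
        self.unk_token = unk_token
        # Assumption: reserve id 0 for unknown tokens.
        self.id_to_token = [unk_token] + sorted(set(tokens))
        self.token_to_id = {t: i for i, t in enumerate(self.id_to_token)}

    def encode(self, token):
        """Return the integer id for a token, falling back to the UNK id."""
        return self.token_to_id.get(token, self.token_to_id[self.unk_token])

    def decode(self, idx):
        """Return the token for an integer id."""
        return self.id_to_token[idx]

    def __len__(self):
        return len(self.id_to_token)
```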