An implementation of our AACL 2020 paper "Second-Order Neural Dependency Parsing with Message Passing and End-to-End Training" and a new version of our ACL 2019 paper "Second-Order Semantic Dependency Parsing with End-to-End Neural Networks".
The code is based on an old version of SuPar.
Compared with the original code, we use MST instead of Eisner for syntactic dependency parsing. Our code can also concatenate word, POS tag, character, and BERT embeddings as token representations.
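As an illustration only (this is a minimal sketch, not the repository's actual module; the class and parameter names below are made up), concatenating these embeddings into a single token representation can look like this in PyTorch:

```python
import torch
import torch.nn as nn

class ConcatEmbedding(nn.Module):
    """Sketch: build token representations by concatenating word, POS-tag,
    character-level, and BERT embeddings along the feature dimension."""

    def __init__(self, n_words, n_tags, word_dim=100, tag_dim=100, char_dim=50, bert_dim=768):
        super().__init__()
        self.word_embed = nn.Embedding(n_words, word_dim)
        self.tag_embed = nn.Embedding(n_tags, tag_dim)
        # char_repr and bert_repr are assumed to be precomputed per-token vectors.
        self.out_dim = word_dim + tag_dim + char_dim + bert_dim

    def forward(self, words, tags, char_repr, bert_repr):
        # words, tags: [batch, seq_len]
        # char_repr:   [batch, seq_len, char_dim]
        # bert_repr:   [batch, seq_len, bert_dim]
        x = torch.cat([self.word_embed(words), self.tag_embed(tags), char_repr, bert_repr], dim=-1)
        return x  # [batch, seq_len, out_dim]
```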
- python: 3.7.0
- pytorch: 1.3.0
- transformers: 2.1.1
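Assuming a standard pip-based setup (on PyPI the PyTorch package is named `torch`), the pinned dependencies can be installed with:

```sh
$ pip install torch==1.3.0 transformers==2.1.1
```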
The model is evaluated on the Stanford Dependency conversion (v3.3.0) of the English Penn Treebank with POS tags predicted by Stanford POS tagger.
For all datasets, we follow the conventional data splits:
- Train: 02-21 (39,832 sentences)
- Dev: 22 (1,700 sentences)
- Test: 23 (2,416 sentences)
MODEL | UAS | LAS | Speed (Sents/s) |
---|---|---|---|
Single1O + TAG + MST | 95.75 | 94.04 | 1123 |
Local1O + TAG + MST | 95.83 | 94.23 | 1150 |
Single2O + TAG + MST | 95.86 | 94.19 | 966 |
Local2O + TAG + MST | 95.98 | 94.34 | 1006 |
Local2O + MST (Best) | 96.12 | 94.47 | 1006 |
CRF2O (Best) (Zhang et al., 2020) | 96.14 | 94.49 | 400 |
Here `1O` represents first-order, `2O` represents second-order, `Single` represents binary classification, and `Local` represents head-selection. The results are averaged over 5 runs; `Best` denotes the single test result of the model with the best development performance. Punctuation is ignored in all evaluation metrics for PTB.
You can start the training, evaluation and prediction process by using subcommands registered in `parser.cmds`.
To train a syntactic parser, run:
```sh
$ CUDA_VISIBLE_DEVICES=0 python3 -u run.py train --conf config/3iter_100binary_0init_ptb_full_tree_0.cfg
```
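Evaluation and prediction are launched in the same way. The subcommand names below follow the usual SuPar-style CLI and are an assumption here; check `parser.cmds` for the exact names and flags available in this repository:

```sh
# Assumed subcommand names; verify against parser.cmds before running.
$ CUDA_VISIBLE_DEVICES=0 python3 -u run.py evaluate --conf config/3iter_100binary_0init_ptb_full_tree_0.cfg
$ CUDA_VISIBLE_DEVICES=0 python3 -u run.py predict --conf config/3iter_100binary_0init_ptb_full_tree_0.cfg
```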
To train a semantic parser, modify the dataset split in the config file, then set `tree = False` and `binary = True`. Moreover, based on the binary structure, you can also train an Enhanced Universal Dependencies (EUD) parser; however, for better EUD parser training, please use MultilangStructureKD.
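For illustration, the relevant switches in the config file would be set as below; `tree` and `binary` are the options mentioned above, and any other keys in your config stay unchanged:

```ini
# Semantic dependency parsing: disable the tree constraint, enable binary (per-arc) classification.
tree = False
binary = True
```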
All the data files must follow the CoNLL-U format.
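For reference, a CoNLL-U sentence consists of 10 tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), for example:

```
# text = She reads books .
1	She	she	PRON	PRP	_	2	nsubj	_	_
2	reads	read	VERB	VBZ	_	0	root	_	_
3	books	book	NOUN	NNS	_	2	obj	_	_
4	.	.	PUNCT	.	_	2	punct	_	_
```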
- Tensorflow version of semantic dependency parser: Second_Order_SDP.
- PyTorch version of enhanced universal dependencies parser: MultilangStructureKD.
- An application of Mean-Field Variational Inference to Sequence Labeling: AIN.
- The PyTorch Version of Biaffine Parser: parser.