Better Feature Integration for Named Entity Recognition (NAACL 2021)
Python 3.7
PyTorch 1.4.0
Transformers 3.3.1
CUDA 10.1, 10.2
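The Python dependencies can be installed with, for example:
pip install torch==1.4.0 transformers==3.3.1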
First, download the embedding files glove.6B.100d.txt, cc.ca.300.vec, cc.es.300.vec, and cc.zh.300.vec, and put them in the data folder.
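These embedding files are plain text with one token per line, followed by its vector components. If you want to inspect or reuse them outside the repo, here is a minimal loading sketch (the repo has its own loader; this helper is only illustrative):

```python
import numpy as np

def load_embeddings(path, dim=100):
    """Read a GloVe/fastText-style text file: one token per line
    followed by `dim` space-separated float components."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skips the "count dim" header of fastText .vec files
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# e.g. glove = load_embeddings("data/glove.6B.100d.txt", dim=100)
#      catalan = load_embeddings("data/cc.ca.300.vec", dim=300)
```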
By default, the following command evaluates our saved model (without BERT) on the SemEval 2010 Task 1 Spanish dataset:
python main.py
To train the model with other datasets:
python main.py --mode=train --dataset=ontonotes --embedding_file=glove.6B.100d.txt
To train with BERT, first obtain the contextual embeddings by following the instructions in the get_context_emb folder (the contextual embedding files for OntoNotes English can be downloaded from here), and then run:
python main.py --mode=train --dataset=ontonotes --embedding_file=glove.6B.100d.txt --context_emb=bert
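For intuition, the extraction step amounts to running each sentence through BERT and saving the hidden states. A rough sketch with the Transformers API (the model name and the use of last-layer states are assumptions here; the authoritative recipe is in get_context_emb):

```python
import torch
from transformers import BertTokenizer, BertModel

# Assumption: bert-base-cased; the repo may use a different checkpoint/layers.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
model.eval()

inputs = tokenizer("John lives in New York", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)   # a tuple in Transformers 3.x
hidden = outputs[0].squeeze(0)  # (num_wordpieces, 768) last-layer states
# The wordpiece vectors still have to be aligned back to the original words
# before they can serve as per-token contextual features.
```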
Note that the flag --dep_model=dggcn (the default) enables both the GCN and our Syn-LSTM model. The flag --num_lstm_layer is intended for running some baselines and should be kept at 0 (the default) when running our proposed model.
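For intuition only (the exact Syn-LSTM equations are given in the paper), the core idea is an LSTM cell that receives a second, graph-encoded input, e.g. the GCN output over the dependency tree, through its own gate rather than by simple concatenation. A schematic PyTorch sketch of such a cell, not the repo's implementation:

```python
import torch
import torch.nn as nn

class GatedExtraInputCell(nn.Module):
    """Schematic only, NOT the paper's exact equations: a standard LSTM
    cell extended with a second input stream g_t (e.g. GCN output) that
    is admitted through its own gate m_t."""
    def __init__(self, x_dim, g_dim, hidden_dim):
        super().__init__()
        self.gates = nn.Linear(x_dim + hidden_dim, 3 * hidden_dim)  # i, f, o
        self.cand_x = nn.Linear(x_dim + hidden_dim, hidden_dim)     # candidate from x_t
        self.gate_g = nn.Linear(g_dim + hidden_dim, hidden_dim)     # extra gate m_t
        self.cand_g = nn.Linear(g_dim + hidden_dim, hidden_dim)     # candidate from g_t

    def forward(self, x_t, g_t, state):
        h, c = state
        xh = torch.cat([x_t, h], dim=-1)
        gh = torch.cat([g_t, h], dim=-1)
        i, f, o = torch.sigmoid(self.gates(xh)).chunk(3, dim=-1)
        m = torch.sigmoid(self.gate_g(gh))  # gates the graph-encoded input
        c = f * c + i * torch.tanh(self.cand_x(xh)) + m * torch.tanh(self.cand_g(gh))
        h = o * torch.tanh(c)
        return h, c
```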
Note that we use four of the data columns: word, dependency head index, dependency relation label, and entity label.
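Assuming a CoNLL-style layout, one token per line with whitespace-separated columns and a blank line between sentences, a reader for those four fields might look like the sketch below (the column positions are placeholders; check the files under the data folder):

```python
def read_sentences(path):
    """Yield sentences as lists of (word, head_index, dep_label, entity_label).
    Column positions are assumptions; adjust to the actual files in data/."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # blank line ends a sentence
                if sentence:
                    yield sentence
                sentence = []
                continue
            cols = line.split()
            word, head, dep, ner = cols[0], int(cols[1]), cols[2], cols[-1]
            sentence.append((word, head, dep, ner))
    if sentence:
        yield sentence
```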
The code is built on the code of the paper "Dependency-Guided LSTM-CRF Model for Named Entity Recognition" (EMNLP 2019).