PyNER is a sequence labeling toolkit that allows researcher and developer to train/evaluate neural sequence labeling methods.
You can try pyner
on a local machine or a docker container.
- setup (If you do not install pipenv, please install)
poetry install
- train
# If a GPU is not available, specify `--device -1`
pipenv run python pyner/named_entity/train.py config/training/conll2003.lample.yaml --device 0
- build container
make build
- launch container
make start
- train
You have to execute this command in Docker container.
# If a GPU is not available, specify `--device -1`
python3 train.py config/training/conll2003.lample.yaml --device 0
This experiment uses CoNLL 2003 dataset. Please read the following "Prepare dataset" section.
We use a data format same as deep-crf.
$ head -n 15 data/processed/CoNLL2003_BIOES/train.txt
EU S-ORG
rejects O
German S-MISC
call O
to O
boycott O
British S-MISC
lamb O
. O
Peter B-PER
Blackburn E-PER
BRUSSELS S-LOC
1996-08-22 O
For reproducing results in Lample's paper, you have to do some step to prepare datasets.
We can't include CoNLL 2003 dataset in this repository due to legal limitation. Instead, PyNER offers the way to create dataset from CoNLL 2003 dataset
If you could prepare CoNLL 2003 dataset, you would have three files like below.
- eng.iob.testa
- eng.iob.testb
- eng.iob.train
Please put them to on same directoy (e.g. data/external/CoNLL2003
).
$ tree data/external/CoNLL2003
data/external/CoNLL2003
├── eng.iob.testa
├── eng.iob.testb
└── eng.iob.train
Then, you can create the dataset for pyner by following command.
After running the command, ./data/processed/CoNLL2003
will be generated for you.
$ python bin/parse_CoNLL2003.py \
--data-dir data/external/CoNLL2003 \
--output-dir data/processed/CoNLL2003 \
--convert-rule iob2bio
2019-09-24 23:43:39,299 INFO root :create dataset for CoNLL2003
2019-09-24 23:43:39,299 INFO root :create corpus parser
2019-09-24 23:43:39,300 INFO root :parsing corpus for training
2019-09-24 23:44:02,240 INFO root :parsing corpus for validating
2019-09-24 23:44:04,397 INFO root :parsing corpus for testing
2019-09-24 23:44:06,507 INFO root :Create train dataset
2019-09-24 23:44:06,705 INFO root :Create valid dataset
2019-09-24 23:44:06,755 INFO root :Create test dataset
2019-09-24 23:44:06,800 INFO root :Create vocabulary
$
$ tree data/processed/CoNLL2003
data/processed/CoNLL2003
├── test.txt
├── train.txt
├── valid.txt
├── vocab.chars.txt
├── vocab.tags.txt
└── vocab.words.txt
Using pre-trained word embeddings significantly improve the performance of NER.
Lample et al. also use pre-trained word embeddings.
They use Skip-N-Gram embeddings, which can be downloaded from Official repo's issue.
To use this, please run make get-lample
before running make build
.
(If you want to use GloVe embeddings, please run make get-glove
.)
$ make get-lample
rm -rf data/external/GloveEmbeddings
mkdir -p data/external/LampleEmbeddings
mkdir -p data/processed/LampleEmbeddings
python bin/fetch_lample_embedding.py
python bin/prepare_embeddings.py \
data/external/LampleEmbeddings/skipngram_100d.txt \
data/processed/LampleEmbeddings/skipngram_100d \
--format word2vec
saved model
$
$ ls -1 data/processed/LampleEmbeddings
skipngram_100d
skipngram_100d.vectors.npy
Congratulations! All preparation steps have done. Now you can train the Lample's LSTM-CRF. Please run the command:
- Local machine:
python3 pyner/named_entity/train.py config/training/conll2003.lample.yaml --device 0
- Docker container:
python3 train.py config/training/conll2003.lample.yaml --device 0
You can test your model using pyner/named_entity/inference.py
.
Only thing you have to pass to inference.py
is path to model dir.
Model dir is defined in config file (output).
$ cat config/training/conll2003.lample.yaml
iteration: "./config/iteration/long.yaml"
external: "./config/external/conll2003.yaml"
model: "./config/model/lample.yaml"
optimizer: "./config/optimizer/sgd_with_clipping.yaml"
preprocessing: "./config/preprocessing/znorm.yaml"
output: "./model/conll2003.lample" # model dir is here!!
If you successfully train the model, some files are generated on model/conll2003.lample.skipngram.YYYY-MM-DDTxx:xx:xx.xxxxxx
.
$ ls -1 model/conll2003.lample.skipngram.2019-09-24T07:02:33.536822
args
log
snapshot_epoch_0001
snapshot_epoch_0002
snapshot_epoch_0003
snapshot_epoch_0004
...
snapshot_epoch_0148
snapshot_epoch_0149
snapshot_epoch_0150
validation.main.fscore.epoch_031.pred # here!!
Running python3 pyner/named_entity/inference.py
will generate prediction results on model/conll2003.lample.skipngram.YYYY-MM-DDTxx:xx:xx.xxxxxx
A file name would be {metrics}.epoch_{xxx}.pred
.
inference.py
check a log and select a model which achieve most high f1 score on development set.
You can use other selection criteria such as watching loss value and specifying an epoch.
- Dev loss:
python3 pyner/named_entity/inference.py --metrics validation/main/loss model/conll2003.lample.skipngram.2019-09-24T07:02:33.536822
) - Specific epoch:
python3 pyner/named_entity/inference.py --epoch 1 model/conll2003.lample.skipngram.2019-09-24T07:02:33.536822
If you could generate a prediction file, it's time to evaluate a model performance.
conlleval is the standard script to evaluate CoNLL Chunking/NER tasks.
First of all, we have to download conlleval
.
Running the command make get-conlleval
would download conlleval
on current directory.
Then, evaluate!!!
$ ./conlleval < model/conll2003.lample.skipngram.2019-09-24T07:02:33.536822/validation.main.fscore.epoch_139.pred
processed 46435 tokens with 5628 phrases; found: 5651 phrases; correct: 5134.
accuracy: 97.82%; precision: 90.85%; recall: 91.22%; FB1: 91.04
LOC: precision: 93.41%; recall: 92.18%; FB1: 92.79 1640
MISC: precision: 80.66%; recall: 80.66%; FB1: 80.66 693
ORG: precision: 88.72%; recall: 89.79%; FB1: 89.26 1676
PER: precision: 94.76%; recall: 96.23%; FB1: 95.49 1642
F1 score on test set is 91.04, which is approximately the same as the result in Lample's paper! (90.94)
- Neural Architectures for Named Entity Recognition
- NAACL2016, Lample et al.