# Models for Sequence Labeling
## Requirements

```txt
allennlp == 0.7.0
python == 3.7.0
pytorch == 0.4.1
```
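A minimal environment setup, assuming pip and the PyPI package names (the repo itself doesn't prescribe an installer):

```sh
$ pip install torch==0.4.1 allennlp==0.7.0
```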
## Usage

```sh
$ git clone https://github.com/zysite/tagger.git
$ cd tagger
# e.g., CHAR+BiLSTM+CRF on the NER task
$ python run.py --model=char_lstm_crf --task=ner
```
All options are listed in the help message:

```sh
$ python run.py -h
usage: run.py [-h] [--model {char_lstm_crf,elmo_lstm_crf}]
              [--task {chunking,ner,pos}] [--drop DROP]
              [--batch_size BATCH_SIZE] [--epochs EPOCHS]
              [--patience PATIENCE] [--lr LR] [--threads THREADS]
              [--seed SEED] [--device DEVICE] [--file FILE]

Create several models for Sequence Labeling.

optional arguments:
  -h, --help            show this help message and exit
  --model {char_lstm_crf,elmo_lstm_crf}
                        choose the model for Sequence Labeling
  --task {chunking,ner,pos}
                        choose the task of Sequence Labeling
  --drop DROP           set the prob of dropout
  --batch_size BATCH_SIZE
                        set the size of batch
  --epochs EPOCHS       set the max num of epochs
  --patience PATIENCE   set the num of epochs to be patient
  --lr LR               set the learning rate of training
  --threads THREADS, -t THREADS
                        set the max num of threads
  --seed SEED, -s SEED  set the seed for generating random numbers
  --device DEVICE, -d DEVICE
                        set which device to use
  --file FILE, -f FILE  set where to store the model
```
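As a further example, a run that also pins the device, seed, and model path, using only the flags listed above (the flag values here are illustrative, not recommended settings):

```sh
$ python run.py --model=elmo_lstm_crf --task=pos --device=0 --seed=1 --file=pos.pt
```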
## Model Structures

### CHAR+BiLSTM+CRF

```
CHAR_LSTM_CRF(
  (embed): Embedding(405440, 100)
  (char_lstm): CharLSTM(
    (embed): Embedding(517, 30)
    (lstm): LSTM(30, 150, batch_first=True, bidirectional=True)
  )
  (word_lstm): LSTM(400, 150, batch_first=True, bidirectional=True)
  (hid): Linear(in_features=300, out_features=150, bias=True)
  (activation): Tanh()
  (out): Linear(in_features=150, out_features=17, bias=True)
  (crf): CRF(n_tags=17)
  (drop): Dropout(p=0.5)
)
```
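The 400-dim input to `word_lstm` is the 100-dim word embedding concatenated with the 300-dim word vector produced by `char_lstm` (150 per direction). Below is a minimal sketch of such a character-level encoder; it targets a recent PyTorch (>= 1.1, for `enforce_sorted=False`) rather than the pinned 0.4.1, and its interface is illustrative rather than the repo's exact API.

```python
import torch
import torch.nn as nn


class CharLSTM(nn.Module):
    """Character-level BiLSTM that encodes each word as one fixed vector."""

    def __init__(self, n_chars=517, n_embed=30, n_out=300):
        super().__init__()
        self.embed = nn.Embedding(n_chars, n_embed)
        # hidden size 150 per direction -> 300-dim word representations
        self.lstm = nn.LSTM(n_embed, n_out // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, chars):
        # chars: [n_words, max_word_len] of char indices, 0 = <pad>
        lens = chars.gt(0).sum(dim=1)
        x = nn.utils.rnn.pack_padded_sequence(
            self.embed(chars), lens.cpu(), batch_first=True, enforce_sorted=False)
        _, (hidden, _) = self.lstm(x)
        # hidden: [2, n_words, 150]; concatenate the final states of both
        # directions into a single [n_words, 300] matrix
        return torch.cat(torch.unbind(hidden), dim=-1)
```

Each word vector is then concatenated with the word's pretrained embedding before entering the word-level BiLSTM, giving the 100 + 300 = 400 input size shown above.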
### ELMo+BiLSTM+CRF

```
ELMO_LSTM_CRF(
  (embed): Embedding(405440, 100)
  (scalar_mix): ScalarMix(n_reprs=3)
  (char_lstm): CharLSTM(
    (embed): Embedding(517, 30)
    (lstm): LSTM(30, 150, batch_first=True, bidirectional=True)
  )
  (word_lstm): LSTM(1424, 150, batch_first=True, bidirectional=True)
  (hid): Linear(in_features=300, out_features=150, bias=True)
  (activation): Tanh()
  (out): Linear(in_features=150, out_features=17, bias=True)
  (crf): CRF(n_tags=17)
  (drop): Dropout(p=0.5)
)
```
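Compared to `CHAR_LSTM_CRF`, the `word_lstm` input grows to 1424 = 100 (GloVe) + 300 (CharLSTM) + 1024 (ELMo), where the 1024-dim ELMo vector comes from `ScalarMix`: a softmax-weighted average of the three ELMo layers scaled by a learned gamma (Peters et al., 2018). A minimal sketch of that mixing follows; the parameter names are illustrative, and the repo may simply reuse allennlp's own `ScalarMix`.

```python
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    """Learned softmax-weighted sum of n layer representations, times gamma."""

    def __init__(self, n_reprs=3):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(n_reprs))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, reprs):
        # reprs: sequence of n_reprs tensors, each [batch, seq_len, 1024]
        w = torch.softmax(self.weights, dim=0)
        return self.gamma * sum(w_i * r for w_i, r in zip(w, reprs))
```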
## Results

### NER (CoNLL-2003)

- Pretrained embeddings: glove.6B.100d.txt
- Dataset: CoNLL-2003
  - Train: 14,987 sentences
  - Dev: 3,466 sentences
  - Test: 3,684 sentences

|               |  Dev   |  Test  |     mT(s)      |
| ------------- | :----: | :----: | :------------: |
| CHAR_LSTM_CRF | 94.49% | 90.72% | 0:01:50.889580 |
| ELMO_LSTM_CRF | 95.64% | 92.09% | 0:01:46.960411 |
### Chunking (CoNLL-2000)

- Pretrained embeddings: glove.6B.100d.txt
- Dataset: CoNLL-2000
  - Train: 7,936 sentences
  - Dev: 1,000 sentences
  - Test: 2,012 sentences

|               |  Dev   |  Test  |     mT(s)      |
| ------------- | :----: | :----: | :------------: |
| CHAR_LSTM_CRF | 95.02% | 94.51% | 0:01:21.141716 |
| ELMO_LSTM_CRF | 97.08% | 96.34% | 0:01:14.761098 |
### POS Tagging (WSJ)

- Pretrained embeddings: glove.6B.100d.txt
- Dataset: WSJ
  - Train: 38,219 sentences
  - Dev: 5,527 sentences
  - Test: 5,462 sentences

|               |  Dev   |  Test  |     mT(s)      |
| ------------- | :----: | :----: | :------------: |
| CHAR_LSTM_CRF | 97.68% | 97.64% | 0:05:59.462637 |
| ELMO_LSTM_CRF | 97.86% | 97.81% | 0:05:55.335100 |