
Trying to achieve same results as "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" paper #13

Closed
ayrtondenner opened this issue Apr 30, 2018 · 20 comments


@ayrtondenner

Hello

I am trying to achieve the same results as the "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" paper, but my results don't match what the paper reports after 50 epochs. I've also read issue #8.

Because I'm using Windows, I took the hyper-parameters from the .sh script and wrote them directly into the NERCRF.py code.

[screenshot of the hyper-parameters omitted]

After 50 epochs, using the GloVe embeddings with 100 dimensions and the CoNLL-2003 corpus (which I downloaded from this repository), I've only managed an 84.76% F1 score on my dev data and an 80.32% F1 score on my test data. Are the hyper-parameters right? Did you use eng.testa as dev data and eng.testb as test data, or did you use different files? Should I pay attention to anything else?

Thanks.

@XuezheMax
Owner

XuezheMax commented Apr 30, 2018 via email

@ayrtondenner
Author

Hello. I'm actually using PyTorch 0.3.1.post2. Should I update it to 0.4? Could a different version produce a different performance outcome as well? Seems weird...

@XuezheMax
Owner

XuezheMax commented Apr 30, 2018 via email

@pvcastro

pvcastro commented Jun 6, 2018

Hi @XuezheMax, I'm also running the run_ner_crf script and I'm having trouble reaching the results reported in your paper. I'm getting results similar to the ones @ayrtondenner got.
I'm using your pytorch0.4 branch with the following settings:

  • Anaconda 4.5.1 with Python 3.6.3
  • PyTorch 0.4.0
  • gensim 3.4.0
  • GloVe embeddings glove.6B.100d.gz
  • train, test, and dev data are the ones I got from https://github.com/glample/tagger/tree/master/dataset. I adapted your code in my fork to disregard the numbers at the start of each line (see the sketch after this list). Should this make any difference?
  • The rest of the hyper-parameters are the defaults set in the examples/run_ner_crf.sh script.

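Roughly, the adaptation amounts to a column-offset shift. This is just a minimal sketch, assuming the repo's reader expects CoNLL-2003 lines with a leading token index (e.g. `1 EU NNP I-NP I-ORG`) while the Lample files start directly with the word (`EU NNP I-NP I-ORG`); the helper name and offsets are my illustration, not the actual NeuroNLP2 reader code:

def parse_conll_line(line, has_index=True):
    # Split one CoNLL-2003 token line into (word, pos, chunk, ner).
    # With a leading index the word is the second column; without it,
    # the first -- disregarding the index shifts every column by one.
    tokens = line.split()
    offset = 1 if has_index else 0
    return tuple(tokens[offset:offset + 4])
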
What could be wrong?

Thanks!

@XuezheMax
Owner

Hi,
I am not sure what the problem is. One possible reason might be the tagging schema (BIO). If you are using the original data from CoNLL-03, you need to convert it to the standard BIO schema, or the more advanced BIOES (a marginal improvement).
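
To make the schema difference concrete, here is a small made-up example (not from the thread): in the original IOB1 annotation an entity usually starts with an I- tag; BIO (BIO2) marks every entity start with B-; BIOES additionally marks entity ends (E-) and single-token entities (S-):

Token     IOB1      BIO       BIOES
West      I-MISC    B-MISC    B-MISC
German    I-MISC    I-MISC    E-MISC
envoy     O         O         O
visits    O         O         O
Paris     I-LOC     B-LOC     S-LOC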

@pvcastro

pvcastro commented Jun 6, 2018

I see. I noticed that the annotation scheme is really messed up. The LSTM-CRF from Lample fixes this in memory while the training file stays the same, which is why it doesn't matter for his code.
Do you know where I could get the CoNLL-2003 corpus annotated the proper way, in either the BIO or BIOES scheme?

@XuezheMax
Owner

XuezheMax commented Jun 6, 2018

Here is the code I used to convert it to BIO:

def transform(ifile, ofile):
    """Convert CoNLL-2003 IOB1 annotations to the standard BIO (BIO2) scheme."""
    with open(ifile, 'r') as reader, open(ofile, 'w') as writer:
        prev = 'O'
        for line in reader:
            line = line.strip()
            if len(line) == 0:
                # Blank line: sentence boundary, reset the previous label.
                prev = 'O'
                writer.write('\n')
                continue

            tokens = line.split()
            label = tokens[-1]
            # A non-O label that starts a new entity (after O or after a
            # different entity type) gets a B- prefix; all other labels
            # are left unchanged.
            if label != 'O' and label != prev:
                if prev == 'O' or label[2:] != prev[2:]:
                    label = 'B-' + label[2:]
            writer.write(" ".join(tokens[:-1]) + " " + label)
            writer.write('\n')
            prev = tokens[-1]
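
A minimal way to apply it to the three CoNLL-2003 splits might be the following; the file names and paths are just an assumption based on the logs later in this thread:

for split in ('eng.train', 'eng.testa', 'eng.testb'):
    transform('data/conll2003/english/' + split,
              'data/conll2003/english/' + split + '.bios')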

@pvcastro

pvcastro commented Jun 6, 2018

Great, thanks @XuezheMax !

@pvcastro

pvcastro commented Jun 6, 2018

Strangely, it doesn't seem to have made any difference 🤔
I don't suppose those starting numbers are relevant for determining where each document or sentence ends, are they?
Can you confirm that the exact parameters in run_ner_crf.sh should be enough to reach a 90% F1 score on the test set? Some of them differ from what you report in your paper, but maybe the difference doesn't matter.

@XuezheMax
Owner

Yes, I am sure that using the exact parameters in run_ner_crf.sh should give around 91% F1 on the test set.

@XuezheMax
Owner

Would you please paste your log here so I can check for possible issues?
Again, make sure to remove the alphabets folder in data/ so that new vocabulary files are created.
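
In code, that cleanup step amounts to something like the following sketch, assuming the default data/alphabets/ layout shown in the logs below:

import shutil
# Remove the cached alphabet files so NERCRF.py rebuilds the vocabulary
# from the (re-tagged) training data on the next run.
shutil.rmtree('data/alphabets', ignore_errors=True)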

@pvcastro

pvcastro commented Jun 6, 2018

Yes, I did remove the alphabets folder 👍

I'm running a new training now with the latest adjustments. I also fixed another place in the code that was referring to the word token with the wrong index (after removing the starting numbers). Here's the log so far:

/home/pedro/virtualenv/pytorch/bin/python /home/pedro/pycharm-community-2017.3.2/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 37531 --file /home/pedro/repositorios/NeuroNLP2/examples/NERCRF.py --cuda --mode LSTM --num_epochs 200 --batch_size 16 --hidden_size 256 --num_layers 1 --char_dim 30 --num_filters 30 --tag_space 128 --learning_rate 0.01 --decay_rate 0.05 --schedule 1 --gamma 0.0 --dropout std --p_in 0.33 --p_rnn 0.33 0.5 --p_out 0.5 --unk_replace 0.0 --bigram --embedding glove --embedding_dict /media/discoD/embeddings/English/Glove/glove.6B/glove.6B.100d.gz --train data/conll2003/english/eng.train.bios --dev data/conll2003/english/eng.testa.bios --test data/conll2003/english/eng.testb.bios
Connected to pydev debugger (build 181.4203.547)
pydev debugger: process 4141 is connecting

loading embedding: glove from /media/discoD/embeddings/English/Glove/glove.6B/glove.6B.100d.gz
2018-06-06 15:49:10,504 - NERCRF - INFO - Creating Alphabets
2018-06-06 15:49:10,504 - Create Alphabets - INFO - Creating Alphabets: data/alphabets/ner_crf/
2018-06-06 15:49:11,628 - Create Alphabets - INFO - Total Vocabulary Size: 20102
2018-06-06 15:49:11,628 - Create Alphabets - INFO - Total Singleton Size: 9178
2018-06-06 15:49:11,630 - Create Alphabets - INFO - Total Vocabulary Size (w.o rare words): 19046
2018-06-06 15:49:12,295 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2018-06-06 15:49:12,296 - Create Alphabets - INFO - Character Alphabet Size: 86
2018-06-06 15:49:12,296 - Create Alphabets - INFO - POS Alphabet Size: 47
2018-06-06 15:49:12,296 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2018-06-06 15:49:12,296 - Create Alphabets - INFO - NER Alphabet Size: 10
2018-06-06 15:49:12,296 - NERCRF - INFO - Word Alphabet Size: 23598
2018-06-06 15:49:12,296 - NERCRF - INFO - Character Alphabet Size: 86
2018-06-06 15:49:12,296 - NERCRF - INFO - POS Alphabet Size: 47
2018-06-06 15:49:12,296 - NERCRF - INFO - Chunk Alphabet Size: 19
2018-06-06 15:49:12,296 - NERCRF - INFO - NER Alphabet Size: 10
2018-06-06 15:49:12,296 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bios
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.testa.bios
Total number of data: 3466
Reading data from data/conll2003/english/eng.testb.bios
Total number of data: 3684
oov: 339
2018-06-06 15:53:01,370 - NERCRF - INFO - constructing network...
/home/pedro/virtualenv/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
2018-06-06 15:53:01,387 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2018-06-06 15:53:01,387 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, unk replace: 0.00)
2018-06-06 15:53:01,387 - NERCRF - INFO - dropout(in, out, rnn): (0.33, 0.50, (0.33, 0.5))
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 11.1595, time left (estimated): 15.22s
train: 200/937 loss: 7.2109, time left (estimated): 12.09s
train: 300/937 loss: 5.8057, time left (estimated): 10.10s
train: 400/937 loss: 5.0669, time left (estimated): 8.42s
train: 500/937 loss: 4.5988, time left (estimated): 6.86s
train: 600/937 loss: 4.2958, time left (estimated): 5.30s
train: 700/937 loss: 4.0640, time left (estimated): 3.72s
train: 800/937 loss: 3.8781, time left (estimated): 2.16s
train: 900/937 loss: 3.7093, time left (estimated): 0.59s
train: 937 loss: 3.6504, time: 14.58s
dev acc: 97.02%, precision: 79.24%, recall: 75.75%, F1: 77.45%
best dev acc: 97.02%, precision: 79.24%, recall: 75.75%, F1: 77.45% (epoch: 1)
best test acc: 96.35%, precision: 74.47%, recall: 71.87%, F1: 73.15% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 2.3227, time left (estimated): 12.82s
train: 200/937 loss: 2.4067, time left (estimated): 11.64s
train: 300/937 loss: 2.4593, time left (estimated): 10.47s
train: 400/937 loss: 2.4737, time left (estimated): 8.83s
train: 500/937 loss: 2.4559, time left (estimated): 7.14s
train: 600/937 loss: 2.4435, time left (estimated): 5.52s
train: 700/937 loss: 2.4438, time left (estimated): 3.89s
train: 800/937 loss: 2.4204, time left (estimated): 2.26s
train: 900/937 loss: 2.3705, time left (estimated): 0.61s
train: 937 loss: 2.3726, time: 15.26s
dev acc: 97.55%, precision: 80.98%, recall: 79.55%, F1: 80.26%
best dev acc: 97.55%, precision: 80.98%, recall: 79.55%, F1: 80.26% (epoch: 2)
best test acc: 96.58%, precision: 75.61%, recall: 74.53%, F1: 75.07% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 2.1304, time left (estimated): 13.11s
train: 200/937 loss: 2.1364, time left (estimated): 11.71s
train: 300/937 loss: 2.2066, time left (estimated): 10.44s
train: 400/937 loss: 2.1977, time left (estimated): 8.77s
train: 500/937 loss: 2.1580, time left (estimated): 7.15s
train: 600/937 loss: 2.1675, time left (estimated): 5.62s
train: 700/937 loss: 2.1589, time left (estimated): 3.94s
train: 800/937 loss: 2.1703, time left (estimated): 2.29s
train: 900/937 loss: 2.1547, time left (estimated): 0.62s
train: 937 loss: 2.1668, time: 15.58s
dev acc: 97.69%, precision: 81.49%, recall: 79.97%, F1: 80.72%
best dev acc: 97.69%, precision: 81.49%, recall: 79.97%, F1: 80.72% (epoch: 3)
best test acc: 96.99%, precision: 77.07%, recall: 75.80%, F1: 76.43% (epoch: 3)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.8794, time left (estimated): 12.88s
train: 200/937 loss: 1.9610, time left (estimated): 11.79s
train: 300/937 loss: 1.9138, time left (estimated): 9.95s
train: 400/937 loss: 1.8985, time left (estimated): 8.52s
train: 500/937 loss: 1.9170, time left (estimated): 7.04s
train: 600/937 loss: 1.8895, time left (estimated): 5.45s
train: 700/937 loss: 1.8744, time left (estimated): 3.83s
train: 800/937 loss: 1.8929, time left (estimated): 2.23s
train: 900/937 loss: 1.8825, time left (estimated): 0.61s
train: 937 loss: 1.8929, time: 15.16s
dev acc: 98.00%, precision: 82.79%, recall: 81.04%, F1: 81.91%
best dev acc: 98.00%, precision: 82.79%, recall: 81.04%, F1: 81.91% (epoch: 4)
best test acc: 97.13%, precision: 77.70%, recall: 76.02%, F1: 76.85% (epoch: 4)
Epoch 5 (LSTM(std), learning rate=0.0083, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.6122, time left (estimated): 12.56s
train: 200/937 loss: 1.7545, time left (estimated): 11.42s
train: 300/937 loss: 1.8272, time left (estimated): 10.19s
train: 400/937 loss: 1.8695, time left (estimated): 8.71s
train: 500/937 loss: 1.8206, time left (estimated): 6.98s
train: 600/937 loss: 1.8122, time left (estimated): 5.43s
train: 700/937 loss: 1.7974, time left (estimated): 3.80s
train: 800/937 loss: 1.7895, time left (estimated): 2.21s
train: 900/937 loss: 1.7844, time left (estimated): 0.60s
train: 937 loss: 1.7592, time: 14.92s
dev acc: 98.03%, precision: 82.51%, recall: 82.19%, F1: 82.35%
best dev acc: 98.03%, precision: 82.51%, recall: 82.19%, F1: 82.35% (epoch: 5)
best test acc: 97.14%, precision: 77.33%, recall: 77.21%, F1: 77.27% (epoch: 5)
Epoch 6 (LSTM(std), learning rate=0.0080, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.7967, time left (estimated): 13.76s
train: 200/937 loss: 1.7380, time left (estimated): 12.11s
train: 300/937 loss: 1.7062, time left (estimated): 10.29s
train: 400/937 loss: 1.7048, time left (estimated): 8.64s
train: 500/937 loss: 1.7066, time left (estimated): 7.05s
train: 600/937 loss: 1.7288, time left (estimated): 5.49s
train: 700/937 loss: 1.7400, time left (estimated): 3.88s
train: 800/937 loss: 1.7497, time left (estimated): 2.23s
train: 900/937 loss: 1.7627, time left (estimated): 0.61s
train: 937 loss: 1.7641, time: 15.22s
dev acc: 98.04%, precision: 82.49%, recall: 82.88%, F1: 82.69%
best dev acc: 98.04%, precision: 82.49%, recall: 82.88%, F1: 82.69% (epoch: 6)
best test acc: 97.07%, precision: 76.98%, recall: 78.21%, F1: 77.59% (epoch: 6)
Epoch 7 (LSTM(std), learning rate=0.0077, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.6099, time left (estimated): 12.96s
train: 200/937 loss: 1.7350, time left (estimated): 11.93s
train: 300/937 loss: 1.7129, time left (estimated): 10.28s
train: 400/937 loss: 1.7469, time left (estimated): 8.82s
train: 500/937 loss: 1.7572, time left (estimated): 7.17s
train: 600/937 loss: 1.7370, time left (estimated): 5.55s
train: 700/937 loss: 1.7093, time left (estimated): 3.89s
train: 800/937 loss: 1.6880, time left (estimated): 2.23s
train: 900/937 loss: 1.6875, time left (estimated): 0.61s
train: 937 loss: 1.6810, time: 15.08s
dev acc: 98.21%, precision: 83.37%, recall: 82.10%, F1: 82.73%
best dev acc: 98.21%, precision: 83.37%, recall: 82.10%, F1: 82.73% (epoch: 7)
best test acc: 97.24%, precision: 78.31%, recall: 77.17%, F1: 77.74% (epoch: 7)
Epoch 8 (LSTM(std), learning rate=0.0074, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.4601, time left (estimated): 12.74s
train: 200/937 loss: 1.7144, time left (estimated): 12.05s
train: 300/937 loss: 1.6738, time left (estimated): 10.41s
train: 400/937 loss: 1.6353, time left (estimated): 8.70s
train: 500/937 loss: 1.6488, time left (estimated): 7.16s
train: 600/937 loss: 1.6255, time left (estimated): 5.44s
train: 700/937 loss: 1.6026, time left (estimated): 3.82s
train: 800/937 loss: 1.5943, time left (estimated): 2.20s
train: 900/937 loss: 1.5904, time left (estimated): 0.60s
train: 937 loss: 1.5851, time: 15.00s
dev acc: 98.16%, precision: 83.43%, recall: 81.04%, F1: 82.22%
best dev acc: 98.21%, precision: 83.37%, recall: 82.10%, F1: 82.73% (epoch: 7)
best test acc: 97.24%, precision: 78.31%, recall: 77.17%, F1: 77.74% (epoch: 7)
Epoch 9 (LSTM(std), learning rate=0.0071, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5425, time left (estimated): 12.59s
train: 200/937 loss: 1.6459, time left (estimated): 11.54s
train: 300/937 loss: 1.6891, time left (estimated): 10.43s
train: 400/937 loss: 1.6785, time left (estimated): 8.78s
train: 500/937 loss: 1.6821, time left (estimated): 7.16s
train: 600/937 loss: 1.6776, time left (estimated): 5.53s
train: 700/937 loss: 1.6908, time left (estimated): 3.96s
train: 800/937 loss: 1.6926, time left (estimated): 2.29s
train: 900/937 loss: 1.6696, time left (estimated): 0.62s
train: 937 loss: 1.6775, time: 15.54s
dev acc: 98.28%, precision: 83.63%, recall: 82.79%, F1: 83.21%
best dev acc: 98.28%, precision: 83.63%, recall: 82.79%, F1: 83.21% (epoch: 9)
best test acc: 97.40%, precision: 78.73%, recall: 78.42%, F1: 78.58% (epoch: 9)
Epoch 10 (LSTM(std), learning rate=0.0069, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5739, time left (estimated): 14.88s
train: 200/937 loss: 1.4929, time left (estimated): 12.41s
train: 300/937 loss: 1.4723, time left (estimated): 10.52s
train: 400/937 loss: 1.5359, time left (estimated): 8.98s
train: 500/937 loss: 1.4927, time left (estimated): 7.15s
train: 600/937 loss: 1.4833, time left (estimated): 5.50s
train: 700/937 loss: 1.4559, time left (estimated): 3.83s
train: 800/937 loss: 1.4410, time left (estimated): 2.18s
train: 900/937 loss: 1.4595, time left (estimated): 0.60s
train: 937 loss: 1.4702, time: 15.02s
dev acc: 98.34%, precision: 83.74%, recall: 83.01%, F1: 83.37%
best dev acc: 98.34%, precision: 83.74%, recall: 83.01%, F1: 83.37% (epoch: 10)
best test acc: 97.48%, precision: 78.92%, recall: 78.56%, F1: 78.74% (epoch: 10)
Epoch 11 (LSTM(std), learning rate=0.0067, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5810, time left (estimated): 14.03s
train: 200/937 loss: 1.5853, time left (estimated): 12.40s
train: 300/937 loss: 1.5423, time left (estimated): 10.69s
train: 400/937 loss: 1.5091, time left (estimated): 8.81s
train: 500/937 loss: 1.4996, time left (estimated): 7.09s
train: 600/937 loss: 1.4911, time left (estimated): 5.46s
train: 700/937 loss: 1.4757, time left (estimated): 3.83s
train: 800/937 loss: 1.4645, time left (estimated): 2.21s
train: 900/937 loss: 1.4694, time left (estimated): 0.61s
train: 937 loss: 1.4674, time: 15.13s
dev acc: 98.36%, precision: 83.55%, recall: 83.36%, F1: 83.46%
best dev acc: 98.36%, precision: 83.55%, recall: 83.36%, F1: 83.46% (epoch: 11)
best test acc: 97.57%, precision: 78.82%, recall: 79.07%, F1: 78.94% (epoch: 11)
Epoch 12 (LSTM(std), learning rate=0.0065, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.1637, time left (estimated): 12.80s
train: 200/937 loss: 1.2805, time left (estimated): 11.64s
train: 300/937 loss: 1.3509, time left (estimated): 10.32s
train: 400/937 loss: 1.3464, time left (estimated): 8.69s
train: 500/937 loss: 1.3561, time left (estimated): 7.01s
train: 600/937 loss: 1.3453, time left (estimated): 5.38s
train: 700/937 loss: 1.3587, time left (estimated): 3.78s
train: 800/937 loss: 1.3513, time left (estimated): 2.19s
train: 900/937 loss: 1.3726, time left (estimated): 0.61s
train: 937 loss: 1.3741, time: 15.10s
dev acc: 98.16%, precision: 83.13%, recall: 83.11%, F1: 83.12%
best dev acc: 98.36%, precision: 83.55%, recall: 83.36%, F1: 83.46% (epoch: 11)
best test acc: 97.57%, precision: 78.82%, recall: 79.07%, F1: 78.94% (epoch: 11)
Epoch 13 (LSTM(std), learning rate=0.0062, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5685, time left (estimated): 15.13s
train: 200/937 loss: 1.5330, time left (estimated): 13.42s
train: 300/937 loss: 1.5295, time left (estimated): 11.47s
train: 400/937 loss: 1.4667, time left (estimated): 9.38s
train: 500/937 loss: 1.5124, time left (estimated): 7.85s
train: 600/937 loss: 1.5023, time left (estimated): 6.03s
train: 700/937 loss: 1.4821, time left (estimated): 4.17s
train: 800/937 loss: 1.4831, time left (estimated): 2.41s
train: 900/937 loss: 1.4986, time left (estimated): 0.66s
train: 937 loss: 1.4936, time: 16.46s
dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 14 (LSTM(std), learning rate=0.0061, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.2822, time left (estimated): 12.56s
train: 200/937 loss: 1.3552, time left (estimated): 11.52s
train: 300/937 loss: 1.3195, time left (estimated): 9.87s
train: 400/937 loss: 1.3449, time left (estimated): 8.48s
train: 500/937 loss: 1.3591, time left (estimated): 6.98s
train: 600/937 loss: 1.3216, time left (estimated): 5.32s
train: 700/937 loss: 1.3230, time left (estimated): 3.79s
train: 800/937 loss: 1.3476, time left (estimated): 2.21s
train: 900/937 loss: 1.3365, time left (estimated): 0.60s
train: 937 loss: 1.3412, time: 14.99s
dev acc: 98.42%, precision: 83.90%, recall: 83.42%, F1: 83.66%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 15 (LSTM(std), learning rate=0.0059, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5639, time left (estimated): 14.34s
train: 200/937 loss: 1.5256, time left (estimated): 12.75s
train: 300/937 loss: 1.5398, time left (estimated): 11.06s
train: 400/937 loss: 1.5272, time left (estimated): 9.35s
train: 500/937 loss: 1.5028, time left (estimated): 7.52s
train: 600/937 loss: 1.4775, time left (estimated): 5.78s
train: 700/937 loss: 1.4980, time left (estimated): 4.12s
train: 800/937 loss: 1.4719, time left (estimated): 2.37s
train: 900/937 loss: 1.4516, time left (estimated): 0.64s
train: 937 loss: 1.4439, time: 15.85s
dev acc: 98.45%, precision: 83.76%, recall: 82.96%, F1: 83.36%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 16 (LSTM(std), learning rate=0.0057, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.0337, time left (estimated): 11.95s
train: 200/937 loss: 1.2146, time left (estimated): 11.50s
train: 300/937 loss: 1.2163, time left (estimated): 10.00s
train: 400/937 loss: 1.2734, time left (estimated): 8.66s
train: 500/937 loss: 1.3102, time left (estimated): 7.14s
train: 600/937 loss: 1.3274, time left (estimated): 5.56s
train: 700/937 loss: 1.3259, time left (estimated): 3.90s
train: 800/937 loss: 1.3224, time left (estimated): 2.24s
train: 900/937 loss: 1.3096, time left (estimated): 0.61s
train: 937 loss: 1.3034, time: 15.19s
dev acc: 98.43%, precision: 83.86%, recall: 83.54%, F1: 83.70%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 17 (LSTM(std), learning rate=0.0056, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5186, time left (estimated): 13.96s
train: 200/937 loss: 1.4127, time left (estimated): 11.87s
train: 300/937 loss: 1.3337, time left (estimated): 9.97s
train: 400/937 loss: 1.3327, time left (estimated): 8.48s
train: 500/937 loss: 1.3473, time left (estimated): 6.99s
train: 600/937 loss: 1.3244, time left (estimated): 5.42s
train: 700/937 loss: 1.3301, time left (estimated): 3.82s
train: 800/937 loss: 1.3322, time left (estimated): 2.22s
train: 900/937 loss: 1.3217, time left (estimated): 0.61s
train: 937 loss: 1.3175, time: 15.08s
dev acc: 98.46%, precision: 84.10%, recall: 83.46%, F1: 83.78%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 18 (LSTM(std), learning rate=0.0054, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.2661, time left (estimated): 13.31s

@XuezheMax
Owner

Here is my log. You are using Python 3.6, right? What is your PyTorch version?
Could you try using Python 2.7 with PyTorch 0.3.1 to re-run your experiments and see whether the versions are the issue?
loading embedding: glove from data/glove/glove.6B/glove.6B.100d.gz
2018-06-06 15:44:55,126 - NERCRF - INFO - Creating Alphabets
2018-06-06 15:44:55,126 - Create Alphabets - INFO - Creating Alphabets: data/alphabets/ner_crf/
2018-06-06 15:44:56,115 - Create Alphabets - INFO - Total Vocabulary Size: 20102
2018-06-06 15:44:56,116 - Create Alphabets - INFO - Total Singleton Size: 9178
2018-06-06 15:44:56,120 - Create Alphabets - INFO - Total Vocabulary Size (w.o rare words): 19046
2018-06-06 15:44:56,499 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2018-06-06 15:44:56,499 - Create Alphabets - INFO - Character Alphabet Size: 86
2018-06-06 15:44:56,499 - Create Alphabets - INFO - POS Alphabet Size: 47
2018-06-06 15:44:56,499 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2018-06-06 15:44:56,499 - Create Alphabets - INFO - NER Alphabet Size: 10
2018-06-06 15:44:56,499 - NERCRF - INFO - Word Alphabet Size: 23598
2018-06-06 15:44:56,500 - NERCRF - INFO - Character Alphabet Size: 86
2018-06-06 15:44:56,500 - NERCRF - INFO - POS Alphabet Size: 47
2018-06-06 15:44:56,500 - NERCRF - INFO - Chunk Alphabet Size: 19
2018-06-06 15:44:56,500 - NERCRF - INFO - NER Alphabet Size: 10
2018-06-06 15:44:56,500 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bio.conll
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.dev.bio.conll
Total number of data: 3466
Reading data from data/conll2003/english/eng.test.bio.conll
Total number of data: 3684
oov: 339
2018-06-06 15:45:01,810 - NERCRF - INFO - constructing network...
2018-06-06 15:45:02,979 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2018-06-06 15:45:02,980 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, unk replace: 0.00)
2018-06-06 15:45:02,980 - NERCRF - INFO - dropout(in, out, rnn): (0.33, 0.50, (0.33, 0.5))
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 937 loss: 3.6320, time: 23.30s
dev acc: 96.81%, precision: 86.45%, recall: 83.52%, F1: 84.96%
best dev acc: 96.81%, precision: 86.45%, recall: 83.52%, F1: 84.96% (epoch: 1)
best test acc: 95.90%, precision: 81.77%, recall: 80.05%, F1: 80.90% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.3164, time: 19.93s
dev acc: 97.53%, precision: 89.47%, recall: 87.39%, F1: 88.42%
best dev acc: 97.53%, precision: 89.47%, recall: 87.39%, F1: 88.42% (epoch: 2)
best test acc: 96.79%, precision: 85.61%, recall: 84.37%, F1: 84.98% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.0166, time: 20.60s
dev acc: 97.56%, precision: 89.06%, recall: 87.24%, F1: 88.14%
best dev acc: 97.53%, precision: 89.47%, recall: 87.39%, F1: 88.42% (epoch: 2)
best test acc: 96.79%, precision: 85.61%, recall: 84.37%, F1: 84.98% (epoch: 2)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.9072, time: 21.19s
dev acc: 97.81%, precision: 91.33%, recall: 88.66%, F1: 89.97%
best dev acc: 97.81%, precision: 91.33%, recall: 88.66%, F1: 89.97% (epoch: 4)
best test acc: 97.20%, precision: 88.10%, recall: 85.98%, F1: 87.03% (epoch: 4)
Epoch 5 (LSTM(std), learning rate=0.0083, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8425, time: 20.10s
dev acc: 98.05%, precision: 92.23%, recall: 90.04%, F1: 91.12%
best dev acc: 98.05%, precision: 92.23%, recall: 90.04%, F1: 91.12% (epoch: 5)
best test acc: 97.27%, precision: 88.23%, recall: 86.63%, F1: 87.42% (epoch: 5)
Epoch 6 (LSTM(std), learning rate=0.0080, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7096, time: 20.70s
dev acc: 97.79%, precision: 92.15%, recall: 89.13%, F1: 90.62%
best dev acc: 98.05%, precision: 92.23%, recall: 90.04%, F1: 91.12% (epoch: 5)
best test acc: 97.27%, precision: 88.23%, recall: 86.63%, F1: 87.42% (epoch: 5)
Epoch 7 (LSTM(std), learning rate=0.0077, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7420, time: 22.69s
dev acc: 98.16%, precision: 91.95%, recall: 90.91%, F1: 91.43%
best dev acc: 98.16%, precision: 91.95%, recall: 90.91%, F1: 91.43% (epoch: 7)
best test acc: 97.38%, precision: 88.18%, recall: 87.82%, F1: 88.00% (epoch: 7)

@pvcastro

pvcastro commented Jun 6, 2018

Yes, I'm running Anaconda 4.5.1 with Python 3.6.3, PyTorch 0.4.0 (using your pytorch0.4 branch), and gensim 3.4.0.
I'll set up the Python 2 environment and verify the results.

@XuezheMax
Owner

FYI, here are the first 35 epochs for Python 2.7 with PyTorch 0.4. It seems to converge more slowly than with PyTorch 0.3, but it still approaches 90% F1 after 35 epochs.
loading embedding: glove from data/glove/glove.6B/glove.6B.100d.gz
2018-06-06 16:25:56,009 - NERCRF - INFO - Creating Alphabets
2018-06-06 16:25:56,057 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2018-06-06 16:25:56,058 - Create Alphabets - INFO - Character Alphabet Size: 86
2018-06-06 16:25:56,058 - Create Alphabets - INFO - POS Alphabet Size: 47
2018-06-06 16:25:56,058 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2018-06-06 16:25:56,058 - Create Alphabets - INFO - NER Alphabet Size: 18
2018-06-06 16:25:56,058 - NERCRF - INFO - Word Alphabet Size: 23598
2018-06-06 16:25:56,058 - NERCRF - INFO - Character Alphabet Size: 86
2018-06-06 16:25:56,058 - NERCRF - INFO - POS Alphabet Size: 47
2018-06-06 16:25:56,058 - NERCRF - INFO - Chunk Alphabet Size: 19
2018-06-06 16:25:56,058 - NERCRF - INFO - NER Alphabet Size: 18
2018-06-06 16:25:56,058 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bioes.conll
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.dev.bioes.conll
Total number of data: 3466
Reading data from data/conll2003/english/eng.test.bioes.conll
Total number of data: 3684
oov: 339
2018-06-06 16:25:59,294 - NERCRF - INFO - constructing network...
/home/max/.local/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
2018-06-06 16:25:59,314 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2018-06-06 16:25:59,315 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, unk replace: 0.00)
2018-06-06 16:25:59,315 - NERCRF - INFO - dropout(in, out, rnn): (0.33, 0.50, (0.33, 0.5))
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 937 loss: 11.5858, time: 116.24s
dev acc: 94.64%, precision: 77.99%, recall: 71.49%, F1: 74.60%
best dev acc: 94.64%, precision: 77.99%, recall: 71.49%, F1: 74.60% (epoch: 1)
best test acc: 93.82%, precision: 76.13%, recall: 70.41%, F1: 73.16% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 937 loss: 3.1999, time: 125.24s
dev acc: 96.54%, precision: 85.75%, recall: 83.12%, F1: 84.41%
best dev acc: 96.54%, precision: 85.75%, recall: 83.12%, F1: 84.41% (epoch: 2)
best test acc: 95.70%, precision: 81.84%, recall: 79.64%, F1: 80.73% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.6765, time: 114.69s
dev acc: 96.89%, precision: 90.07%, recall: 84.40%, F1: 87.14%
best dev acc: 96.89%, precision: 90.07%, recall: 84.40%, F1: 87.14% (epoch: 3)
best test acc: 95.90%, precision: 85.93%, recall: 80.35%, F1: 83.05% (epoch: 3)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.3663, time: 107.77s
dev acc: 97.26%, precision: 89.77%, recall: 85.85%, F1: 87.77%
best dev acc: 97.26%, precision: 89.77%, recall: 85.85%, F1: 87.77% (epoch: 4)
best test acc: 96.40%, precision: 85.72%, recall: 81.82%, F1: 83.72% (epoch: 4)
Epoch 5 (LSTM(std), learning rate=0.0083, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.2414, time: 112.05s
dev acc: 97.48%, precision: 88.71%, recall: 88.37%, F1: 88.54%
best dev acc: 97.48%, precision: 88.71%, recall: 88.37%, F1: 88.54% (epoch: 5)
best test acc: 96.54%, precision: 84.67%, recall: 84.95%, F1: 84.81% (epoch: 5)
Epoch 6 (LSTM(std), learning rate=0.0080, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.1981, time: 112.35s
dev acc: 97.58%, precision: 90.12%, recall: 89.04%, F1: 89.58%
best dev acc: 97.58%, precision: 90.12%, recall: 89.04%, F1: 89.58% (epoch: 6)
best test acc: 96.85%, precision: 87.28%, recall: 85.98%, F1: 86.62% (epoch: 6)
Epoch 7 (LSTM(std), learning rate=0.0077, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.0362, time: 114.91s
dev acc: 97.70%, precision: 92.14%, recall: 88.61%, F1: 90.34%
best dev acc: 97.70%, precision: 92.14%, recall: 88.61%, F1: 90.34% (epoch: 7)
best test acc: 96.89%, precision: 88.24%, recall: 84.24%, F1: 86.20% (epoch: 7)
Epoch 8 (LSTM(std), learning rate=0.0074, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8955, time: 111.44s
dev acc: 97.35%, precision: 89.69%, recall: 87.53%, F1: 88.60%
best dev acc: 97.70%, precision: 92.14%, recall: 88.61%, F1: 90.34% (epoch: 7)
best test acc: 96.89%, precision: 88.24%, recall: 84.24%, F1: 86.20% (epoch: 7)
Epoch 9 (LSTM(std), learning rate=0.0071, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.9163, time: 106.08s
dev acc: 97.94%, precision: 91.67%, recall: 90.17%, F1: 90.91%
best dev acc: 97.94%, precision: 91.67%, recall: 90.17%, F1: 90.91% (epoch: 9)
best test acc: 97.14%, precision: 88.07%, recall: 86.88%, F1: 87.47% (epoch: 9)
Epoch 10 (LSTM(std), learning rate=0.0069, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8767, time: 110.97s
dev acc: 97.96%, precision: 92.15%, recall: 90.07%, F1: 91.10%
best dev acc: 97.96%, precision: 92.15%, recall: 90.07%, F1: 91.10% (epoch: 10)
best test acc: 97.07%, precision: 87.82%, recall: 86.07%, F1: 86.94% (epoch: 10)
Epoch 11 (LSTM(std), learning rate=0.0067, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8514, time: 113.16s
dev acc: 97.82%, precision: 91.57%, recall: 90.27%, F1: 90.92%
best dev acc: 97.96%, precision: 92.15%, recall: 90.07%, F1: 91.10% (epoch: 10)
best test acc: 97.07%, precision: 87.82%, recall: 86.07%, F1: 86.94% (epoch: 10)
Epoch 12 (LSTM(std), learning rate=0.0065, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7597, time: 108.15s
dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 13 (LSTM(std), learning rate=0.0062, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7508, time: 111.77s
dev acc: 98.06%, precision: 92.18%, recall: 90.71%, F1: 91.44%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 14 (LSTM(std), learning rate=0.0061, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7144, time: 107.66s
dev acc: 98.05%, precision: 92.76%, recall: 90.61%, F1: 91.67%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 15 (LSTM(std), learning rate=0.0059, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.6631, time: 113.54s
dev acc: 98.13%, precision: 92.51%, recall: 91.01%, F1: 91.75%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 16 (LSTM(std), learning rate=0.0057, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.6694, time: 115.85s
dev acc: 98.08%, precision: 92.43%, recall: 90.83%, F1: 91.62%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 17 (LSTM(std), learning rate=0.0056, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.6892, time: 115.00s
dev acc: 98.20%, precision: 92.69%, recall: 91.27%, F1: 91.97%
best dev acc: 98.20%, precision: 92.69%, recall: 91.27%, F1: 91.97% (epoch: 17)
best test acc: 97.30%, precision: 89.00%, recall: 87.64%, F1: 88.31% (epoch: 17)
Epoch 18 (LSTM(std), learning rate=0.0054, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5907, time: 108.94s
dev acc: 98.17%, precision: 93.07%, recall: 91.59%, F1: 92.32%
best dev acc: 98.17%, precision: 93.07%, recall: 91.59%, F1: 92.32% (epoch: 18)
best test acc: 97.39%, precision: 89.51%, recall: 88.21%, F1: 88.85% (epoch: 18)
Epoch 19 (LSTM(std), learning rate=0.0053, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5726, time: 110.24s
dev acc: 98.24%, precision: 93.42%, recall: 91.47%, F1: 92.43%
best dev acc: 98.24%, precision: 93.42%, recall: 91.47%, F1: 92.43% (epoch: 19)
best test acc: 97.42%, precision: 89.85%, recall: 87.91%, F1: 88.87% (epoch: 19)
Epoch 20 (LSTM(std), learning rate=0.0051, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5618, time: 110.93s
dev acc: 98.08%, precision: 92.10%, recall: 90.98%, F1: 91.53%
best dev acc: 98.24%, precision: 93.42%, recall: 91.47%, F1: 92.43% (epoch: 19)
best test acc: 97.42%, precision: 89.85%, recall: 87.91%, F1: 88.87% (epoch: 19)
Epoch 21 (LSTM(std), learning rate=0.0050, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5315, time: 114.51s
dev acc: 98.24%, precision: 93.34%, recall: 91.55%, F1: 92.44%
best dev acc: 98.24%, precision: 93.34%, recall: 91.55%, F1: 92.44% (epoch: 21)
best test acc: 97.39%, precision: 89.59%, recall: 87.73%, F1: 88.65% (epoch: 21)
Epoch 22 (LSTM(std), learning rate=0.0049, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5707, time: 111.92s
dev acc: 98.34%, precision: 93.33%, recall: 92.34%, F1: 92.83%
best dev acc: 98.34%, precision: 93.33%, recall: 92.34%, F1: 92.83% (epoch: 22)
best test acc: 97.40%, precision: 89.47%, recall: 88.49%, F1: 88.98% (epoch: 22)
Epoch 23 (LSTM(std), learning rate=0.0048, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5023, time: 109.71s
dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88%
best dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88% (epoch: 23)
best test acc: 97.45%, precision: 89.12%, recall: 88.79%, F1: 88.96% (epoch: 23)
Epoch 24 (LSTM(std), learning rate=0.0047, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5445, time: 118.68s
dev acc: 98.29%, precision: 93.96%, recall: 91.62%, F1: 92.77%
best dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88% (epoch: 23)
best test acc: 97.45%, precision: 89.12%, recall: 88.79%, F1: 88.96% (epoch: 23)
Epoch 25 (LSTM(std), learning rate=0.0045, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5255, time: 114.08s
dev acc: 98.30%, precision: 93.43%, recall: 92.17%, F1: 92.80%
best dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88% (epoch: 23)
best test acc: 97.45%, precision: 89.12%, recall: 88.79%, F1: 88.96% (epoch: 23)
Epoch 26 (LSTM(std), learning rate=0.0044, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5290, time: 113.38s
dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89%
best dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89% (epoch: 26)
best test acc: 97.52%, precision: 89.55%, recall: 88.95%, F1: 89.25% (epoch: 26)
Epoch 27 (LSTM(std), learning rate=0.0043, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4693, time: 111.95s
dev acc: 98.31%, precision: 93.06%, recall: 92.28%, F1: 92.67%
best dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89% (epoch: 26)
best test acc: 97.52%, precision: 89.55%, recall: 88.95%, F1: 89.25% (epoch: 26)
Epoch 28 (LSTM(std), learning rate=0.0043, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.3779, time: 105.43s
dev acc: 98.39%, precision: 93.44%, recall: 92.34%, F1: 92.89%
best dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89% (epoch: 26)
best test acc: 97.52%, precision: 89.55%, recall: 88.95%, F1: 89.25% (epoch: 26)
Epoch 29 (LSTM(std), learning rate=0.0042, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4463, time: 117.16s
dev acc: 98.38%, precision: 93.51%, recall: 92.33%, F1: 92.91%
best dev acc: 98.38%, precision: 93.51%, recall: 92.33%, F1: 92.91% (epoch: 29)
best test acc: 97.61%, precision: 89.99%, recall: 88.81%, F1: 89.40% (epoch: 29)
Epoch 30 (LSTM(std), learning rate=0.0041, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4345, time: 108.91s
dev acc: 98.33%, precision: 93.25%, recall: 92.28%, F1: 92.76%
best dev acc: 98.38%, precision: 93.51%, recall: 92.33%, F1: 92.91% (epoch: 29)
best test acc: 97.61%, precision: 89.99%, recall: 88.81%, F1: 89.40% (epoch: 29)
Epoch 31 (LSTM(std), learning rate=0.0040, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4096, time: 111.84s
dev acc: 98.40%, precision: 93.50%, recall: 92.53%, F1: 93.01%
best dev acc: 98.40%, precision: 93.50%, recall: 92.53%, F1: 93.01% (epoch: 31)
best test acc: 97.61%, precision: 90.14%, recall: 89.47%, F1: 89.80% (epoch: 31)
Epoch 32 (LSTM(std), learning rate=0.0039, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4046, time: 113.07s
dev acc: 98.39%, precision: 93.94%, recall: 92.31%, F1: 93.12%
best dev acc: 98.39%, precision: 93.94%, recall: 92.31%, F1: 93.12% (epoch: 32)
best test acc: 97.58%, precision: 90.38%, recall: 88.79%, F1: 89.58% (epoch: 32)
Epoch 33 (LSTM(std), learning rate=0.0038, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4126, time: 111.48s
dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32%
best dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32% (epoch: 33)
best test acc: 97.56%, precision: 89.93%, recall: 88.56%, F1: 89.24% (epoch: 33)
Epoch 34 (LSTM(std), learning rate=0.0038, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.3716, time: 107.51s
dev acc: 98.40%, precision: 93.87%, recall: 92.46%, F1: 93.16%
best dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32% (epoch: 33)
best test acc: 97.56%, precision: 89.93%, recall: 88.56%, F1: 89.24% (epoch: 33)
Epoch 35 (LSTM(std), learning rate=0.0037, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.3615, time: 116.80s
dev acc: 98.39%, precision: 93.65%, recall: 92.38%, F1: 93.01%
best dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32% (epoch: 33)
best test acc: 97.56%, precision: 89.93%, recall: 88.56%, F1: 89.24% (epoch: 33)

@pvcastro

pvcastro commented Jun 7, 2018

Hi @XuezheMax!

Besides running the Python 2 setup (with PyTorch 0.3.1), I also ran the script mentioned in #9 to add indexes to the start of each line in my corpus, to rule out the possibility that I had done something wrong when adapting the code to run without the indexes. The results I got were compatible with yours: I reached nearly 90% F1 on the test dataset in only 10 epochs.
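
Conceptually, the indexing step amounts to something like this sketch (my reconstruction, not the actual script from #9): prefix each token line with its 1-based position within the sentence, resetting at blank lines:

def add_indexes(ifile, ofile):
    # Prefix each token line with its 1-based position in the sentence,
    # resetting the counter at blank lines (sentence boundaries).
    with open(ifile, 'r') as reader, open(ofile, 'w') as writer:
        idx = 0
        for line in reader:
            line = line.strip()
            if not line:
                idx = 0
                writer.write('\n')
                continue
            idx += 1
            writer.write('%d %s\n' % (idx, line))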

Then I went back to the pytorch0.4 branch with Python 3, reverted the changes I had made to disregard the starting indexes, and ran the training on the corpus with starting indexes again, to see whether I had succeeded because of the corpus or because of the Python and PyTorch versions. I ended up getting those same low results again. So it looks like there's something wrong with running PyTorch 0.4 on Python 3 🤔

I didn't test PyTorch 0.4 with Python 2.7; I'm guessing you already did that. What you probably didn't do was test with Python 3.6, right?

@ducalpha

ducalpha commented Jun 7, 2018

Python 2.7 + PyTorch 0.4 seems to work well. My result with this config matches the paper. Running run_ner_crf.sh on CoNLL-2003, I got an F1 of 91.36% (better than the paper's 91.21%) at epoch 167, but after that the F1 dropped to 91.12%.

Epoch 167 (LSTM(std), learning rate=0.0011, decay rate=0.0500 (schedule=1)):
train: 937 loss: 0.7290, time: 31.23s
dev acc: 98.94%, precision: 94.79%, recall: 94.65%, F1: 94.72%
best dev acc: 98.94%, precision: 94.79%, recall: 94.65%, F1: 94.72% (epoch: 167)
best test acc: 98.14%, precision: 91.46%, recall: 91.25%, F1: 91.36% (epoch: 167)

@pvcastro

pvcastro commented Jun 7, 2018

These reported results are usually averaged over a number of runs, so it doesn't necessarily mean that their single best run reached 91.21%.
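
As a made-up illustration of that averaging (the scores below are hypothetical, not from the paper):

import statistics
# Hypothetical F1 scores from five independent runs; a paper-style
# number is their mean, not the single best run.
runs = [91.35, 91.08, 91.21, 91.02, 91.39]
print(round(statistics.mean(runs), 2))  # -> 91.21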

So given that you ran with 2.7 and PyTorch 0.4, I'm inclined to think the problem must be related to Python 3 somehow 🤔

@pvcastro

pvcastro commented Jun 7, 2018

@ducalpha did you use the pytorch0.4 branch, or did you use master?

@ducalpha

ducalpha commented Jun 7, 2018

I used the pytorch0.4 branch. The master branch yielded a maximum recursion depth exceeded error.
