Trying to achieve same results as "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" paper #13
Hi,
The hyper-parameters seem reasonable, but the results are surprisingly low. I used the standard train/dev/test split of CoNLL-2003.
I am not familiar with PyTorch on Windows, but I guess you need to use PyTorch 0.4, right? In that case, please switch to the 'pytorch4.0' branch.
Hello. I'm actually using PyTorch 0.3.1.post2. Should I update it to 0.4? Could a different version produce a different performance outcome as well? Seems weird...
No, I just wanted to make sure that you used the correct version, because there are some major changes from PyTorch 0.3 to 0.4 which may cause some weird issues.
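The version boundary discussed above can be checked programmatically before picking a branch; a minimal sketch (the helper `needs_pytorch4_branch` is hypothetical, not part of the repository — in practice you would pass it `torch.__version__`):

```python
def needs_pytorch4_branch(version):
    """Return True for PyTorch >= 0.4, where the breaking API changes
    discussed above apply; version strings look like '0.3.1.post2'."""
    major, minor = (int(p) for p in version.split('.')[:2])
    return (major, minor) >= (0, 4)

print(needs_pytorch4_branch('0.3.1.post2'))  # False -> master branch is fine
print(needs_pytorch4_branch('0.4.0'))        # True  -> use the pytorch4.0 branch
```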
Hi @XuezheMax, I'm also running the run_ner_crf script and I'm having trouble reaching the results reported in your paper. I'm getting results similar to the ones @ayrtondenner got.
What could be wrong? Thanks!
Hi,
I see. I noticed that the annotation scheme is really messed up. The LSTM-CRF from Lample fixes this in memory, but the training file is the same, which is why it doesn't matter for his code.
Here is the code I used to convert it to BIO:

```python
def transform(ifile, ofile):
    with open(ifile, 'r') as reader, open(ofile, 'w') as writer:
        prev = 'O'
        for line in reader:
            line = line.strip()
            if len(line) == 0:
                prev = 'O'
                writer.write('\n')
                continue
            tokens = line.split()
            label = tokens[-1]
            if label != 'O' and label != prev:
                if prev == 'O':
                    label = 'B-' + label[2:]
                elif label[2:] != prev[2:]:
                    label = 'B-' + label[2:]
            writer.write(" ".join(tokens[:-1]) + " " + label)
            writer.write('\n')
            prev = tokens[-1]
```
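A quick way to sanity-check the conversion is to run the same logic on in-memory lines; a sketch (`to_bio` is a hypothetical wrapper, not in the repository, and the sample tokens are made up):

```python
def to_bio(lines):
    # Same IOB1 -> BIO logic as transform() above, applied to a list of
    # lines instead of files: the first tag of each entity becomes B-.
    out, prev = [], 'O'
    for line in lines:
        line = line.strip()
        if not line:
            prev = 'O'
            out.append('')
            continue
        tokens = line.split()
        label = tokens[-1]
        if label != 'O' and label != prev:
            if prev == 'O' or label[2:] != prev[2:]:
                label = 'B-' + label[2:]
        out.append(' '.join(tokens[:-1]) + ' ' + label)
        prev = tokens[-1]
    return out

print(to_bio(['EU I-ORG', 'rejects O', 'German I-MISC', 'call O']))
# ['EU B-ORG', 'rejects O', 'German B-MISC', 'call O']
```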
Great, thanks @XuezheMax!
Strangely, it doesn't seem to have made any difference 🤔 |
Yes, I am sure that using the exact parameters in run_ner_crf.sh should give around a 91% F1 score on the test set.
Would you please paste your log here so I can check for possible issues?
Yes, I did remove the alphabets folder 👍 I'm running a new training now with the latest adjustments. I fixed another place in the code that was referring to the word token with the wrong index (after removing the starting numbers). Here's the log so far:

```
/home/pedro/virtualenv/pytorch/bin/python /home/pedro/pycharm-community-2017.3.2/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 37531 --file /home/pedro/repositorios/NeuroNLP2/examples/NERCRF.py --cuda --mode LSTM --num_epochs 200 --batch_size 16 --hidden_size 256 --num_layers 1 --char_dim 30 --num_filters 30 --tag_space 128 --learning_rate 0.01 --decay_rate 0.05 --schedule 1 --gamma 0.0 --dropout std --p_in 0.33 --p_rnn 0.33 0.5 --p_out 0.5 --unk_replace 0.0 --bigram --embedding glove --embedding_dict /media/discoD/embeddings/English/Glove/glove.6B/glove.6B.100d.gz --train data/conll2003/english/eng.train.bios --dev data/conll2003/english/eng.testa.bios --test data/conll2003/english/eng.testb.bios
loading embedding: glove from /media/discoD/embeddings/English/Glove/glove.6B/glove.6B.100d.gz
```
Here is my log. You are using Python 3.6, right? What is your PyTorch version?
Yes, I'm running Anaconda 4.5.1 with Python 3.6.3, PyTorch 0.4.0 (using your pytorch0.4 branch) and gensim 3.4.0.
FYI, here are the first 35 epochs for Python 2.7 with PyTorch 0.4. It seems to converge slower than PyTorch 0.3, but it still approaches 90% F1 after 35 epochs.
Hi @XuezheMax! Besides running the Python 2 setup (with PyTorch 0.3.1), I also ran the script mentioned in #9 to add indexes to the start of each line in my corpus, to eliminate the possibility that I did something wrong when adapting the code to run without the indexes. The results I got were comparable to yours: I reached near 90% F1 score on the test dataset in only 10 epochs. Then I went back to the pytorch4.0 branch with Python 3, reverted the changes I made to disregard the starting indexes, and ran the training on the corpus with starting indexes again, to see whether I had succeeded because of the corpus or because of the Python and PyTorch versions, and I ended up getting those same low results again. So it looks like there's something wrong with running PyTorch 0.4 on Python 3 🤔 I didn't test PyTorch 0.4 with Python 2.7; I'm guessing you already did that. What you probably didn't do was test with Python 3.6, right?
Python 2.7 + PyTorch 0.4 seems to work well. My result with this config matches the paper. Running
These reported results are usually averaged over some number of runs; it doesn't actually mean that their highest individual training run scored 91.21%. So if you ran with 2.7 and PyTorch 0.4, I'm inclined to think that the problem must be related to Python 3 somehow 🤔
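As a concrete illustration of run-averaged reporting (the scores below are invented for the example, not taken from the paper):

```python
import statistics

# Hypothetical F1 scores from five independent training runs.
f1_runs = [91.05, 91.30, 91.21, 91.10, 91.40]

mean = statistics.mean(f1_runs)
std = statistics.stdev(f1_runs)
print(f"F1 = {mean:.2f} +/- {std:.2f} over {len(f1_runs)} runs")
# F1 = 91.21 +/- 0.14 over 5 runs
```

A single run landing below (or above) the reported mean is expected; that is why papers quote the average, sometimes with a standard deviation.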
@ducalpha did you use the pytorch4.0 branch, or the master branch?
I used the pytorch4.0 branch. The master branch yielded a recursion-depth-exceeded error.
Hello
I am trying to achieve the same results as the "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" paper, but mine don't seem to match the results the paper claims after 50 epochs. I've also read issue #8.
Because I'm using Windows, I took the hyper-parameters from the .sh script and wrote them directly into the NERCRF.py code.
After 50 epochs, using the GloVe embeddings with 100 dimensions and the CoNLL-2003 corpus (which I downloaded from this repository), I've only managed an 84.76% F1 score on my dev data and an 80.32% F1 score on my test data. Are the hyper-parameters right? Did you use eng.testa for dev data and eng.testb for test data, or did you use different files? Should I pay attention to anything else?
Thanks.
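As an aside, the Windows adaptation described above (copying flags from the .sh script into the Python side) can also be done without editing NERCRF.py, by populating sys.argv before argparse runs; a minimal sketch, using a made-up subset of the flags quoted earlier in the thread:

```python
import sys

# Sketch: mimic the shell script's command line on Windows by setting
# sys.argv before NERCRF.py's argument parser runs. The flags below are
# a partial, illustrative subset, not the full configuration.
sys.argv = [
    'NERCRF.py', '--cuda', '--mode', 'LSTM',
    '--num_epochs', '200', '--batch_size', '16',
    '--hidden_size', '256', '--learning_rate', '0.01',
]
print(sys.argv[1])
# One could then run NERCRF.py in the same interpreter, e.g. via
# exec(open('NERCRF.py').read()), so argparse picks up these values.
```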