
compare with the code in the original paper #13

Closed
wq343580510 opened this issue Sep 15, 2017 · 4 comments

Comments

@wq343580510

> The difference might be because of differences in the way the word embeddings are updated.

Can you make it more specific?

@ryh95

ryh95 commented Oct 24, 2017

I'm confused about why updating the word embeddings would cause a performance decrease; maybe a comparison experiment should be run against the original Lua code.

@dasguptar
Owner

Hi @wq343580510 and @ryh95

I got stuck with some other work and did not have time to reply to you both!

So one of the major differences is that in the original implementation, the embeddings are updated using plain SGD, while the rest of the model is updated using the optimizer. In my implementation, however, I use the same optimizer to update the entire model, embeddings included. This can lead to slight differences across batches and epochs, and hence to different performance.

In the original implementation, this line updates word embeddings, while this line updates the remaining parameters.
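
To make the distinction concrete, here is a minimal sketch of the two schemes. The Adagrad optimizer, the learning rates, and the `nn.Linear` standing in for the Tree-LSTM are illustrative assumptions, not the actual code from either repository:

```python
# Minimal sketch (not the repo's actual code) contrasting the two update
# schemes; the optimizer choice and learning rates are illustrative.
import torch
import torch.nn as nn

emb = nn.Embedding(100, 30)    # word embedding table
model = nn.Linear(30, 1)       # stand-in for the Tree-LSTM + classifier

# Scheme 1 (original Lua/Torch code): the optimizer only sees the model
# parameters; the embeddings get a plain SGD update applied by hand.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.05)
emb_lr = 0.1                   # illustrative value

ids = torch.tensor([1, 2, 3])
loss = model(emb(ids)).sum()

optimizer.zero_grad()
emb.zero_grad()
loss.backward()
optimizer.step()               # optimizer step for the model parameters
with torch.no_grad():
    emb.weight -= emb_lr * emb.weight.grad  # plain SGD step for embeddings

# Scheme 2 (this repo): one optimizer updates everything, embeddings
# included, so the embeddings also receive the optimizer's adaptive scaling.
optimizer_all = torch.optim.Adagrad(
    list(model.parameters()) + list(emb.parameters()), lr=0.05)
```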

Of course, other minor differences might still be present, such as differences in weight initialization. I have tried to use the hyperparameters as specified in the paper, but it is quite possible I missed something! I was trying to implement this model to learn more about dynamic neural networks in PyTorch, and the minor gap in results seemed acceptable to me at the time, so I never investigated further.

@dasguptar
Owner

dasguptar commented Nov 28, 2017

Hi @wq343580510 and @ryh95 ,

The original implementation deals with word embeddings by updating them separately using plain SGD, as demonstrated by the lines linked in my earlier comment.

However, as pointed out in issue #17, the text of the original paper mentions freezing the word embeddings, which I have since incorporated: this commit adds the option of freezing the word embeddings during training. This results in a slight improvement in the metrics, and we can now reach a Pearson's coefficient of 0.8674 and an MSE of 0.2536.
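
For illustration, freezing might look something like the following minimal sketch (an assumption about the approach, not the commit's exact code):

```python
# Hypothetical sketch of freezing the embedding table so that only the
# rest of the model is trained; names and values are illustrative.
import torch
import torch.nn as nn

emb = nn.Embedding(100, 30)
emb.weight.requires_grad_(False)  # freeze: no gradients flow into the table

model = nn.Linear(30, 1)          # stand-in for the Tree-LSTM

# Hand the optimizer only the parameters that still require gradients.
optimizer = torch.optim.Adagrad(
    [p for p in model.parameters() if p.requires_grad], lr=0.05)
```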

Since we are now within ~0.0005 of the original paper, albeit with a different learning rate, I don't think I will spend any more time optimizing this. However, contributions (PRs/bug fixes) are always welcome and will be merged. 🙂

@jeenakk

jeenakk commented Jun 23, 2019

Can you please share the arguments namespace with which you got a Pearson's coefficient of 0.8674 and an MSE of 0.2536? In how many epochs did you achieve this? Was the random seed '123'? Sorry, I am unable to reproduce your exact result with this code. Any help would be appreciated! Thank you in advance.
