compare with the code in the original paper #13
Comments
I'm confused about why updating the word embeddings can cause a performance decrease; maybe a comparison experiment should be done against the original Lua code. |
Hi @wq343580510 and @ryh95, I got stuck with some other work and did not have time to reply to you! One of the major differences is that in the original implementation, the embeddings are updated using plain SGD, while the rest of the model is updated using the optimizer. In my implementation, however, I use the same optimizer to update the entire model, including the embeddings. This can lead to slight differences across batches and epochs, resulting in different performance; the difference is sketched below. In the original implementation, this line updates the word embeddings, while this line updates the remaining parameters. Of course, other minor differences might still be present, such as differences in weight initialization. I have tried to use the hyperparameters as specified in the paper, but it is quite possible I missed something! I was trying to implement this model to learn more about dynamic neural networks in PyTorch, and the minor gap in results seemed acceptable to me at the time, so I never investigated further. |
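For illustration, here is a minimal PyTorch sketch of the two update schemes described above. The layer sizes, the stand-in `nn.Linear` module, the Adagrad optimizer, and the learning rates are assumptions chosen for the example, not values taken from this repository; only the contrast between the two schemes reflects the comment.

```python
import torch
import torch.nn as nn

# Stand-ins: an embedding table plus a tiny module representing
# the rest of the network (the real model is a Tree-LSTM).
emb = nn.Embedding(num_embeddings=100, embedding_dim=30)
model = nn.Linear(30, 5)

# Scheme A (original Torch/Lua code, as described above):
# the optimizer updates everything except the embeddings,
# and the embeddings get a separate plain-SGD step.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.05)
emb_lr = 0.1  # assumed learning rate for the plain-SGD embedding update

def step_scheme_a(loss):
    optimizer.zero_grad()
    emb.zero_grad()
    loss.backward()
    optimizer.step()                       # main parameters via the optimizer
    with torch.no_grad():                  # embeddings via plain SGD
        emb.weight -= emb_lr * emb.weight.grad

# Scheme B (this PyTorch port, as described above):
# a single optimizer updates the whole model, embeddings included.
optimizer_all = torch.optim.Adagrad(
    list(model.parameters()) + list(emb.parameters()), lr=0.05)

def step_scheme_b(loss):
    optimizer_all.zero_grad()
    loss.backward()
    optimizer_all.step()

# Tiny usage example so the sketch actually runs.
ids = torch.tensor([1, 2, 3])
step_scheme_a(model(emb(ids)).sum())
step_scheme_b(model(emb(ids)).sum())
```

Because Adagrad (and most adaptive optimizers) scale each update by accumulated gradient statistics, updating the embeddings with the optimizer instead of plain SGD changes how far the embedding vectors move per step, which is one plausible source of the metric gap.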
Hi @wq343580510 and @ryh95, The original implementation handles word embeddings by updating them separately with plain SGD, as demonstrated by the lines referenced in my previous comment. However, as pointed out in issue #17, the text of the original paper mentions freezing the word embeddings, which I have since incorporated in a commit that adds the option of freezing the word embeddings during training (sketched below). This results in a slight improvement to the metrics, and we can now reach a Pearson's coefficient of 0.8674 and an MSE of 0.2536, which puts us within a small margin of the numbers reported in the original paper. |
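As a rough sketch of what freezing the embeddings looks like in PyTorch (the layer sizes, optimizer, and learning rate below are illustrative assumptions; the actual option lives in the commit referenced above):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=100, embedding_dim=30)
# ... pretrained vectors (e.g. GloVe) would be copied into emb.weight here ...

# Freeze the embedding table so training never updates it.
emb.weight.requires_grad = False

model = nn.Linear(30, 5)  # stand-in for the Tree-LSTM part of the model

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in list(emb.parameters()) + list(model.parameters())
             if p.requires_grad]
optimizer = torch.optim.Adagrad(trainable, lr=0.05)
```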
Can you please share the arguments namespace with which you got a Pearson's coefficient of 0.8674 and an MSE of 0.2536? In how many epochs did you achieve this? Was the random seed '123'? Sorry, I am unable to reproduce your exact result with this code. Any help will be appreciated! Thank you in advance. |
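One common source of non-reproducibility is run-to-run randomness. A minimal sketch of pinning the relevant seeds, assuming the seed '123' asked about above, would look like this:

```python
import random
import numpy as np
import torch

seed = 123  # the seed value asked about above
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
```

Even with identical seeds, differences in library versions or non-deterministic GPU kernels can still produce small metric differences between runs.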
The difference might be due to differences in the way the word embeddings are updated.
Can you be more specific?