fix inconsistent decay of beta1 in Adam #1144
Sorry for the extremely late reply! To be honest, I can't understand why in the current implementation in Blocks the first-order moment estimate is updated using |
Never mind. It's mentioned above Theorem 4.1 in the paper. I will close this PR; it would be better to rewrite the Adam algorithm (issue #1159).
Oh, I see. I only read Algorithm 1, in which \beta_1 is not decayed. The other hyperparameter, \lambda, is only introduced in Theorem 4.1. Rewriting Adam as proposed in #1159 would break a lot of existing code, but maybe we should still do it. I am not sure why you closed the PR; the change looks good to me now.
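For reference, the decay discussed here can be written out as follows. This is a sketch in the paper's notation, where \beta_1 is the base first-moment coefficient, \lambda is the decay hyperparameter from Theorem 4.1, g_t is the gradient and m_t the biased first moment estimate. The Blocks argument names (`beta1`, `decay_factor`) may follow a different convention, so this should not be read as the Blocks code.

$$
\beta_{1,t} = \beta_1 \lambda^{t-1}, \qquad \lambda \in (0, 1)
$$

$$
m_t = \beta_{1,t}\, m_{t-1} + (1 - \beta_{1,t})\, g_t
$$

With \lambda = 1 this reduces to the constant \beta_1 of Algorithm 1.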
Thanks! I closed this PR because I think it may be out of date.
I have synchronized the code, but the checks still fail. It says: I don't know why; is it related to the newer version of Theano? For example, this commit.
You are right, thanks for the heads-up! I created #1172.
The tests pass, except for a clearly unrelated issue, #1173. The only thing that worried me is that the outputs did not even change in the test that covers Adam. Apparently that's because the impact of this change is negligible during the first iterations of training, when \lambda^t is almost one. But otherwise, the change seems legit to me, because what is called |
Thanks for your contribution, @SwordYork!
Thanks! It may be problematic when |
According to the Adam paper, the `beta1` used to calculate the step size should be the same as the one used to update the biased first moment estimate. However, this bug does not cause a problem when `decay_factor` is very close to 1 or `t1` is small.
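To illustrate why the mismatch is hard to notice in practice, here is a minimal sketch that compares a decayed coefficient with the undecayed one. It assumes the paper's convention `beta1 * decay_factor ** (t - 1)`; the helper name `decayed_beta1` and the numeric values are hypothetical and are not taken from Blocks.

```python
def decayed_beta1(beta1, decay_factor, t):
    """Decayed first-moment coefficient at step t (1-based).

    Assumes the paper's schedule beta1 * decay_factor ** (t - 1);
    this is an illustration, not the Blocks implementation.
    """
    return beta1 * decay_factor ** (t - 1)


if __name__ == "__main__":
    beta1 = 0.9  # illustrative value, not a Blocks default
    for decay_factor in (1 - 1e-8, 0.999):
        for t in (1, 10, 1000, 100000):
            gap = beta1 - decayed_beta1(beta1, decay_factor, t)
            # gap is the difference between the undecayed and decayed coefficient
            print(f"decay_factor={decay_factor!r} t={t} gap={gap:.3e}")
```

With a decay factor such as `1 - 1e-8` the gap stays tiny even after many steps, which is consistent with the observation above that the test outputs did not change. With a smaller decay factor such as `0.999`, the two coefficients drift apart within a few thousand steps, and using different values for the step size and the moment update would start to matter.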