Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix inconsistent decay of beta1 in Adam #1144

Merged
merged 1 commit into from
Jan 4, 2017
Merged

Conversation

SwordYork
Copy link
Contributor

According to the Adam paper, the beta1 that used to calculate stepsize should be the same with the one used to update biased first moment estimate. However, this bug won't cause a problem when decay_factor is very close to 1 or t1 is small.

@SwordYork SwordYork changed the title fix inconsistency decay of beta1 in Adam fix inconsistent decay of beta1 in Adam Sep 7, 2016
@rizar
Copy link
Contributor

rizar commented Jan 2, 2017

Sorry for the extremely late reply! To be honest, I can't understand why in the current implementation in Blocks the first-order moment estimate is updated using \beta_1^t It seem like in the paper just \beta_1 is used. Do you have an idea why it's the case?

@SwordYork
Copy link
Contributor Author

Never mind. It's mentioned above Theorem 4.1 in the paper.
first moment running average coefcient β1,t decay exponentially with λ, that is typically closeto 1, e.g. 1−10−8.

I will close this pr, it would be better to rewrite the Adam algorithm (issue #1159 ).

@SwordYork SwordYork closed this Jan 2, 2017
@rizar
Copy link
Contributor

rizar commented Jan 2, 2017

Oh, I see. I only read Algorithm 1 in which \beta_1 is not decayed. The other hyperparameter, \lambda, is only introduced in the Theorem 4.1.

Rewriting Adam as #1159 would break a lot of existing code, but maybe we still should do it.

I am not sure why you closed the PR, the change looks good to me now.

@rizar rizar reopened this Jan 2, 2017
@SwordYork
Copy link
Contributor Author

Thanks! I closed this PR, because I think it may be out of date.
Would you please fix beta_1t, I don't have a suitable device to update the code right now.

@SwordYork SwordYork reopened this Jan 3, 2017
@SwordYork
Copy link
Contributor Author

I have synchronized the code, but the checks still failed.

It says:
FAIL: tests.bricks.test_conv.FunctionTestCase (test_untied_biases) AssertionError: AbstractConv shape mismatch: shape of image does not match given imshp.
but in test_conv.py, it is
assert_raises_regexp(ValueError, 'Input dimension mis-match.*', wrongsize).

I don't know why, is it related to the newer version of Theano? for example this commit.
I think test_conv.py should be modified.

@rizar
Copy link
Contributor

rizar commented Jan 4, 2017

You are right, thanks for the heads-up! I created #1172

@rizar
Copy link
Contributor

rizar commented Jan 4, 2017

The tests pass, except for a clearly unrelated issue #1173 . The only thing I worried about is that the outputs did not even change in the test that covers Adam. Apparently that's because the impact of this change is negligible during the first iterations of training, when \lambda^t is almost one. But otherwise, the change seems legit to me, because what is called learning_rate in the code is in fact simply a trick to speed things up a bit, as mentioned in the second last paragraph of Page 2 of the paper.

@rizar rizar merged commit e89cf93 into mila-iqia:master Jan 4, 2017
@rizar
Copy link
Contributor

rizar commented Jan 4, 2017

Thanks for your contribution, @SwordYork !

@SwordYork
Copy link
Contributor Author

Thanks! It may be problematic when decay_factor is not close to 1 and t1 is large (for example, (1-10**-5)**10000 = 0.9048), because after changing the order of computation of Algorithm 1, \alpha_t should become \alpha_t = \alpha \sqrt{1-\beta_2^t} / (1-\beta_{1,t}^t) instead of \alpha_t = \alpha \sqrt{1-\beta_2^t} / (1-\beta_1^t) (previous code).
Nevertheless, I think it is rare to change decay_factor to such value.
Thanks for this wonderful framework!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants