
adam optimize #1

Open
kikyou123 opened this issue May 20, 2016 · 7 comments

Comments

@kikyou123

In optimize_gan.py, the function ADAM has the parameter l=1e-8. I wonder if this is wrong, because b1_t will become close to 0.
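For concreteness, plugging both candidate values of l into the code's schedule b1_t = b1 · l^(t−1) shows why l=1e-8 would kill the momentum after the first step (a quick numeric sketch, not code from the repository):

```python
# Sketch: compare the momentum coefficient b1_t = b1 * l**(t-1)
# under the two values of l discussed in this thread.
b1 = 0.9
for l in (1e-8, 1 - 1e-8):
    schedule = [b1 * l ** (t - 1) for t in (1, 2, 3)]
    print(l, schedule)
# With l = 1e-8 the coefficient collapses to ~0 from t = 2 onwards,
# while l = 1 - 1e-8 leaves it essentially at b1 for many steps.
```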

@jiwoongim
Owner

I think β1,t ← 1 − (1 − β1)λ^(t−1) becomes close to 1 and makes the momentum degenerate (β1 = 0.1 here).

This particular Adam code was based on version 2 of the paper, which had β1,t ← 1 − (1 − β1)λ^(t−1).
I notice that the current version of the Adam paper doesn't seem to have β1,t ← 1 − (1 − β1)λ^(t−1), but rather fixes β1,t = 0.9.

@kikyou123
Copy link
Author

But I think λ should be 1 − 1e-8, not 1e-8; why should the momentum degenerate?
Also, on the LSUN dataset I found that both losses become very small, so training fails. I don't know why.

@cdjkim
Collaborator

cdjkim commented May 20, 2016

Hi,

I have just updated the repository to merge everything into gran.py. That
probably wasn't the problem, but feel free to pull the current one and give
it a try.

Also, make sure the full path to the data is given correctly and ends with
something like,

dataset = '../../../preprocessed_100/'
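A quick pre-flight check along these lines might catch path problems before training starts (the path below is a placeholder, not the repo's actual layout):

```python
import os

# Hypothetical sanity check before training; adjust `dataset` to your setup.
dataset = './preprocessed_100/'  # placeholder path
assert dataset.endswith('/'), "dataset path should end with '/'"
exists = os.path.isdir(dataset)  # False here means training will fail at epoch 0
```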

Also, could you let me know at what epoch it fails? (If it fails from epoch 0 onwards, it's likely a path problem, I think.) And did you print out the samples at every epoch to see whether they make sense?

Could you check whether it works on CIFAR-10? It might be the preprocessing part that is causing the problem.

We also tried the LSUN "living room" and "kitchen" datasets and they work fine; we will upload the samples shortly. :)

Chris


@kikyou123
Author

It works fine on CIFAR-10; on LSUN it fails at epoch 0.

@cdjkim
Collaborator

cdjkim commented May 21, 2016

Hi :) Can you show me what you get?


@kikyou123
Author

[image attached: training output at epoch 0]

It is at epoch 0. I also found that on CIFAR-10 it works when I set b1=0, but when I use this update algorithm it fails.

```python
import theano
import theano.tensor as T

class Adam(Update):

    def __init__(self, lr=0.001, b1=0.9, b2=0.999, e=1e-8, l=1-1e-8, *args, **kwargs):
        Update.__init__(self, *args, **kwargs)
        self.__dict__.update(locals())

    def __call__(self, params, cost):
        updates = []
        grads = T.grad(cost, params)
        #grads = clip_norms(grads, self.clipnorm)
        t = theano.shared(floatX(1.))
        # beta1 decay schedule from v2 of the Adam paper: b1_t = b1 * l**(t-1)
        b1_t = self.b1 * self.l ** (t - 1)

        for p, g in zip(params, grads):
            g = self.regularizer.gradient_regularize(p, g)
            m = theano.shared(p.get_value() * 0.)  # first moment estimate
            v = theano.shared(p.get_value() * 0.)  # second moment estimate

            m_t = b1_t * m + (1 - b1_t) * g
            v_t = self.b2 * v + (1 - self.b2) * g ** 2
            m_c = m_t / (1 - self.b1 ** t)  # bias correction
            v_c = v_t / (1 - self.b2 ** t)
            p_t = p - (self.lr * m_c) / (T.sqrt(v_c) + self.e)
            p_t = self.regularizer.weight_regularize(p_t)
            updates.append((m, m_t))
            updates.append((v, v_t))
            updates.append((p, p_t))
        updates.append((t, t + 1.))
        return updates
```
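To check the arithmetic of this update rule outside Theano, here is a minimal NumPy re-implementation of a single step (the function name and defaults are mine, mirroring the snippet above; the regularizer calls are omitted):

```python
import numpy as np

# Sketch of one Adam step with the schedule b1_t = b1 * l**(t-1),
# matching the Theano code in this thread (not code from the repository).
def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, e=1e-8, l=1-1e-8):
    b1_t = b1 * l ** (t - 1)              # decaying momentum coefficient
    m = b1_t * m + (1 - b1_t) * g         # first moment update
    v = b2 * v + (1 - b2) * g ** 2        # second moment update
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + e)
    return p, m, v
```

With p=1.0, g=1.0, m=v=0 at t=1 this moves the parameter by roughly lr, as expected for a fresh Adam state.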

@jiwoongim
Owner

I don't think it is the optimizer's problem, because ours works fine. I suspect the reason might be hyper-parameter tuning. Our pre-processed versions of LSUN churches and living room + kitchen work fine. As you said, GRAN works well on CIFAR-10, so it is probably not the optimization method. If you strongly believe the problem comes from the optimizer, you could also try different optimization methods.
