
Is the CNN version code trained well? #2

Open
shimazing opened this issue May 17, 2019 · 19 comments

@shimazing commented May 17, 2019

Your code is really helpful for understanding how i-ResNet works; thanks for writing it.
However, when I try to run the CNN version of the code in the Jupyter notebook, it gives wrong results in the evaluation phase (when evaluation mode is activated with net.eval()): after a few iterations the model cannot even reconstruct the inputs, and the latent standard deviation of the test data diverges. (I am using DataParallel; do you think the problem comes from this?)

Did you get the right result?

Thanks in advance for your reply.

@jarrelscy (Owner) commented May 17, 2019 via email

@shimazing (Author) commented May 17, 2019

My conjecture is that the optimization step pushes the spectral norm above 1, and your code uses the sigma calculated during the training phase to normalize it, so the weight changes in the test phase. I don't think this is correct behavior.

One more question: does this code really do an in-place update of u? And what do you mean by "v" in the comment under def compute_weight(self, module, do_power_iteration, num_iter=0): in the SpectralNormGouk.py file?

Hajin Shim

@jarrelscy (Owner)

Actually you are on the right path. I have just checked the code, and it uses an older, incorrect version of spectral normalization by Gouk. Specifically, it underestimates the largest singular value because I use too small an x_i.

https://arxiv.org/pdf/1804.04368.pdf for my future reference.

I actually made this change a while back but did not upload the corrected file, so my apologies.

p.s. You can ignore the comment under compute_weight; it was copied from an earlier implementation of Miyato's spectral norm, which uses u and v vectors.

As to whether using the sigma calculated in the training phase is valid in the testing phase, that is a good point. In theory the weight shouldn't change during the testing phase (since no weight update is performed), and sigma depends solely on the weight, so sigma shouldn't change either.

In practice sigma is somewhat variable, as the power iteration method only gives a bounded estimate, so I'm unclear whether recalculating sigma during the testing phase would change the result.

Try the updated version and see if this works first.
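To make the point about power iteration concrete, here is a minimal sketch of estimating the largest singular value that way (illustrative only; `estimate_sigma` is not the repository's API). With too few iterations or an unlucky starting vector, the estimate lower-bounds the true sigma, which is exactly the kind of underestimation discussed above.

```python
import torch

def estimate_sigma(W, n_iter=100):
    """Estimate the largest singular value of W by power iteration.

    Each step multiplies by W^T W and renormalizes; the estimate
    approaches sigma_max from below as n_iter grows.
    """
    x = torch.randn(W.shape[1])
    for _ in range(n_iter):
        x = W.t() @ (W @ x)  # one power-iteration step on W^T W
        x = x / x.norm()
    return (W @ x).norm()    # ||W x|| with ||x|| = 1 approximates sigma_max

W = torch.randn(10, 10)
sigma_est = estimate_sigma(W)
sigma_true = torch.linalg.svdvals(W)[0]  # exact value for comparison
```

Running `estimate_sigma` with a small `n_iter` (say 1 or 2) and comparing against `sigma_true` shows the underestimate directly.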

@shimazing (Author) commented May 17, 2019

Thanks for updating!! :)

However, I still have a problem and a question.
What is the "weight_orig" parameter for?
With an assertion check, I've noticed that weight and weight_orig have different values, but we use weight_orig to calculate sigma. When is weight_orig updated to reflect the current state? I wonder whether this is the right way.

Thanks again for your fast reply :)

@jarrelscy (Owner)

weight_orig is the original weight and the actual parameter that undergoes gradient descent.
The spectral norm wrapper replaces the weight parameter with a plain torch tensor that is recomputed every time gradient descent happens.

This is the same approach used in the PyTorch implementation of Miyato's spectral_norm (in fact it is shamelessly copied, comments included...):

https://pytorch.org/docs/stable/_modules/torch/nn/utils/spectral_norm.html

So when the Conv2d runs, it requests module.weight which is the recomputed tensor.

When gradient descent runs and weight_orig is altered, weight is recomputed by finding the sigma of weight_orig and dividing weight_orig by it if it is larger than 1.
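The recomputation step can be sketched as follows. This is illustrative, not the repository's actual code: `recompute_weight` and `magnitude` are hypothetical names (the `magnitude` keyword echoes the one used later in this thread's toy example), and exact SVD stands in for the power-iteration estimate of sigma.

```python
import torch

def recompute_weight(weight_orig, magnitude=1.0):
    """Gouk-style clipping: rescale only when the spectral norm of
    weight_orig exceeds the target magnitude; otherwise leave it as-is."""
    sigma = torch.linalg.svdvals(weight_orig)[0]  # exact largest singular value
    factor = torch.clamp(sigma / magnitude, min=1.0)
    return weight_orig / factor

weight_orig = 3.0 * torch.randn(10, 10)  # spectral norm almost surely > 0.9
weight = recompute_weight(weight_orig, magnitude=0.9)
# weight now has spectral norm at most 0.9 (up to numerical error)
```

Because the rescaling is a no-op whenever sigma is already below the target, weights well inside the constraint pass through unchanged.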

@lingzenan

I met a similar problem. I am writing classification code based on the "SpectralNormGouk.py" file. However, the test loss increases and the test accuracy drops to around 10%, while the training loss and accuracy look fine. Besides, when I check the trained model (i.e., load the state dict), the loss differs a lot from the values printed during training.

@shimazing (Author)

@lingzenan Do you run the code with DataParallel?

@shimazing (Author)

@jarrelscy I still have a problem even with the updated version. Have you run the code with DataParallel?

@lingzenan

@shimazing yes

@jarrelscy (Owner) commented May 17, 2019 via email

@lingzenan

@jarrelscy Problems still exist even without DataParallel. Here is a toy example:

```python
import torch
import torch.nn as nn
from SpectralNormGouk1 import *
from torch.optim import *


class toy(nn.Module):
    def __init__(self):
        super(toy, self).__init__()
        self.f = spectral_norm(nn.Linear(10, 10, bias=False),
                               magnitude=0.9, n_power_iterations=5)

    def forward(self, x):
        x = self.f(x)
        return x


if __name__ == "__main__":
    net = toy()
    opt = Adam(net.parameters(), lr=0.01)
    criterion = nn.MSELoss()

    for i in range(1000):
        net.train()
        opt.zero_grad()
        inputs = torch.ones(32, 10)
        y = net(inputs)
        loss = criterion(y, inputs)
        loss.backward()
        opt.step()
        print(loss.item())
    torch.save(net.state_dict(), 'check.pkl')

    print("########eval###########")
    net.eval()
    with torch.no_grad():
        inputs = torch.ones(32, 10)
        y = net(inputs)
        loss = criterion(y, inputs)
        print(loss.item())

    print("########eval_check###########")
    net_ = toy()
    state = torch.load('check.pkl')
    net_.load_state_dict(state)
    net_.eval()
    with torch.no_grad():
        inputs = torch.ones(32, 10)
        y = net_(inputs)
        loss = criterion(y, inputs)
        print(loss.item())
```

The last block raises:

"AttributeError: 'Linear' object has no attribute 'sigma'"

@jarrelscy (Owner) commented May 20, 2019 via email

@lingzenan

@jarrelscy Thanks for your reply.

@lingzenan

@jarrelscy The test loss and accuracy seem to be normal if I use net.train() together with torch.no_grad() during the test phase.
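The workaround being described looks like the following sketch. Note the `nn.Linear` here is just a stand-in for the actual spectrally normalized network (an assumption for illustration).

```python
import torch
import torch.nn as nn

net = nn.Linear(10, 10)  # stand-in for the spectrally normalized model

# Keep the module in train() mode so the normalization hooks keep
# recomputing sigma, but disable autograd so no gradients are tracked
# during the evaluation pass.
net.train()
with torch.no_grad():
    inputs = torch.ones(32, 10)
    outputs = net(inputs)
```

The trade-off is that any other train/eval-sensitive layers (dropout, batch norm) also stay in training behavior, so this is a diagnostic workaround rather than a fix.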

@jarrelscy (Owner) commented May 21, 2019 via email

@shimazing (Author) commented May 30, 2019

I also found that it runs correctly with a single GPU and net.train() under torch.no_grad().

@lingzenan

@shimazing @jarrelscy Did you train the classification model? The authors released the code with the latest version of the paper, but the link is 404 now.

@lingzenan

My classification net doesn't work on a single GPU; the loss explodes.

@jarrelscy (Owner) commented May 30, 2019 via email
