
Is the CNN version code trained well? #2

Open
shimazing opened this issue May 17, 2019 · 19 comments

@shimazing commented May 17, 2019

Your code is really helpful for understanding how i-ResNet works; thanks for writing it.
However, when I try to run the CNN version of the code in the Jupyter notebook, it gives wrong results in the evaluation phase (when evaluation mode is activated with net.eval()): after a few iterations the model cannot even reconstruct the inputs, and the latent standard deviation of the test data diverges. (I am using DataParallel; do you think the problem comes from this?)

Did you get the right result?

Thanks in advance for your reply.

@jarrelscy (Owner) commented May 17, 2019 via email

@shimazing (Author) commented May 17, 2019

My conjecture is that the optimization step pushes the spectral norm above 1, and your code uses the sigma calculated during the training phase to normalize it, so the weight changes in the test phase. I don't think this is correct behavior.

One more question: does this code really do an in-place update of u? And what do you mean by "v" in the comment under def compute_weight(self, module, do_power_iteration, num_iter=0): in the SpectralNormGouk.py file?

Hajin Shim

@jarrelscy (Owner)

Actually you are on the right path. I have just checked the code, and it uses an older, incorrect version of spectral normalization by Gouk. Specifically, it underestimates the largest singular value because I use too small an x_i.

https://arxiv.org/pdf/1804.04368.pdf for my future reference.

I actually made this change a while back but did not upload the corrected file, so my apologies.

p.s. You can ignore the comment under compute_weight; it was copied from an earlier implementation of Miyato's spectral norm, which uses u and v vectors.

As to whether using the sigma calculated in the training phase is valid in the testing phase, that is a good point. In theory the weight shouldn't change during the testing phase (since no weight update is performed), and sigma depends solely on the weight, so sigma shouldn't change either.

In practice sigma is somewhat variable, as the power iteration method only gives a bounded estimate, so I'm unclear whether recalculating sigma during the testing phase would change the result.

Try the updated version and see if this works first.
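To make the point about power iteration concrete, here is a minimal sketch of estimating the largest singular value that way (illustrative only; `estimate_sigma` is not the repository's API). With too few iterations or an unlucky starting vector, the estimate lower-bounds the true sigma, which is exactly the kind of underestimation discussed above.

```python
import torch

def estimate_sigma(W, n_iter=100):
    """Estimate the largest singular value of W by power iteration.

    Each step multiplies by W^T W and renormalizes; the estimate
    approaches sigma_max from below as n_iter grows.
    """
    x = torch.randn(W.shape[1])
    for _ in range(n_iter):
        x = W.t() @ (W @ x)  # one power-iteration step on W^T W
        x = x / x.norm()
    return (W @ x).norm()    # ||W x|| with ||x|| = 1 approximates sigma_max

W = torch.randn(10, 10)
sigma_est = estimate_sigma(W)
sigma_true = torch.linalg.svdvals(W)[0]  # exact value for comparison
```

Running `estimate_sigma` with a small `n_iter` (say 1 or 2) and comparing against `sigma_true` shows the underestimate directly.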

@shimazing (Author) commented May 17, 2019

Thanks for updating!! :)

However, I still have a problem and a question.
What is the "weight_orig" parameter for?
With an assertion check, I've noticed that weight and weight_orig have different values, but we use weight_orig to calculate sigma. When is weight_orig updated to reflect the current state? I wonder whether this is the right way.

Thanks again for your fast reply :)

@jarrelscy (Owner)

weight_orig is the original weight and the actual parameter that undergoes gradient descent.
The spectral norm wrapper replaces the weight parameter with a plain torch tensor that is recomputed every time gradient descent happens.

This is the same approach used in the PyTorch implementation of Miyato's spectral_norm (in fact it is shamelessly copied, comments included...):

https://pytorch.org/docs/stable/_modules/torch/nn/utils/spectral_norm.html

So when the Conv2d runs, it requests module.weight which is the recomputed tensor.

When gradient descent runs and weight_orig is altered, weight is recomputed by finding the sigma of weight_orig and dividing weight_orig by it if it is larger than 1.
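The recomputation step can be sketched as follows. This is illustrative, not the repository's actual code: `recompute_weight` and `magnitude` are hypothetical names (the `magnitude` keyword echoes the one used later in this thread's toy example), and exact SVD stands in for the power-iteration estimate of sigma.

```python
import torch

def recompute_weight(weight_orig, magnitude=1.0):
    """Gouk-style clipping: rescale only when the spectral norm of
    weight_orig exceeds the target magnitude; otherwise leave it as-is."""
    sigma = torch.linalg.svdvals(weight_orig)[0]  # exact largest singular value
    factor = torch.clamp(sigma / magnitude, min=1.0)
    return weight_orig / factor

weight_orig = 3.0 * torch.randn(10, 10)  # spectral norm almost surely > 0.9
weight = recompute_weight(weight_orig, magnitude=0.9)
# weight now has spectral norm at most 0.9 (up to numerical error)
```

Because the rescaling is a no-op whenever sigma is already below the target, weights well inside the constraint pass through unchanged.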

@lingzenan

I met a similar problem. I am writing classification code based on the "SpectralNormGouk.py" file. However, the test loss increases and the test accuracy drops to around 10%, while the training loss and accuracy look fine. Besides, when I check the trained model (i.e., load the state dict), the loss differs a lot from the values printed during training.

@shimazing (Author)

@lingzenan Do you run the code with DataParallel?

@shimazing (Author)

@jarrelscy I still have a problem even with the updated version. Have you run the code with DataParallel?

@lingzenan

@shimazing yes

@jarrelscy (Owner) commented May 17, 2019 via email

@lingzenan

@jarrelscy Problems still exist even without DataParallel. Here is a toy example:

```python
import torch
import torch.nn as nn
from SpectralNormGouk1 import *
from torch.optim import *


class toy(nn.Module):
    def __init__(self):
        super(toy, self).__init__()
        self.f = spectral_norm(nn.Linear(10, 10, bias=False),
                               magnitude=0.9, n_power_iterations=5)

    def forward(self, x):
        x = self.f(x)
        return x


if __name__ == "__main__":
    net = toy()
    opt = Adam(net.parameters(), lr=0.01)
    criterion = nn.MSELoss()

    for i in range(1000):
        net.train()
        opt.zero_grad()
        inputs = torch.ones(32, 10)
        y = net(inputs)
        loss = criterion(y, inputs)
        loss.backward()
        opt.step()
        print(loss.item())
    torch.save(net.state_dict(), 'check.pkl')

    print("########eval###########")
    net.eval()
    with torch.no_grad():
        inputs = torch.ones(32, 10)
        y = net(inputs)
        loss = criterion(y, inputs)
        print(loss.item())

    print("########eval_check###########")
    net_ = toy()
    state = torch.load('check.pkl')
    net_.load_state_dict(state)
    net_.eval()
    with torch.no_grad():
        inputs = torch.ones(32, 10)
        y = net_(inputs)
        loss = criterion(y, inputs)
        print(loss.item())
```

The last block raises:

"AttributeError: 'Linear' object has no attribute 'sigma'"

@jarrelscy (Owner) commented May 20, 2019 via email

@lingzenan

@jarrelscy Thanks for your reply.

@lingzenan

@jarrelscy The test loss and accuracy seem to be normal if I use net.train() together with torch.no_grad() during the test phase.
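The workaround being described looks like the following sketch. Note the `nn.Linear` here is just a stand-in for the actual spectrally normalized network (an assumption for illustration).

```python
import torch
import torch.nn as nn

net = nn.Linear(10, 10)  # stand-in for the spectrally normalized model

# Keep the module in train() mode so the normalization hooks keep
# recomputing sigma, but disable autograd so no gradients are tracked
# during the evaluation pass.
net.train()
with torch.no_grad():
    inputs = torch.ones(32, 10)
    outputs = net(inputs)
```

The trade-off is that any other train/eval-sensitive layers (dropout, batch norm) also stay in training behavior, so this is a diagnostic workaround rather than a fix.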

@jarrelscy (Owner) commented May 21, 2019 via email

@shimazing (Author) commented May 30, 2019

I also found that it runs correctly with a single GPU and net.train() under torch.no_grad().

@lingzenan

@shimazing @jarrelscy Did you train the classification model? The authors released the code with the latest version of the paper, but the link is 404 now.

@lingzenan

My classification net doesn't work on a single GPU; the loss explodes.

@jarrelscy (Owner) commented May 30, 2019 via email
