
loss=nan #22
Open · machine52vision opened this issue Aug 27, 2021 · 7 comments

Comments

@machine52vision

Hello, how do I solve loss=nan?

@XiaoPengZong

Hi @machine52vision, have you solved this problem?

@SDJustus

SDJustus commented Sep 7, 2021

Hm. Correct me if I am wrong, but the net is not trained at all (it just runs inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed.
That said, it doesn't matter that the loss is NaN.

@machine52vision (Author)

Thanks a lot!

@XiaoPengZong

> Hm. Correct me if I am wrong, but the net is not trained at all (it just runs inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. That said, it doesn't matter that the loss is NaN.

Hi @SDJustus, I want to train on my own dataset with this code, not just run inference.
So I think it does matter if the loss is NaN.

@SDJustus

SDJustus commented Sep 7, 2021

OK, so if you look at this code from train.py:

```python
for param in self.model.parameters():
    param.requires_grad = False
```

you can see that it is intended not to update the model parameters during training. As you can read in the paper, only the embeddings of a pretrained network are used for further computations on a new dataset (such as minimax facility location and kNN for testing).
So again, no network weight updates are done during training, so a NaN loss is totally fine here.
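For illustration, here is a minimal sketch of this pattern: a pretrained wide_resnet50 is frozen and embeddings are collected with forward hooks. This is not the repository's actual code; the choice of layer2/layer3 and the hook setup are assumptions for the example.

```python
import torch
from torchvision import models

# Load a pretrained backbone and freeze it: no gradients, no weight updates.
model = models.wide_resnet50_2(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.eval()

# Collect intermediate feature maps with forward hooks
# (layer2/layer3 are a common choice; the repository may use different layers).
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

model.layer2.register_forward_hook(save_output("layer2"))
model.layer3.register_forward_hook(save_output("layer3"))

# Inference only: the embeddings come out of the hooks,
# and no loss is ever backpropagated.
with torch.no_grad():
    _ = model(torch.randn(1, 3, 224, 224))

print({k: v.shape for k, v in features.items()})
```

Since every parameter has requires_grad=False and the forward pass runs under torch.no_grad(), there is nothing for an optimizer to update, which is why the reported training loss is meaningless here.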

@XiaoPengZong

> OK, so if you look at this code from train.py … no network weight updates are done during training, so a NaN loss is totally fine here.

OK, thanks, got it.

@zhangjunli177

Digging into the PyTorch Lightning code (pytorch_lightning/core/lightning.py): when the progress-bar info is prepared for each batch, `get_progress_bar_dict` assigns the loss value with this logic:

```python
if running_train_loss is not None:
    avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
    avg_training_loss = float('NaN')
```

Check the definition of `automatic_optimization`:

```python
@property
def automatic_optimization(self) -> bool:
    """If False you are responsible for calling .backward, .step, zero_grad."""
    return self._automatic_optimization
```

Since there is no backward logic during training, `automatic_optimization` can be set to False to avoid assigning NaN to the loss. I've modified `configure_optimizers` in train.py, and loss=NaN is not printed anymore:

```python
def configure_optimizers(self):
    self.automatic_optimization = False
    return None
```
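To see the fix in context, here is a minimal sketch of a LightningModule that only runs a frozen backbone and disables automatic optimization, so the progress bar never falls back to loss=NaN. This is an assumed structure, not the repository's actual train.py, and in newer Lightning versions `self.automatic_optimization = False` is usually set in `__init__` instead.

```python
import torch
import pytorch_lightning as pl
from torchvision import models

class FrozenBackbone(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = models.wide_resnet50_2(pretrained=True)
        # Freeze the backbone: embeddings only, no weight updates.
        for param in self.model.parameters():
            param.requires_grad = False

    def training_step(self, batch, batch_idx):
        x, _ = batch
        with torch.no_grad():
            _ = self.model(x)  # collect embeddings here
        return None  # nothing to backpropagate

    def configure_optimizers(self):
        # Manual optimization: we never call .backward()/.step(),
        # so Lightning should not report a running training loss.
        self.automatic_optimization = False
        return None
```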
