
Problems with automatic_optimization=False #4295

Closed
catalys1 opened this issue Oct 21, 2020 · 5 comments
Labels
bug Something isn't working help wanted Open to be worked on logger Related to the Loggers

Comments

@catalys1
Contributor

🐛 Bug

When automatic_optimization=False and terminate_on_nan=True, an exception is raised during the check for NaN values, because None is passed as the loss value to self.detect_nan_tensors. The code on master has already diverged from what I'm seeing in 1.0.3, so I don't know whether this has since been fixed. The root of the problem seems to be that the AttributeDict returned from training_step has loss=None.

Please reproduce using the BoringModel and post here

https://colab.research.google.com/drive/1qQmP6BwQk--rBXC7W45y0mn6QK39IPcc
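
For reference, a minimal sketch of what the reproduction looks like (my own illustration against the 1.0.x API, not the exact Colab code; the manual_backward signature and the location of the automatic_optimization flag differ across Lightning versions):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer

class ManualOptModel(LightningModule):
    """Illustrative module: manual optimization, training_step returns nothing."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        opt = self.optimizers()
        loss = self.layer(x).sum()
        self.manual_backward(loss, opt)  # 1.0.x signature; newer versions take only the loss
        opt.step()
        opt.zero_grad()
        # Nothing is returned, so the trainer sees loss=None and the
        # NaN check raises when terminate_on_nan=True.

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

train_data = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
trainer = Trainer(
    automatic_optimization=False,  # a Trainer flag in 1.0.x, a model property in later releases
    terminate_on_nan=True,
    max_epochs=1,
)
trainer.fit(ManualOptModel(), train_data)
```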

Expected behavior

Don't crash when automatic_optimization = False and terminate_on_nan = True

@catalys1 catalys1 added bug Something isn't working help wanted Open to be worked on labels Oct 21, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@catalys1
Contributor Author

I discovered this because the loss was showing up as nan in the progress bar, and I was trying to figure out why. After digging further, it looks like the nan in the progress bar is itself a bug: I've inspected the loss and the network parameters over several steps, and there are no nans. So there seems to be a problem somewhere in the logging: with automatic_optimization=False, nan gets logged as the loss in the progress bar.

@catalys1 catalys1 changed the title Problem with automatic_optimization=False and terminate_on_nan=True Problems with automatic_optimization=False Oct 21, 2020
@edenlightning edenlightning added logger Related to the Loggers duplicate This issue or pull request already exists labels Oct 22, 2020
@SeanNaren SeanNaren reopened this Nov 1, 2020
@SeanNaren SeanNaren removed the duplicate This issue or pull request already exists label Nov 1, 2020
@SeanNaren
Contributor

Thanks @catalys1, you are correct. However, recent changes should have resolved this issue, since the nan check now only runs when using automatic optimization:

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/training_loop.py#L779-L789

In #4204 we'll make it clearer in the docs that you should report values from within the training step :)
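
For illustration, the shape of that guard can be sketched as a standalone check (a paraphrase of the behaviour described above, not Lightning's actual training_loop.py code; the function name is made up):

```python
import torch

def check_loss_for_nan(loss, terminate_on_nan, automatic_optimization):
    """Illustrative stand-in for the guard described above, not Lightning's code:
    the returned loss is only inspected when the trainer owns the optimization,
    because under manual optimization training_step may legitimately return None."""
    if not (terminate_on_nan and automatic_optimization):
        return
    if loss is None or not torch.isfinite(loss).all():
        raise ValueError(f"The loss returned in training_step is {loss}.")
```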

@GregorySenay

Hi @catalys1,

In training_step you may be able to work around the nan issue by updating the running_loss directly:

self.trainer.train_loop.running_loss.append(loss)

In my case, no more nan when automatic_optimization=False.
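
For context, a sketch of where that call could sit inside a LightningModule's training_step (compute_loss is a hypothetical helper; the manual_backward signature varies by version):

```python
def training_step(self, batch, batch_idx):
    opt = self.optimizers()
    loss = self.compute_loss(batch)  # hypothetical helper
    self.manual_backward(loss, opt)
    opt.step()
    opt.zero_grad()
    # Workaround from this thread: append to the running-loss buffer so the
    # progress bar shows the real loss instead of nan.
    self.trainer.train_loop.running_loss.append(loss)
```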

@Maddy12

Maddy12 commented Mar 10, 2021

I am having the same issue, and I am also trying to add other scalars to the progress bar, but they are not showing up at all.

As for printing the loss, @GregorySenay's comment worked for me!

self.trainer.train_loop.running_loss.append(loss)
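
For the extra scalars, the usual route is self.log(..., prog_bar=True) inside training_step; whether that behaves correctly under automatic_optimization=False is part of what this issue is about, so treat it as a starting point rather than a confirmed fix (my_metric is a placeholder):

```python
# Inside training_step, after computing the values:
self.log("my_metric", my_metric, prog_bar=True, on_step=True)
self.log("train_loss", loss, prog_bar=True, on_step=True)
```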
