
loss=nan #22
Open · machine52vision opened this issue Aug 27, 2021 · 7 comments

Comments

@machine52vision

Hello, how do I solve loss=nan?

@XiaoPengZong

Hi @machine52vision, have you solved this problem?

@SDJustus

SDJustus commented Sep 7, 2021

Hm. Correct me if I am wrong, but the net is not trained at all (it just runs inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed.
That said, it doesn't matter that the loss is NaN.

@machine52vision (Author)

Thanks a lot!

@XiaoPengZong

> Hm. Correct me if I am wrong, but the net is not trained at all (it just runs inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. That said, it doesn't matter that the loss is NaN.

Hi @SDJustus, I want to train on my own dataset with this code, not just run inference.
So I think it does matter if the loss is NaN.

@SDJustus

SDJustus commented Sep 7, 2021

OK, so if you look at this code from train.py:

```python
for param in self.model.parameters():
    param.requires_grad = False
```

you can see that it is intended not to update the model parameters during training. As you can read in the paper, only the embeddings of a pretrained network are used for further computations on a new dataset (such as minimax facility location and kNN for testing).
So again, no network weight updates are done during training, so a NaN loss is totally fine here.
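For illustration, here is a minimal sketch of this pattern: a pretrained wide_resnet50 is frozen and embeddings are collected with forward hooks. This is not the repository's actual code; the choice of layer2/layer3 and the hook setup are assumptions for the example.

```python
import torch
from torchvision import models

# Load a pretrained backbone and freeze it: no gradients, no weight updates.
model = models.wide_resnet50_2(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.eval()

# Collect intermediate feature maps with forward hooks
# (layer2/layer3 are a common choice; the repository may use different layers).
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

model.layer2.register_forward_hook(save_output("layer2"))
model.layer3.register_forward_hook(save_output("layer3"))

# Inference only: the embeddings come out of the hooks,
# and no loss is ever backpropagated.
with torch.no_grad():
    _ = model(torch.randn(1, 3, 224, 224))

print({k: v.shape for k, v in features.items()})
```

Since every parameter has requires_grad=False and the forward pass runs under torch.no_grad(), there is nothing for an optimizer to update, which is why the reported training loss is meaningless here.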

@XiaoPengZong

> OK, so if you look at this code from train.py … no network weight updates are done during training, so a NaN loss is totally fine here.

OK, thanks, got it.

@zhangjunli177

Digging into the PyTorch Lightning code (pytorch_lightning/core/lightning.py): when the progress-bar info is prepared for each batch, `get_progress_bar_dict` assigns the loss value with this logic:

```python
if running_train_loss is not None:
    avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
    avg_training_loss = float('NaN')
```

Check the definition of `automatic_optimization`:

```python
@property
def automatic_optimization(self) -> bool:
    """If False you are responsible for calling .backward, .step, zero_grad."""
    return self._automatic_optimization
```

Since there is no backward logic during training, `automatic_optimization` can be set to False to avoid assigning NaN to the loss. I've modified `configure_optimizers` in train.py, and loss=NaN is not printed anymore:

```python
def configure_optimizers(self):
    self.automatic_optimization = False
    return None
```
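To see the fix in context, here is a minimal sketch of a LightningModule that only runs a frozen backbone and disables automatic optimization, so the progress bar never falls back to loss=NaN. This is an assumed structure, not the repository's actual train.py, and in newer Lightning versions `self.automatic_optimization = False` is usually set in `__init__` instead.

```python
import torch
import pytorch_lightning as pl
from torchvision import models

class FrozenBackbone(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = models.wide_resnet50_2(pretrained=True)
        # Freeze the backbone: embeddings only, no weight updates.
        for param in self.model.parameters():
            param.requires_grad = False

    def training_step(self, batch, batch_idx):
        x, _ = batch
        with torch.no_grad():
            _ = self.model(x)  # collect embeddings here
        return None  # nothing to backpropagate

    def configure_optimizers(self):
        # Manual optimization: we never call .backward()/.step(),
        # so Lightning should not report a running training loss.
        self.automatic_optimization = False
        return None
```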
