
the loss is nan #8

Open
Maycbj opened this issue Jun 28, 2019 · 6 comments

Comments

@Maycbj

Maycbj commented Jun 28, 2019

This is very nice work, but I ran into some problems in my experiments.
Training easily runs into gradient explosion and the loss becomes NaN, even when my learning rate is set to 0. Could you give me some advice?

@joe-siyuan-qiao
Owner

Thanks for your interest. It's hard to diagnose the problem based on the provided information. It seems that the problem may be due to numerical issues. Please make sure all outputs are properly normalized. You can also provide more information so we can offer better help.
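
As an illustration of what "properly normalized" can mean here: a frequent numerical culprit with Weight Standardization is dividing by a near-zero weight standard deviation. Below is a minimal PyTorch sketch of a weight-standardized Conv2d with an explicit eps in the denominator; it follows the formulation in the Weight Standardization paper and may not match this repository's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d whose weights are standardized (zero mean, unit std per output
    channel) at every forward pass. The eps keeps the division stable when a
    filter's std is close to zero, which is a common source of NaN."""
    def forward(self, x):
        w = self.weight
        w_mean = w.mean(dim=(1, 2, 3), keepdim=True)
        w_std = w.view(w.size(0), -1).std(dim=1).view(-1, 1, 1, 1)
        w = (w - w_mean) / (w_std + 1e-5)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```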

@Maycbj
Author

Maycbj commented Jul 6, 2019

Oh, I have found the reason: the loss becomes NaN owing to the initialization.

So I used your pre-trained model. But in your MaskRCNN-benchmark, the pre-trained models for Faster-RCNN and Mask-RCNN are different. Why are the ImageNet pre-trained models different?

My guess is that the ResNet parameters are trained on ImageNet, and that you then convert the randomly initialized parameters (FPN and RCNN head) into the weight-standardized form?


[screenshot attached]

@joe-siyuan-qiao
Owner

Good to know that you found the reason.

I'm not sure I understand your question. The models pre-trained on ImageNet for Faster-RCNN and Mask-RCNN are the same -- they all point to "catalog://WeightStandardization/R-50-GN-WS", for example. The pre-trained models only contain the parameters of the backbones. Other parts such as heads are not included in the pre-trained models.
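
A quick way to verify what a pre-trained checkpoint actually contains is to load it and list its parameter names. The snippet below is a rough sketch; the file name is hypothetical and the exact key layout depends on how the checkpoint was saved.

```python
import torch

# Hypothetical path; substitute the R-50-GN-WS checkpoint you downloaded.
ckpt = torch.load("R-50-GN-WS.pth", map_location="cpu")
# Checkpoints are sometimes wrapped in a dict under a "model" key.
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

for name, tensor in sorted(state_dict.items()):
    print(name, tuple(tensor.shape))
# Expect only backbone parameters here (conv/GN layers of the ResNet);
# FPN and box/mask head weights are initialized by the detector itself.
```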

@zql-seu

zql-seu commented Jul 29, 2019

Hello! I also encountered the problem of loss explosion. I used WS + GN to pre-train on ImageNet without any problems. But when I used the pre-trained model as the backbone for semantic segmentation, once the loss had dropped to a certain level it became NaN. I tried freezing the backbone and adding WS + GN to the decoding path; again, the loss decreased for a while and then I hit NaN. I feel there is a problem somewhere.
[screenshots of training logs: epochs 0-8, epoch 8, epochs 9-10]

@chenxi116
Collaborator

From my experience, NaN is caused by either a too-large learning rate or inappropriate batch-norm layer statistics. Based on your screenshots, it's unlikely to be the first, as the loss is actually decreasing.

I recommend adding some if np.isnan checks with pdb.set_trace() to diagnose the cause. For example, you can check the logits that are fed into the loss function.
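
A rough sketch of that kind of check, assuming a PyTorch training loop (the helper name and the call sites are placeholders):

```python
import numpy as np
import pdb

def check_nan(name, tensor):
    """Drop into the debugger as soon as a tensor contains NaN or Inf."""
    arr = tensor.detach().cpu().numpy()
    if np.isnan(arr).any() or np.isinf(arr).any():
        print(f"{name} contains NaN/Inf (min={np.nanmin(arr)}, max={np.nanmax(arr)})")
        pdb.set_trace()

# Usage inside the training loop (placeholder names):
# logits = model(images)
# check_nan("logits", logits)
# loss = criterion(logits, targets)
# check_nan("loss", loss)
```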

@MarcoForte

My reply in this issue might help: #1
