Incorrect gradient from a SoftmaxWithLossLayer with loss_weight 0 #2895

Closed
Nanne opened this issue Aug 10, 2015 · 6 comments · Fixed by #6202


Nanne commented Aug 10, 2015

I was debugging a network with two loss layers and wanted to disable one of them (a SoftmaxWithLossLayer), so I set its loss_weight to 0. However, this does not do what I expected at all. The clearest way to explain it is probably with an example of how to reproduce it.

To reproduce, take examples/mnist/lenet_train_test.prototxt and add a second loss layer with weight 0:

layer {
  name: "bad_loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "bad_loss"
  loss_weight: 0
}

and then run this Python script:

caffe_root = '/roaming/nanne/caffe/'  # update this to your Caffe root
import sys
sys.path.insert(0, caffe_root + 'python')
import os
os.chdir(caffe_root)
import caffe
import numpy as np

caffe.set_mode_gpu()
solver = caffe.SGDSolver(caffe_root + 'examples/mnist/lenet_solver.prototxt')

# One forward/backward/update pass.
solver.step(1)

# Diffs of the two split outputs of ip2: one feeds the normal loss,
# the other feeds bad_loss (loss_weight: 0).
print(solver.net.blobs['ip2_ip2_0_split_0'].diff.squeeze()[5:7, :])
print(solver.net.blobs['ip2_ip2_0_split_1'].diff.squeeze()[5:7, :])

# Combined diff that ip2 receives (the sum of the two splits).
print(solver.net.blobs['ip2'].diff.squeeze()[5:7, :])

The diff for the split belonging to the SoftmaxWithLoss with loss_weight 0 will contain 64 (the batch size) values equal to the loss (NOT the gradient) for that input, and all the other elements will be 0. The other split will correctly contain all the diff values (64*10) for the loss with weight 1.

However, these two splits still get combined, creating the diff for 'ip2', for which the first 64 values are not comparable to the last 576. Am I wrong in how I tried to use the loss_weight, or is this a bug? (It doesn't seem to be specific to SoftmaxWithLoss, though it's most clear for this layer.)
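
A quick numerical check of this symptom (a hedged sketch: which of the two split outputs feeds bad_loss depends on layer order, and split_1 is assumed here):

bad = solver.net.blobs['ip2_ip2_0_split_1'].diff.squeeze()  # assumed: the split feeding bad_loss
print((bad != 0).sum())   # with the bug present: ~64 (one scratch value per example), not 64*10
print(bad[bad != 0][:5])  # these nonzero entries look like positive per-example losses, not signed gradients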

shelhamer added the JD label Aug 10, 2015

longjon commented Aug 14, 2015

This looks like a bug to me... try setting force_backward: true in your prototxt or setting the loss weight to a small nonzero value and see if the behavior changes. Here's what I think is happening: SoftmaxWithLossLayer uses its diff memory for temporary storage in forward, since backward will simply overwrite it with correct values (in this case, zeros). However, Net prunes the backward computation of this branch since the loss weight is set to zero, so the correct diff values never get set.

The easy solution (at the cost of some memory) is to outlaw writing to diff in forward.
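
For reference, the first experiment amounts to adding this line at the top level of the net prototxt (force_backward is a real NetParameter field; whether it changes the behavior is exactly what the experiment checks):

# at the top level of examples/mnist/lenet_train_test.prototxt
force_backward: true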

longjon added the bug label Aug 14, 2015
longjon changed the title from "Unexpected behaviour from a SoftmaxWithLossLayer with loss_weight 0" to "Incorrect gradient from a SoftmaxWithLossLayer with loss_weight 0" Aug 14, 2015

Nanne commented Aug 18, 2015

Any non-zero loss weight seems to work fine. Additionally, HingeLoss also uses its diff in the forward pass. I'd be happy with that solution, as it seems several other layers already use a diff_ blob in their forward pass to store calculations for the backward pass.

seanbell commented

Another possible solution: if a backward step is skipped, and diff is allocated (this is important to check), then diff is set to 0.
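
A pycaffe-level sketch of the same idea, for anyone driving the net manually rather than through solver.step() (the split blob name is assumed from the repro above, and this skips the solver's parameter update; it illustrates the idea, it is not the proposed Net change itself):

net = solver.net
net.forward()
# The backward step of the zero-weight branch is pruned, so clear any scratch
# values its forward pass may have left in the split blob's diff.
net.blobs['ip2_ip2_0_split_1'].diff[...] = 0
net.backward()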

cvondrick commented

Just got hit by this bug, unfortunately. I think an intermediate measure would be to add a CHECK failure when the loss_weight is 0 for this layer. Otherwise, people will get incorrect results and be optimizing a different objective than the one they write in a paper (!).
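
Until such a CHECK lands, a hedged Python-side guard along these lines could flag the configuration before training (this reuses caffe_root from the repro script above; matching layer types by the 'Loss' suffix is a heuristic, not part of the Caffe API):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

def warn_zero_weight_losses(prototxt_path):
    # Warn about loss layers declared with an explicit loss_weight of 0.
    net_param = caffe_pb2.NetParameter()
    with open(prototxt_path) as f:
        text_format.Merge(f.read(), net_param)
    for layer in net_param.layer:
        if layer.type.endswith('Loss') and 0.0 in list(layer.loss_weight):
            print('WARNING: layer %s has loss_weight 0; its diff may hold '
                  'stale forward scratch values (see #2895)' % layer.name)

warn_zero_weight_losses(caffe_root + 'examples/mnist/lenet_train_test.prototxt')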


BlGene commented Nov 26, 2015

This bug also occurs when using Python loss layers. In my case, the presence of the second loss layer (with loss_weight: 0) causes the first few diff values of the first loss layer (with loss_weight: 1) to be overwritten, resulting in failed training.


Cysu commented Mar 22, 2016

Trapped by this bug, too. I've opened PR #3868 based on Sean's solution.
