I use Caffe to train a multitask net that does both classification and regression.
My loss layers are:
layer {
  name: "loss1"
  type: "SoftmaxWithLoss"
  bottom: "fc8_1"
  bottom: "label1"  # size 100, for classification
  top: "loss1"
  loss_weight: 2.0
}
layer {
  name: "loss2"
  type: "EuclideanLoss"
  bottom: "fc8_2"
  bottom: "label2"  # size 4, for regression
  top: "loss2"
}
I fine-tune it from Caffenet.caffemodel, and the solver.prototxt is:
net: "/_/train_vol.prototxt"
test_iter: 100
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "_******"
solver_mode: GPU
The error is:
Iteration 20, loss = nan
Train net output #0: loss1 = 2.58978 (* 4 = 10.3591 loss)
Train net output #1: loss2 = nan (* 1 = nan loss)
Only iteration 0 is free of NaN:
Iteration 0, loss = 14.9363
Train net output #0: loss1 = 3.10484 (* 4 = 12.4193 loss)
Train net output #1: loss2 = 2.51693 (* 1 = 2.51693 loss)
All other iterations report a NaN loss.
I followed #409, but it was not helpful in my case.
When I set base_lr to 0, the NaN is gone, but even with base_lr as small as 0.0001 the NaN appears again.
Any advice would be appreciated!
Thanks!
artiit.
@artiit Hi, I have met the same problem as you: multitask CNN training error loss = nan. How did you solve it? Can you give me some advice? Thanks a lot.
I met the same problem and solved it by reducing the learning rate from 1e-3 to 1e-7 or below. More iterations (>200000) were then required for the model to converge.
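For reference, a minimal sketch of that change against the solver from the original post; only the modified lines are shown, and the exact max_iter value is an assumption based on the ">200000" figure above:
base_lr: 1e-7       # reduced from 0.001, as described above
max_iter: 300000    # more iterations are needed at the lower rate (>200000)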
Hi, @xyxxyx. As @rkakamilan did, the best way is to reduce your learning rate and have a little patience. If it still doesn't work, or the result is too poor, you should consider whether your model is correct, and start with a simpler net.
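As one way to start with a simpler net, a hedged sketch: train each loss branch on its own by removing the other loss layer from train_vol.prototxt, so you can see which task drives the loss to NaN. This assumes the same blob names as in the original post:
# Keep only the regression branch to check whether EuclideanLoss alone diverges;
# swap in the SoftmaxWithLoss layer instead to test the classification branch.
layer {
  name: "loss2"
  type: "EuclideanLoss"
  bottom: "fc8_2"
  bottom: "label2"
  top: "loss2"
}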