On custom data training diverges (loss = NaN) #409
Comments
Sergio: Try reducing the base learning rate.
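For reference, the base learning rate is a solver-definition field. A minimal sketch of the change, assuming a LeNet-style solver (the training log below shows lr = 0.00992565 at iteration 100, which is exactly base_lr: 0.01 under the inv policy with gamma: 0.0001 and power: 0.75, so the other fields mirror that example):

```
# solver.prototxt sketch -- drop base_lr by 10x from the example's 0.01
base_lr: 0.001       # the first knob to turn when the loss goes to NaN
lr_policy: "inv"     # remaining fields as in the LeNet example solver
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
```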
smiley19: I tried learning rates from 0.0001 to 0.001, and there are still two possible outcomes: the loss goes to NaN, or it doesn't decrease. Is there another way to solve it?
Sergio: Try different initializations, for instance with the bias set to 0.1.
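In the net definition this means changing the fillers on the learned layers; a sketch of just those fields, since the enclosing layer syntax varies across Caffe versions:

```
# inside the conv1 / conv2 / ip1 / ip2 layer definitions (sketch)
weight_filler {
  type: "xavier"      # or "gaussian" with a small std such as 0.01
}
bias_filler {
  type: "constant"
  value: 0.1          # bias set to 0.1, as suggested above
}
```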
Yangqing: For a sanity check, try running with a learning rate of 0 to see if any NaN appears.
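Concretely, that check is the solver sketch above with the rate zeroed out; with no updates applied, any NaN would have to come from the data or the forward pass:

```
base_lr: 0    # no parameter updates; the loss should sit at its initial value
```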
smiley19: I set the learning rate to 0, and the training loss doesn't change at all. Does that mean my data and initialization are okay? I also tried a learning rate of 0.00001 with the bias set to 0.1, and it can still turn to NaN.
Sorry, we cannot train and tune your model for you. Consult references on deep learning and tutorials such as Marc'aurelio Ranzato's CVPR '12 tutorial slides on tips and tricks.
I met the same problem as you; could you tell me how you solved it? Thanks.
smiley19 (original post): I am trying to train on my own dataset (4 hand gestures). I didn't change the overall structure of the example networks (e.g. MNIST and ImageNet); the only thing I modified is the input dataset. But no matter how I adjust the relevant parameters (e.g. learning rate, weight decay), either the weights and the loss diverge to NaN, or the loss doesn't decrease at all.
I used convert_imageset.bin to convert the input images to a LevelDB:
```
Creating leveldb...
E0513 10:04:46.223364 19650 convert_imageset.cpp:96] Processed 1000 files.
E0513 10:04:53.909580 19650 convert_imageset.cpp:96] Processed 2000 files.
E0513 10:04:59.556373 19650 convert_imageset.cpp:96] Processed 3000 files.
E0513 10:05:05.393556 19650 convert_imageset.cpp:96] Processed 4000 files.
E0513 10:05:11.244086 19650 convert_imageset.cpp:96] Processed 5000 files.
E0513 10:05:16.990255 19650 convert_imageset.cpp:96] Processed 6000 files.
E0513 10:05:25.553741 19650 convert_imageset.cpp:96] Processed 7000 files.
E0513 10:05:31.347475 19650 convert_imageset.cpp:96] Processed 8000 files.
E0513 10:05:36.977419 19650 convert_imageset.cpp:96] Processed 9000 files.
E0513 10:05:43.507733 19650 convert_imageset.cpp:96] Processed 10000 files.
E0513 10:05:51.023560 19650 convert_imageset.cpp:96] Processed 11000 files.
E0513 10:05:56.628383 19650 convert_imageset.cpp:96] Processed 12000 files.
E0513 10:06:02.121335 19650 convert_imageset.cpp:104] Processed 12800 files.
E0513 10:06:06.624284 19994 convert_imageset.cpp:96] Processed 1000 files.
E0513 10:06:08.399435 19994 convert_imageset.cpp:104] Processed 1871 files.
Done.
```
And the training log looks like this:
```
I0513 11:15:46.741041 30505 train_net.cpp:26] Starting Optimization
I0513 11:15:46.741169 30505 solver.cpp:41] Creating training net.
I0513 11:15:46.741538 30505 net.cpp:75] Creating Layer hand
I0513 11:15:46.741564 30505 net.cpp:111] hand -> data
I0513 11:15:46.741597 30505 net.cpp:111] hand -> label
I0513 11:15:46.741647 30505 data_layer.cpp:145] Opening leveldb hand-train-leveldb
I0513 11:15:46.876204 30505 data_layer.cpp:185] output data size: 128,3,50,50
I0513 11:15:47.147683 30505 net.cpp:126] Top shape: 128 3 50 50 (960000)
I0513 11:15:47.147743 30505 net.cpp:126] Top shape: 128 1 1 1 (128)
I0513 11:15:47.147759 30505 net.cpp:157] hand does not need backward computation.
I0513 11:15:47.147783 30505 net.cpp:75] Creating Layer conv1
I0513 11:15:47.147797 30505 net.cpp:85] conv1 <- data
I0513 11:15:47.147817 30505 net.cpp:111] conv1 -> conv1
I0513 11:15:47.147897 30505 net.cpp:126] Top shape: 128 20 45 45 (5184000)
I0513 11:15:47.147917 30505 net.cpp:152] conv1 needs backward computation.
I0513 11:15:47.147933 30505 net.cpp:75] Creating Layer pool1
I0513 11:15:47.147945 30505 net.cpp:85] pool1 <- conv1
I0513 11:15:47.147958 30505 net.cpp:111] pool1 -> pool1
I0513 11:15:47.147977 30505 net.cpp:126] Top shape: 128 20 15 15 (576000)
I0513 11:15:47.147997 30505 net.cpp:152] pool1 needs backward computation.
I0513 11:15:47.148012 30505 net.cpp:75] Creating Layer conv2
I0513 11:15:47.148025 30505 net.cpp:85] conv2 <- pool1
I0513 11:15:47.148036 30505 net.cpp:111] conv2 -> conv2
I0513 11:15:47.148380 30505 net.cpp:126] Top shape: 128 50 10 10 (640000)
I0513 11:15:47.148401 30505 net.cpp:152] conv2 needs backward computation.
I0513 11:15:47.148416 30505 net.cpp:75] Creating Layer pool2
I0513 11:15:47.148429 30505 net.cpp:85] pool2 <- conv2
I0513 11:15:47.148442 30505 net.cpp:111] pool2 -> pool2
I0513 11:15:47.148458 30505 net.cpp:126] Top shape: 128 50 5 5 (160000)
I0513 11:15:47.148470 30505 net.cpp:152] pool2 needs backward computation.
I0513 11:15:47.148485 30505 net.cpp:75] Creating Layer ip1
I0513 11:15:47.148497 30505 net.cpp:85] ip1 <- pool2
I0513 11:15:47.148510 30505 net.cpp:111] ip1 -> ip1
I0513 11:15:47.154276 30505 net.cpp:126] Top shape: 128 500 1 1 (64000)
I0513 11:15:47.154330 30505 net.cpp:152] ip1 needs backward computation.
I0513 11:15:47.154347 30505 net.cpp:75] Creating Layer relu1
I0513 11:15:47.154361 30505 net.cpp:85] relu1 <- ip1
I0513 11:15:47.154376 30505 net.cpp:99] relu1 -> ip1 (in-place)
I0513 11:15:47.154392 30505 net.cpp:126] Top shape: 128 500 1 1 (64000)
I0513 11:15:47.154404 30505 net.cpp:152] relu1 needs backward computation.
I0513 11:15:47.154420 30505 net.cpp:75] Creating Layer ip2
I0513 11:15:47.154431 30505 net.cpp:85] ip2 <- ip1
I0513 11:15:47.154443 30505 net.cpp:111] ip2 -> ip2
I0513 11:15:47.154484 30505 net.cpp:126] Top shape: 128 4 1 1 (512)
I0513 11:15:47.154500 30505 net.cpp:152] ip2 needs backward computation.
I0513 11:15:47.154520 30505 net.cpp:75] Creating Layer loss
I0513 11:15:47.154532 30505 net.cpp:85] loss <- ip2
I0513 11:15:47.154546 30505 net.cpp:85] loss <- label
I0513 11:15:47.154562 30505 net.cpp:152] loss needs backward computation.
I0513 11:15:47.154587 30505 net.cpp:180] Collecting Learning Rate and Weight Decay.
I0513 11:15:47.154608 30505 net.cpp:173] Network initialization done.
I0513 11:15:47.154623 30505 net.cpp:174] Memory required for Data 30338560
I0513 11:15:47.154680 30505 solver.cpp:44] Creating testing net.
I0513 11:15:47.155036 30505 net.cpp:75] Creating Layer hand
I0513 11:15:47.155061 30505 net.cpp:111] hand -> data
I0513 11:15:47.155079 30505 net.cpp:111] hand -> label
I0513 11:15:47.155096 30505 data_layer.cpp:145] Opening leveldb hand-test-leveldb
I0513 11:15:47.268432 30505 data_layer.cpp:185] output data size: 1871,3,50,50
I0513 11:15:47.285804 30505 net.cpp:126] Top shape: 1871 3 50 50 (14032500)
I0513 11:15:47.285868 30505 net.cpp:126] Top shape: 1871 1 1 1 (1871)
I0513 11:15:47.285884 30505 net.cpp:157] hand does not need backward computation.
I0513 11:15:47.285908 30505 net.cpp:75] Creating Layer conv1
I0513 11:15:47.285922 30505 net.cpp:85] conv1 <- data
I0513 11:15:47.285936 30505 net.cpp:111] conv1 -> conv1
I0513 11:15:47.286005 30505 net.cpp:126] Top shape: 1871 20 45 45 (75775500)
I0513 11:15:47.286023 30505 net.cpp:152] conv1 needs backward computation.
I0513 11:15:47.286039 30505 net.cpp:75] Creating Layer pool1
I0513 11:15:47.286052 30505 net.cpp:85] pool1 <- conv1
I0513 11:15:47.286066 30505 net.cpp:111] pool1 -> pool1
I0513 11:15:47.286079 30505 net.cpp:126] Top shape: 1871 20 15 15 (8419500)
I0513 11:15:47.286092 30505 net.cpp:152] pool1 needs backward computation.
I0513 11:15:47.286108 30505 net.cpp:75] Creating Layer conv2
I0513 11:15:47.286119 30505 net.cpp:85] conv2 <- pool1
I0513 11:15:47.286133 30505 net.cpp:111] conv2 -> conv2
I0513 11:15:47.286484 30505 net.cpp:126] Top shape: 1871 50 10 10 (9355000)
I0513 11:15:47.286504 30505 net.cpp:152] conv2 needs backward computation.
I0513 11:15:47.286522 30505 net.cpp:75] Creating Layer pool2
I0513 11:15:47.286535 30505 net.cpp:85] pool2 <- conv2
I0513 11:15:47.286548 30505 net.cpp:111] pool2 -> pool2
I0513 11:15:47.286561 30505 net.cpp:126] Top shape: 1871 50 5 5 (2338750)
I0513 11:15:47.286574 30505 net.cpp:152] pool2 needs backward computation.
I0513 11:15:47.286591 30505 net.cpp:75] Creating Layer ip1
I0513 11:15:47.286602 30505 net.cpp:85] ip1 <- pool2
I0513 11:15:47.286615 30505 net.cpp:111] ip1 -> ip1
I0513 11:15:47.292402 30505 net.cpp:126] Top shape: 1871 500 1 1 (935500)
I0513 11:15:47.292474 30505 net.cpp:152] ip1 needs backward computation.
I0513 11:15:47.292493 30505 net.cpp:75] Creating Layer relu1
I0513 11:15:47.292506 30505 net.cpp:85] relu1 <- ip1
I0513 11:15:47.292522 30505 net.cpp:99] relu1 -> ip1 (in-place)
I0513 11:15:47.292536 30505 net.cpp:126] Top shape: 1871 500 1 1 (935500)
I0513 11:15:47.292548 30505 net.cpp:152] relu1 needs backward computation.
I0513 11:15:47.292564 30505 net.cpp:75] Creating Layer ip2
I0513 11:15:47.292577 30505 net.cpp:85] ip2 <- ip1
I0513 11:15:47.292588 30505 net.cpp:111] ip2 -> ip2
I0513 11:15:47.292644 30505 net.cpp:126] Top shape: 1871 4 1 1 (7484)
I0513 11:15:47.292659 30505 net.cpp:152] ip2 needs backward computation.
I0513 11:15:47.292680 30505 net.cpp:75] Creating Layer prob
I0513 11:15:47.292692 30505 net.cpp:85] prob <- ip2
I0513 11:15:47.292706 30505 net.cpp:111] prob -> prob
I0513 11:15:47.292722 30505 net.cpp:126] Top shape: 1871 4 1 1 (7484)
I0513 11:15:47.292736 30505 net.cpp:152] prob needs backward computation.
I0513 11:15:47.292748 30505 net.cpp:75] Creating Layer accuracy
I0513 11:15:47.292764 30505 net.cpp:85] accuracy <- prob
I0513 11:15:47.292776 30505 net.cpp:85] accuracy <- label
I0513 11:15:47.292790 30505 net.cpp:111] accuracy -> accuracy
I0513 11:15:47.292806 30505 net.cpp:126] Top shape: 1 2 1 1 (2)
I0513 11:15:47.292819 30505 net.cpp:152] accuracy needs backward computation.
I0513 11:15:47.292831 30505 net.cpp:163] This network produces output accuracy
I0513 11:15:47.292848 30505 net.cpp:180] Collecting Learning Rate and Weight Decay.
I0513 11:15:47.292865 30505 net.cpp:173] Network initialization done.
I0513 11:15:47.292877 30505 net.cpp:174] Memory required for Data 443494364
I0513 11:15:47.292930 30505 solver.cpp:49] Solver scaffolding done.
I0513 11:15:47.292948 30505 solver.cpp:60] Solving Hand
I0513 11:15:47.292968 30505 solver.cpp:105] Iteration 0, Testing net
I0513 11:15:47.720150 30505 solver.cpp:141] Test score #0: 0.207376
I0513 11:15:47.720239 30505 solver.cpp:141] Test score #1: 1.38612
I0513 11:15:56.640269 30505 solver.cpp:236] Iteration 100, lr = 0.00992565
I0513 11:15:56.640511 30505 solver.cpp:86] Iteration 100, loss = 2.70168
I0513 11:16:05.554852 30505 solver.cpp:236] Iteration 200, lr = 0.00985258
I0513 11:16:05.555102 30505 solver.cpp:86] Iteration 200, loss = nan
I0513 11:16:14.469753 30505 solver.cpp:236] Iteration 300, lr = 0.00978075
I0513 11:16:14.470007 30505 solver.cpp:86] Iteration 300, loss = nan
I0513 11:16:23.383903 30505 solver.cpp:236] Iteration 400, lr = 0.00971013
I0513 11:16:23.384151 30505 solver.cpp:86] Iteration 400, loss = nan
I0513 11:16:23.384174 30505 solver.cpp:105] Iteration 400, Testing net
I0513 11:16:23.702975 30505 solver.cpp:141] Test score #0: 0
I0513 11:16:23.703033 30505 solver.cpp:141] Test score #1: nan
```
Also, I want to check the contents of the input LevelDB, but I have no idea how to do that.
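One way to peek inside the LevelDB is a short Python script. This is a minimal sketch, assuming the py-leveldb bindings are installed and Caffe's compiled protobuf module (caffe_pb2) is importable; the database path and the 3x50x50 shape follow the logs above:

```python
import leveldb                      # py-leveldb bindings (assumed installed)
import numpy as np
from caffe.proto import caffe_pb2   # requires the compiled Caffe protos

db = leveldb.LevelDB('hand-train-leveldb')  # path from the conversion step

datum = caffe_pb2.Datum()
for i, (key, value) in enumerate(db.RangeIter()):
    datum.ParseFromString(bytes(value))
    # raw pixel bytes -> (channels, height, width), 3x50x50 per the log
    img = np.frombuffer(datum.data, dtype=np.uint8)
    img = img.reshape(datum.channels, datum.height, datum.width)
    print(key, 'label:', datum.label, 'shape:', img.shape,
          'min/max:', img.min(), img.max())
    if i >= 4:                      # inspect just the first few entries
        break
```

If the labels or pixel ranges look wrong here, the divergence is more likely a data problem than a learning-rate one.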