
Same setup - different training loss (outcome) #6588

Open
keck91 opened this issue Oct 30, 2018 · 3 comments

Comments


keck91 commented Oct 30, 2018

Dear community,

I am currently facing the problem that the outcome of my training varies strongly.
Sometimes I get a usable caffemodel after training and sometimes I don't.
So I set up a test: I picked 100 labeled images and ran three short trainings with them, without changing anything (same data set, same learning rate of 0.001).
Results:
First run, "Radar09", with 50,000 iterations resulted in a loss of 0.001080; after 1,000 iterations the loss was already below 1.0.
Second run, "Radar10", with 2,500 iterations resulted in a loss of 8.561.
Third run, "Radar11", with 2,500 iterations resulted in a loss of 0.5212.
Fourth run, "Radar12", with 2,500 iterations resulted in a loss of 8.909.

Why does this happen?
What could be the reason?
How can I avoid it?
For a noticeably larger dataset and more iterations (3,000 images and 100,000 iterations), this will become a real problem.

(Attached screenshots of the training output: radar09_crop, radar10, radar11, radar12.)

Thank you very much! :)

System configuration

  • Operating system: Ubuntu 16.04
  • CUDA version (if applicable): 8.0
  • CUDNN version (if applicable): 5.1
  • Python version (if using pycaffe): 2.7
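
For context on the variance described above: Caffe's solver seed defaults to -1, so every run draws a fresh seed for weight initialization, data shuffling and dropout, and identical solver settings can still end at very different losses. Pinning the seed in the solver definition, which is what the rest of the thread converges on, makes runs repeatable. A minimal solver.prototxt sketch along those lines; the net path, iteration count and snapshot settings are placeholders rather than values from this issue:

# Assumed minimal solver.prototxt for a repeatable run
# (net/snapshot paths and max_iter are placeholders).
net: "train_val.prototxt"
base_lr: 0.001
lr_policy: "fixed"
max_iter: 2500
display: 100
snapshot: 500
snapshot_prefix: "snapshots/radar"
# Pin the RNG seed so weight initialization and data shuffling are
# identical across runs; with the default (-1) a new seed is drawn
# every run, which is why identical setups can end differently.
random_seed: 20
solver_mode: GPU

Even with the seed pinned, some run-to-run variance can remain from nondeterministic cuDNN kernels, so small differences between runs are still possible.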

keck91 commented Nov 6, 2018

Hello csukuangfj,

Thank you for your answer. I checked my src/caffe/proto/caffe.proto.
optional int64 random_seed is already set to 20.

(Screenshot attached.)
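
A caveat on the caffe.proto check above: the 20 in that file is the protobuf field number, not a seed value; the seed only takes effect when it is set in the solver prototxt (e.g. random_seed: 20). The declaration in SolverParameter should read roughly like this (comment paraphrased):

// Field number 20 is just the protobuf tag; the default of -1 means no
// fixed seed is set, so each run starts differently unless random_seed
// is given explicitly in solver.prototxt.
optional int64 random_seed = 20 [default = -1];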


keck91 commented Nov 6, 2018

Ah. I see!
Thank you.
I will continue with random_seed: 20 from now on, but unfortunately I don't know which seed was used for the "vital" trainings that did not get stuck at a loss of 6.9.


ABCDAS commented Aug 19, 2020

[libprotobuf ERROR C:\Users\guillaume\work\caffe-builder\build_v140_x64\packages\protobuf\protobuf_download-prefix\src\protobuf_download\src\google\protobuf\message_lite.cc:248] Exceeded maximum protobuf size of 2GB.
F0814 15:05:56.169399 11676 io.cpp:78] Check failed: proto.SerializeToOstream(&output)
*** Check failure stack trace: ***

Did you solve this problem?
Thank you.
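
The 2 GB limit is a hard ceiling in protobuf itself, so the check fails whenever a single serialized message (here written out by the SerializeToOstream call in Caffe's io.cpp) grows beyond it. If this happens while saving a .caffemodel or solver snapshot, a commonly suggested workaround is to snapshot in HDF5 format instead of binary protobuf, assuming a Caffe build whose SolverParameter has snapshot_format; a sketch of the relevant solver.prototxt lines:

# Assumed snapshot settings: write snapshots as HDF5 instead of binary
# protobuf, which avoids protobuf's 2 GB serialized-message limit.
snapshot: 5000
snapshot_prefix: "snapshots/radar"   # placeholder path
snapshot_format: HDF5

If the oversized message is a dataset or some other single huge proto rather than a snapshot, the format switch will not help and the data has to be split or stored outside protobuf.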
