
Same setup - different training loss (outcome) #6588

Open
keck91 opened this issue Oct 30, 2018 · 3 comments

Comments


keck91 commented Oct 30, 2018

Dear community,

I am currently facing the problem that the outcome of my training varies strongly.
Sometimes I get a usable caffemodel after training and sometimes I don't.
So I set up a test: I picked 100 labeled images and ran three short trainings with them, without changing anything (same data set, same learning rate of 0.001).
Results:
First run, "Radar09", with 50,000 iterations resulted in a loss of 0.001080; after 1,000 iterations the loss was already below 1.0.
Second run, "Radar10", with 2,500 iterations resulted in a loss of 8.561.
Third run, "Radar11", with 2,500 iterations resulted in a loss of 0.5212.
Fourth run, "Radar12", with 2,500 iterations resulted in a loss of 8.909.

Why does this happen?
What could be the reason?
How can I avoid it?
For a noticeably larger dataset and more iterations (3,000 images and 100,000 iterations), this will become a real problem.

(Attached screenshots of the training output: radar09_crop, radar10, radar11, radar12.)

Thank you very much! :)

System configuration

  • Operating system: Ubuntu 16.04
  • CUDA version (if applicable): 8.0
  • CUDNN version (if applicable): 5.1
  • Python version (if using pycaffe): 2.7
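
For context on the variance described above: Caffe's solver seed defaults to -1, so every run draws a fresh seed for weight initialization, data shuffling and dropout, and identical solver settings can still end at very different losses. Pinning the seed in the solver definition, which is what the rest of the thread converges on, makes runs repeatable. A minimal solver.prototxt sketch along those lines; the net path, iteration count and snapshot settings are placeholders rather than values from this issue:

# Assumed minimal solver.prototxt for a repeatable run
# (net/snapshot paths and max_iter are placeholders).
net: "train_val.prototxt"
base_lr: 0.001
lr_policy: "fixed"
max_iter: 2500
display: 100
snapshot: 500
snapshot_prefix: "snapshots/radar"
# Pin the RNG seed so weight initialization and data shuffling are
# identical across runs; with the default (-1) a new seed is drawn
# every run, which is why identical setups can end differently.
random_seed: 20
solver_mode: GPU

Even with the seed pinned, some run-to-run variance can remain from nondeterministic cuDNN kernels, so small differences between runs are still possible.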

keck91 commented Nov 6, 2018

Hello csukuangfj,

Thank you for your answer. I checked my src/caffe/proto/caffe.proto.
optional int64 random_seed is already set to 20.

(Screenshot attached.)
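
A caveat on the caffe.proto check above: the 20 in that file is the protobuf field number, not a seed value; the seed only takes effect when it is set in the solver prototxt (e.g. random_seed: 20). The declaration in SolverParameter should read roughly like this (comment paraphrased):

// Field number 20 is just the protobuf tag; the default of -1 means no
// fixed seed is set, so each run starts differently unless random_seed
// is given explicitly in solver.prototxt.
optional int64 random_seed = 20 [default = -1];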


keck91 commented Nov 6, 2018

Ah. I see!
Thank you.
I will continue with random_seed: 20 from now on, but unfortunately I don't know which seed was used for the "vital" trainings that did not get stuck at a loss of 6.9.


ABCDAS commented Aug 19, 2020

[libprotobuf ERROR C:\Users\guillaume\work\caffe-builder\build_v140_x64\packages\protobuf\protobuf_download-prefix\src\protobuf_download\src\google\protobuf\message_lite.cc:248] Exceeded maximum protobuf size of 2GB.
F0814 15:05:56.169399 11676 io.cpp:78] Check failed: proto.SerializeToOstream(&output)
*** Check failure stack trace: ***

Did you solve this problem?
Thank you.
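
The 2 GB limit is a hard ceiling in protobuf itself, so the check fails whenever a single serialized message (here written out by the SerializeToOstream call in Caffe's io.cpp) grows beyond it. If this happens while saving a .caffemodel or solver snapshot, a commonly suggested workaround is to snapshot in HDF5 format instead of binary protobuf, assuming a Caffe build whose SolverParameter has snapshot_format; a sketch of the relevant solver.prototxt lines:

# Assumed snapshot settings: write snapshots as HDF5 instead of binary
# protobuf, which avoids protobuf's 2 GB serialized-message limit.
snapshot: 5000
snapshot_prefix: "snapshots/radar"   # placeholder path
snapshot_format: HDF5

If the oversized message is a dataset or some other single huge proto rather than a snapshot, the format switch will not help and the data has to be split or stored outside protobuf.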
