Learning stops early with reduced batch size #1557

Closed
guzh870423 opened this issue Dec 11, 2014 · 5 comments

Comments

@guzh870423

Hello,

I am using a Tesla C2050, which has compute capability 2.0. It reports an error if I train ImageNet with the default setting batch_size = 256, as in #629.

So I reduced batch_size to 64 and correspondingly changed base_lr from 0.01 to 0.01414, stepsize from 100000 to 400000, and max_iter from 450000 to 1800000. I also changed the bias value from 1 to 0.1 for some layers in models/bvlc_reference_caffenet/train_val.prototxt, as suggested in #430; otherwise it does not learn anything.
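
For reference, the stepsize and max_iter changes amount to keeping the total number of training images seen constant; a minimal sketch of that arithmetic (the base_lr value above is the poster's own heuristic, not derived here):

```python
# Scale iteration-based solver parameters so the total number of training
# images seen stays constant after shrinking the batch size from 256 to 64.
old_batch, new_batch = 256, 64
scale = old_batch // new_batch            # 4

stepsize, max_iter = 100000, 450000       # original solver values
print("stepsize:", stepsize * scale)      # 400000
print("max_iter:", max_iter * scale)      # 1800000
```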

[Figure: plot of training loss vs. iteration]

But my result is not as good as #430: the loss stops decreasing and just oscillates after 20,000 iterations. I tried the AlexNet model with the same parameter changes, except base_lr = 0.02. The result was similar, if not worse.

Any idea what may cause this? Thanks.

@danielorf

This page may be of help to you: #430
Also, how did you gather the training-loss/iteration data shown in your plot?

@guzh870423
Author

@danielorf
Thank you for answering my post. Actually, #430 is exactly what I am asking about; my result is different from what is reported there.
Now I am using another GPU, a K20, and I think it is doing fine now.

Honestly, I had the same question; I was wondering how they made those plots.
I just wrote a script to extract iteration vs. loss from the output file. I suspect there is a built-in tool that does something similar. Maybe you can find the answer and tell me.
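
For reference, a minimal sketch of such a log-scraping script (not the poster's actual script), assuming Caffe's default glog-style lines of the form `Iteration N, loss = X`; the exact log format may vary between Caffe versions:

```python
import re
import sys

# Matches glog-style Caffe output lines such as:
#   I1211 10:00:00.000000 1234 solver.cpp:189] Iteration 100, loss = 6.90
LOSS_RE = re.compile(r"Iteration (\d+), loss = ([0-9.eE+-]+)")

def parse_loss(log_path):
    """Return (iteration, loss) pairs scraped from a Caffe training log."""
    points = []
    with open(log_path) as f:
        for line in f:
            m = LOSS_RE.search(line)
            if m:
                points.append((int(m.group(1)), float(m.group(2))))
    return points

if __name__ == "__main__":
    # Usage: python parse_loss.py /path/to/caffe.log > loss.txt
    for it, loss in parse_loss(sys.argv[1]):
        print(it, loss)
```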

@sguada
Contributor

sguada commented Dec 21, 2014

The tool is in tools/extra/parse_log.sh

Sergio
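
parse_log.sh lives under tools/extra/ in the Caffe source tree and splits a training log into train/test summaries. As a hedged companion sketch, the (iteration, loss) pairs produced by the script above (or columns extracted from parse_log.sh's output, whose exact layout may vary by Caffe version) can be plotted with matplotlib:

```python
import sys
import matplotlib.pyplot as plt

def load_pairs(path):
    """Load whitespace-separated 'iteration loss' rows, skipping comments."""
    iters, losses = [], []
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split()
            iters.append(float(fields[0]))
            losses.append(float(fields[1]))
    return iters, losses

if __name__ == "__main__":
    # Usage: python plot_loss.py loss.txt
    iters, losses = load_pairs(sys.argv[1])
    plt.plot(iters, losses)
    plt.xlabel("Iteration")
    plt.ylabel("Training loss")
    plt.savefig("training_loss.png")
```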

@danielorf

Sorry, I clearly didn't read your question properly, my mistake.

@danielorf

Where might one find this "caffe.log" file?

Edit: Found it. It's located in /tmp/ with the name "caffe.[pc name].[username].log.INFO.date".
