
How to get the conv5 features with pre-trained ImageNet model for images with different width and height? #1056

Closed
zyzhong opened this issue Sep 9, 2014 · 4 comments


zyzhong commented Sep 9, 2014

I want to compute the pooled conv5 features for 480x320 images, but the prototxt files always seem to use the same width and height, and I'm not sure which dimension in the prototxt is the width.
Is the following prototxt right?
I use the model trained on ImageNet 2012, so the image mean is 256x256; how do I get a reasonable mean for this input size?
One more question: with this prototxt I get a 14x9x256 feature map; does the pooled conv5 feature map preserve the spatial layout of the image?

input: "data"
input_dim: 1 //batch size
input_dim: 3 //channel
input_dim: 320 //height?
input_dim:480 //width?
layers{
...

sguada (Contributor) commented Sep 9, 2014

That's a valid prototxt. For speed, if you have enough memory, you may want to process more than one image at a time by increasing the batch size.

Yes, Caffe preserves the spatial layout, but remember that you need to swap the W and H dimensions of the features because of the row-major vs. column-major ordering between Caffe and MATLAB.
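
For reference, here is the same extraction sketched through the Python interface instead of MATLAB. This is a minimal sketch only: the prototxt filename, the weights file, and the pool5 blob name (as in the reference CaffeNet deploy prototxt) are assumptions.

import numpy as np
import caffe

# Placeholder file names: your own 1 x 3 x 320 x 480 deploy prototxt (as above)
# and the reference CaffeNet weights.
net = caffe.Net('deploy_480x320.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)

image = np.random.rand(3, 320, 480).astype(np.float32)  # stand-in for a mean-subtracted BGR image
net.blobs['data'].data[0] = image
net.forward()

# Blobs are indexed batch x channel x height x width, so the spatial layout is kept.
pool5 = net.blobs['pool5'].data[0]       # 256 x 9 x 14 (C x H x W)
matlab_order = pool5.transpose(2, 1, 0)  # 14 x 9 x 256 (W x H x C), the order the MATLAB wrapper reports

# To process several images per forward pass, raise input_dim: 1 in the prototxt
# and fill data[0], data[1], ... accordingly.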

Sergio


zyzhong (Author) commented Sep 10, 2014

Thanks a lot.
The mex function caffe() in MATLAB returns a 14x9x256 feature map; do I need to swap the W and H dimensions, and if so, how?
I use the model trained on ImageNet 2012, so the image mean is 256x256; how do I get a reasonable mean for a 480x320 input?

shelhamer (Member) commented

The input blob dimensions are batch x channel x height x width, so your prototxt is right.

Note that for non-fixed input sizes you need to make the model fully-convolutional: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb
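
A condensed sketch of the parameter transplant from that notebook (the file paths and the *-conv layer names follow the notebook and are assumptions here; the fully-convolutional prototxt has to be written first):

import caffe

# Load the original CaffeNet and a copy whose fc6/fc7/fc8 layers have been
# rewritten as convolutions (fc6-conv, fc7-conv, fc8-conv) in a separate prototxt.
net = caffe.Net('models/bvlc_reference_caffenet/deploy.prototxt',
                'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)
net_full_conv = caffe.Net('examples/net_surgery/bvlc_caffenet_full_conv.prototxt',
                          'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                          caffe.TEST)

# Copy the inner-product weights into the convolution layers; .flat unrolls both
# sides so the parameters are transferred element for element.
for fc, conv in zip(['fc6', 'fc7', 'fc8'], ['fc6-conv', 'fc7-conv', 'fc8-conv']):
    net_full_conv.params[conv][0].data.flat = net.params[fc][0].data.flat
    net_full_conv.params[conv][1].data[...] = net.params[fc][1].data

net_full_conv.save('bvlc_caffenet_full_conv.caffemodel')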

For the mean, the simplest approach is to reduce it to channels only, i.e. average it over height and width and subtract a per-channel mean. That works essentially just as well and is simple.
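
For example (a sketch; the .npy mean file is the one shipped with pycaffe, and its path is an assumption):

import numpy as np

mu = np.load('python/caffe/imagenet/ilsvrc_2012_mean.npy')  # 3 x 256 x 256, BGR
channel_mean = mu.mean(axis=(1, 2))                         # one value per channel

# Subtract from an input of any size, e.g. 3 x 320 x 480, by broadcasting over space.
image = np.zeros((3, 320, 480), dtype=np.float32)  # stand-in for a real BGR image
image -= channel_mean[:, None, None]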

Please ask usage questions on caffe-users. Issues are for development discussion. Thanks!

shelhamer (Member) commented

Fully-convolutional models plus @longjon's #594 in the latest release completely address feature map extraction for any input size.
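
A sketch of what any-size extraction looks like through pycaffe with a fully-convolutional model, assuming the release supports reshaping the input blob on the fly (the file names and the pool5 blob name are placeholders):

import numpy as np
import caffe

net = caffe.Net('deploy_full_conv.prototxt', 'bvlc_caffenet_full_conv.caffemodel', caffe.TEST)

image = np.random.rand(3, 320, 480).astype(np.float32)       # stand-in for a preprocessed image of any size
net.blobs['data'].reshape(1, 3, image.shape[1], image.shape[2])
net.blobs['data'].data[...] = image
net.forward()                       # the net adapts its layer shapes to the new input size
pool5 = net.blobs['pool5'].data     # 1 x 256 x H' x W' for whatever H x W went in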
