Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images layer: A data provider layer directly from images #120

Merged
merged 7 commits into from
Mar 13, 2014

Conversation

sguada
Copy link
Contributor

@sguada sguada commented Feb 17, 2014

This data layer, reads a file with "path/image_filenames label" and provides a data and label tops in the same way data_layer does. But it doesn't require the images to be in a leveldb, or even doesn't the require the images to be resized in advance (although it is recommended for speed).

Still some speed test comparisons has to be done.

@sguada sguada mentioned this pull request Feb 17, 2014
@Yangqing
Copy link
Member

It is my random thought, but would it be good to merge data_layer and input_layer by allowing an input format selection? Since there are some codes that could be reused, such as threaded prefetching.

@jeffdonahue
Copy link
Contributor

Agreed with Yangqing - this seems it might only be a 10 or 20 line change to DataLayer with an input format selector param; this is a lot of code to duplicate imo.

@kloudkl
Copy link
Contributor

kloudkl commented Feb 18, 2014

Learning from some of the recent contributing efforts including mine that finally involved significant code refactoring or even reversion after the initial pull requests, I feel strongly that contributors should create issues for the wanted features or bug fixes at first. It is only after exchanging the thoughts about the most suitable designs or algorithms with the project owners and other contributors should the contributor begin investing a lot time time in really developing. This will avoid too much wasted sunk costs along the way.

@sguada
Copy link
Contributor Author

sguada commented Feb 18, 2014

@Yangqing @jeffdonahue I like your idea. Initially I wanted to get the layer working, and reusing as much code as possible from Data_layer seemed right.

@kloudkl I think refactoring the code after it is working is a good idea. So I don't feel it was a waste of time, it let me understand better the differences and similarities between data_layer and images_layer, and now I feel can probably extract the common parts and separate the differences.
However getting the opinions from other contributors and project owners is always useful.

@Yangqing
Copy link
Member

Welcome to the realm of research code, where 90% of the codes are sunken.
For example, maybe some of us still remember this:

https://github.com/Yangqing/iceberk/

which is now at the bottom of the ocean filled with coffee.

Yangqing

On Mon, Feb 17, 2014 at 6:55 PM, Sergio Guadarrama <notifications@github.com

wrote:

@Yangqing https://github.com/Yangqing @jeffdonahuehttps://github.com/jeffdonahueI like your idea. Initially I wanted to get the layer working, and reusing
as much code as possible from Data_layer seemed right.

@kloudkl https://github.com/kloudkl I think refactoring the code after
it is working is a good idea. So I don't feel it was a waste of time, it
let me understand better the differences and similarities between
data_layer and images_layer, and now I feel can probably extract the common
parts and separate the differences.
However getting the opinions from other contributors and project owners is
always useful.

Reply to this email directly or view it on GitHubhttps://github.com//pull/120#issuecomment-35348050
.

@shelhamer
Copy link
Member

I'm currently drafting development and contributing guides; please join the discussion at #101.

@kloudkl: it is certainly important for discussion and avoiding duplication of effort that people make their suggestions known and claim their contributions. However, to avoid double issues (issue + PR) and fragmenting conversations, I propose a natural way to do this with PRs when possible.

@sguada: discussion is definitely helpful, and I think issues + PRs are the place to do it in public.

@Yangqing reminds us of the truth, as ever.

@kloudkl
Copy link
Contributor

kloudkl commented Feb 18, 2014

I skimmed through the iceberk project and figured out that DeCAF, the evolution origin of Caffe, borrowed some key data structures and algorithms from it. In this sense, it is the testbed of this now very mature deep network power engine and not sunken but reborn.

@sguada
Copy link
Contributor Author

sguada commented Feb 18, 2014

I have done some performance tests with Titan card, and it seems that images_layer is approx twice as slow as the data_layer, what translate to a 8% slower in the forward-backward pass. What is not to bad considering that each image has to be read from a different jpg file.

# Using images_layer batchsize=50 100 repetitions

E0218 13:39:18.331570 31627 net_speed_benchmark.cpp:66] *** Benchmark begins ***
E0218 13:39:27.378861 31627 net_speed_benchmark.cpp:74] data    forward: 9.04 seconds.
E0218 13:39:30.105782 31627 net_speed_benchmark.cpp:74] conv1   forward: 2.8 seconds.
...
E0218 13:39:46.968508 31627 net_speed_benchmark.cpp:77] Forward pass: 28.68 seconds.
E0218 13:40:21.253057 31627 net_speed_benchmark.cpp:88] Backward pass: 34.27 seconds.
E0218 13:40:21.253075 31627 net_speed_benchmark.cpp:89] Total Time: 62.95 seconds.
E0218 13:40:21.253084 31627 net_speed_benchmark.cpp:90] *** Benchmark ends ***
# Using data_layer batchsize=50 100 repetitions

E0218 13:41:56.200095 31873 net_speed_benchmark.cpp:66] *** Benchmark begins ***
E0218 13:42:01.953526 31873 net_speed_benchmark.cpp:74] data    forward: 4.22 seconds.
E0218 13:42:04.677783 31873 net_speed_benchmark.cpp:74] conv1   forward: 2.75 seconds.
...
E0218 13:42:21.526362 31873 net_speed_benchmark.cpp:77] Forward pass: 23.81 seconds.
E0218 13:42:55.801816 31873 net_speed_benchmark.cpp:88] Backward pass: 34.25 seconds.
E0218 13:42:55.801826 31873 net_speed_benchmark.cpp:89] Total Time: 58.06 seconds.
E0218 13:42:56.253084 31873 net_speed_benchmark.cpp:90] *** Benchmark ends ***

@sergeyk
Copy link
Contributor

sergeyk commented Feb 24, 2014

@sguada will work further on this w.r.t. #148, probably won't be done until after March 7.

shelhamer added a commit that referenced this pull request Mar 13, 2014
Images layer: A data provider layer directly from images
@shelhamer shelhamer merged commit b1765ce into BVLC:master Mar 13, 2014
@shelhamer
Copy link
Member

@sguada thanks. Let's not forget to refactor the data layers at some point though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants