Images layer: A data provider layer directly from images #120
Just a random thought: would it be good to merge data_layer and input_layer by allowing an input format selection? Some of the code could be reused, such as the threaded prefetching.
Agreed with Yangqing. This seems like it might only be a 10- or 20-line change to DataLayer with an input format selector param; this is a lot of code to duplicate, imo.
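The format-selector idea could look roughly like the following Python sketch. It is not Caffe's C++ DataLayer. `PrefetchingDataLayer`, `SOURCES`, `db_source` and `image_list_source` are hypothetical names, and the two sources are in-memory stand-ins for leveldb and for image files on disk. The point is that the prefetch logic (join the thread, hand out the batch, restart the thread) is written once, and only the source function changes with the `input_format` parameter.

```python
import threading
from typing import Callable, Dict, Iterator, List, Tuple

Sample = Tuple[object, int]  # (datum, label); the datum type is left abstract


def db_source(records: List[Sample]) -> Iterator[Sample]:
    """Hypothetical stand-in for a leveldb reader: cycles over pre-packed records."""
    while True:
        for record in records:
            yield record


def image_list_source(lines: List[str], load: Callable[[str], object]) -> Iterator[Sample]:
    """Hypothetical stand-in for an image-list reader: parses 'path label' and loads the image."""
    while True:
        for line in lines:
            path, label = line.rsplit(maxsplit=1)
            yield load(path), int(label)


# The "input format selector": the only part that differs between formats.
SOURCES: Dict[str, Callable[..., Iterator[Sample]]] = {
    "db": db_source,
    "images": image_list_source,
}


class PrefetchingDataLayer:
    """One layer with shared threaded prefetching; the source is chosen by input_format."""

    def __init__(self, input_format: str, batch_size: int, *source_args) -> None:
        self._samples = SOURCES[input_format](*source_args)
        self._batch_size = batch_size
        self._prefetched: List[Sample] = []
        self._start_prefetch()

    def _prefetch(self) -> None:
        # Runs on the background thread: fill the next batch from the selected source.
        self._prefetched = [next(self._samples) for _ in range(self._batch_size)]

    def _start_prefetch(self) -> None:
        self._thread = threading.Thread(target=self._prefetch)
        self._thread.start()

    def forward(self) -> List[Sample]:
        self._thread.join()      # wait for the batch prepared in the background
        batch = self._prefetched
        self._start_prefetch()   # start preparing the next batch right away
        return batch
```

For example, `PrefetchingDataLayer("images", 2, ["a.jpg 0", "b.jpg 1"], load_fn)` and `PrefetchingDataLayer("db", 2, records)` share every line of the prefetching code. Only one prefetch thread touches the source at a time, because `forward` joins the previous thread before starting the next one.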
Learning from some of the recent contributing efforts, including mine, that ended up requiring significant code refactoring or even reversion after the initial pull request, I feel strongly that contributors should first create issues for the features or bug fixes they want. Only after exchanging thoughts about the most suitable designs or algorithms with the project owners and other contributors should a contributor begin investing a lot of time in actual development. This will avoid too many wasted sunk costs along the way.
@Yangqing @jeffdonahue I like your idea. Initially I wanted to get the layer working, and reusing as much code as possible from data_layer seemed right. @kloudkl I think refactoring the code after it is working is a good idea. So I don't feel it was a waste of time: it let me better understand the differences and similarities between data_layer and images_layer, and now I feel I can probably extract the common parts and separate out the differences.
Welcome to the realm of research code, where 90% of the code is sunken. Take https://github.com/Yangqing/iceberk/, which is now at the bottom of the ocean, filled with coffee.

Yangqing (replying by email to Sergio Guadarrama's message of Mon, Feb 17, 2014 at 6:55 PM)
I'm currently drafting development and contributing guides; please join the discussion at #101. @kloudkl: it is certainly important, for discussion and for avoiding duplicated effort, that people make their suggestions known and claim their contributions. However, to avoid duplicate issues (issue + PR) and fragmented conversations, I propose doing this naturally with PRs when possible. @sguada: discussion is definitely helpful, and I think issues + PRs are the place to do it in public. @Yangqing reminds us of the truth, as ever.
I skimmed through the iceberk project and noticed that DeCAF, the predecessor of Caffe, borrowed some key data structures and algorithms from it. In this sense it was the testbed for what is now a very mature deep network engine, and it is not sunken but reborn.
I have done some performance tests with a Titan card, and it seems that images_layer is approximately twice as slow as data_layer, which translates to an 8% slowdown in the forward-backward pass. That is not too bad, considering that each image has to be read from a separate JPEG file.
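A rough way to relate the two numbers, assuming (as a simplification) that data loading time simply adds to the rest of the iteration rather than overlapping with it: let $T$ be the forward-backward time per iteration with data_layer and $t_d$ the part of it spent on data. If images_layer takes about $2t_d$, then

$$
T' = (T - t_d) + 2t_d = T + t_d,
\qquad
\frac{T' - T}{T} = \frac{t_d}{T} \approx 0.08
$$

so, under that additive model, data loading accounted for roughly 8% of an iteration with data_layer.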
@sguada thanks. Let's not forget to refactor the data layers at some point, though.
This data layer reads a file with lines of the form "path/image_filename label" and provides data and label tops in the same way data_layer does. However, it doesn't require the images to be in a leveldb, nor does it require them to be resized in advance (although resizing in advance is recommended for speed).
Some speed-test comparisons still have to be done.
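A minimal sketch of parsing such a list file, written in Python for illustration (the layer itself is C++). `read_image_list` is a hypothetical helper, not part of Caffe. It assumes one whitespace-separated `path label` pair per line, so paths containing spaces are not supported.

```python
from typing import List, Tuple


def read_image_list(text: str) -> List[Tuple[str, int]]:
    """Parse 'path/to/image.jpg label' lines into (path, label) pairs (hypothetical helper)."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'path label', got {line!r}")
        path, label = parts
        entries.append((path, int(label)))
    return entries
```

Each pair then drives one read of an image file (plus an optional resize) per sample, which is where the extra cost compared with reading pre-packed leveldb records comes from.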