Allow using encoded images in Datum, LevelDB, LMDB #1239

Merged: 11 commits merged into BVLC:dev on Oct 16, 2014

Conversation

@sguada (Contributor) commented Oct 8, 2014

This PR allows storing the image files directly in the DB, which saves a lot of space and sacrifices only a little speed (not noticeable, since decoding is done during prefetching).

  • To convert a set of images and leave them encoded (it just copies the raw image file), do:
    ./build/tools/convert_imageset -encoded=true
    (a decoding sketch follows this list)
  • Added a TIMING flag to Makefile to see if prefetch is fast enough compared to the forward/backward pass.
    • To use it, add TIMING := 1 to your Makefile.config
    • This displays the time used in prefetch (reading images + transforming images)
    • The time used in the forward pass
    • The time used in the backward pass
  • Sped up and updated the code in WindowDataLayer:
    • It now reads and transforms the window images into blobs faster.
    • It has an option to cache_images in memory, in case of a slow HDD. It caches the encoded images to save memory.
    • It now also accepts mean_values instead of mean_file.
  • This PR also speeds up the transformations applied to cv::Mat, which should make it easy to add new transformations done directly on cv::Mat using OpenCV.
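
As a rough illustration of what "leaving the image encoded" means at read time, here is a minimal sketch (not the actual Caffe code; the helper name is made up) of decoding an encoded Datum's bytes with OpenCV during prefetch:

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>  // cv::imdecode lives here in OpenCV 2.x
#include <string>
#include <vector>

// Hypothetical helper: decode the raw JPEG/PNG bytes stored in a Datum's
// data field into a BGR cv::Mat, entirely in memory (no temporary file).
cv::Mat DecodeToMat(const std::string& encoded_bytes) {
  std::vector<uchar> buf(encoded_bytes.begin(), encoded_bytes.end());
  cv::Mat img = cv::imdecode(buf, CV_LOAD_IMAGE_COLOR);
  return img;  // empty Mat if decoding failed
}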

@sguada (Contributor, Author) commented Oct 8, 2014

@s-gupta Could you give it a try and let me know if it is faster for you?

@jeffdonahue (Contributor) commented:
Thanks @sguada, this has been on my wishlist for a while. Looking forward to wasting far less disk when it's merged.

One thing I'm not sure about is the TIMING parts. As useful as they are for benchmarking prefetching and such, IMO they clutter the code in some places -- like ForwardBackward, which goes from 4 clean lines to 11 lines that are mostly about timing. Maybe we could hold off on merging those commits, or figure out a slightly cleaner way to do it? Ideally we'd find a way to improve the caffe time tool such that we can actually benchmark prefetching, but that might be pretty difficult.

@sguada (Contributor, Author) commented Oct 9, 2014

@jeffdonahue I tried to keep the TIMING code to a minimum, but it still clutters the code in some places.
So I can remove all the TIMING code and leave it for a future PR. Probably just move it to the caffe time tool.

I will fix the warnings and want to merge soon, since this is also very useful for the parallel work in #1148, as it greatly decreases the memory needed to map multiple LMDBs into memory.

@sguada sguada mentioned this pull request Oct 9, 2014
@thatguymike (Contributor) commented:
Why disable resizing the images for storage? You will want to be careful about the quality factor used, but storing resized images can drastically speed up the decode. Most encoders will default to q85, which also means 4:2:0 subsampling in JPEG, so it could change the sampling and artifacting in the input image a little; however, 4:2:0 increases the compression ratio a lot, which is one of the main reasons for the large change in final size at q85. Also, decoding, especially of progressive images, can be more expensive than one would think, so we might want more decode threads, at least for "fat node" cases where there is a lot of horsepower to feed.

Secondarily, why do we use OpenCV for image decoding and not GraphicsMagick/ImageMagick, which can be faster (sometimes much faster because of parallel DCT support) and more flexible on decode (and encode)? I see cv::Mats in the code in a few places, so if that is really needed it makes sense.

@thatguymike (Contributor) commented:
Although, not resizing makes the database more useful for exploring different layer 1 input sizes...

@sguada (Contributor, Author) commented Oct 9, 2014

@thatguymike thanks for your comments.
The reason for not allowing resizing of the images while storing them encoded is that it just stores the raw file, without any operation. But you can resize the images in parallel before creating the DB using your preferred method.
Regarding OpenCV, currently we only use it for loading images and doing simple transformations, like cropping or resizing. So if you, or someone else, is willing to add support for GraphicsMagick/ImageMagick for that, I will be happy to include that option.

@sguada (Contributor, Author) commented Oct 9, 2014

@thatguymike GraphicsMagick looks pretty good; unfortunately it has a Blob class that could conflict with Caffe::Blob.
@jeffdonahue should I leave the code for timing the prefetching? I think the other timing could be done in caffe time, but that cannot access the prefetch timing. I found it very useful to know the prefetch time during my tests and while optimizing the code.

@shelhamer (Member) commented:
@sguada this should be coordinated with #1238. Perhaps the new interfaces should come first, and then the finished version of this idea once all the discussion is in?

I'm excited for the shrunken DBs!

@sguada (Contributor, Author) commented Oct 10, 2014

@jeffdonahue I have removed the TIMING flag and changed caffe time to compute the time per layer while doing forward/backward passes over all the layers. This way we will get a better sense of the actual time spent per layer.
I have left the timing for prefetch in the data layers, but it only prints the result when debug is on.

@jeffdonahue (Contributor) commented:
Cool, looks good to me, but not sure if #1238 should be merged first.

@sguada (Contributor, Author) commented Oct 11, 2014

@shelhamer I think this PR is ready to go, so I'm not sure either whether #1238 should be merged first.

@sguada sguada mentioned this pull request Oct 11, 2014
@shelhamer (Member) commented:
@sguada let's merge #1238 first for the interfaces then merge this for the encoding and timing if that's alright with you. Please take a look at #1238 and merge, then rebase and merge your work in this PR.

@kmatzen kmatzen mentioned this pull request Oct 12, 2014
@@ -142,6 +146,11 @@ void DataLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
// This function is used to create a thread that prefetches the data.
template <typename Dtype>
void DataLayer<Dtype>::InternalThreadEntry() {
Timer batch_timer;
@kmatzen (Contributor) commented on the diff:

I'm pretty sure this Timer implementation isn't thread safe when Caffe::mode() == Caffe::GPU. It frequently causes a segfault in the cuda library on my system.

It's implemented using the cudaEvent* API which records events with respect to the active cuda stream.
http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__EVENT_ga324d5ce3fbf46899b15e5e42ff9cfa5.html
"Records an event. If stream is non-zero, the event is recorded after all preceding operations in stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed."

src/caffe/util/benchmark.cpp

void Timer::Start() {
  if (!running()) {
    if (Caffe::mode() == Caffe::GPU) {
#ifndef CPU_ONLY
      CUDA_CHECK(cudaEventRecord(start_gpu_, 0));
#else
      NO_GPU;
#endif
    } else {
      start_cpu_ = boost::posix_time::microsec_clock::local_time();
    }   
    running_ = true;
    has_run_at_least_once_ = true;
  }
}

In any case, I'm pretty sure you don't want your prefetch timer to be synchronized with what's happening on the GPU.

@sguada (Contributor, Author) commented Oct 13, 2014

@kmatzen I have never had any problems using that Timer with CUDA. I'm not sure about the desired behaviour for the GPU timers; since we don't use streams, I think waiting until all the streams are done makes sense.
However, I see the problem you mentioned for code that runs on the CPU, like prefetching, which should not be waiting for other code running on the GPU to finish.
Maybe we should have a CPUTimer that runs on the CPU independently of Caffe::mode().
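
A minimal sketch of that CPUTimer idea (illustrative only, not the code that ended up in the PR): it measures wall-clock time on the CPU regardless of Caffe::mode(), so prefetch timing never touches CUDA events.

#include <boost/date_time/posix_time/posix_time.hpp>

// Times on the CPU only, mirroring the CPU branch of Timer::Start() above.
class CPUTimer {
 public:
  void Start() { start_ = boost::posix_time::microsec_clock::local_time(); }
  // Milliseconds elapsed since Start().
  float MilliSeconds() const {
    boost::posix_time::ptime now =
        boost::posix_time::microsec_clock::local_time();
    return (now - start_).total_milliseconds();
  }
 private:
  boost::posix_time::ptime start_;
};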

@sguada (Contributor, Author) commented Oct 13, 2014

For reference, when storing ImageNet resized to 256x256, an LMDB with encoded=true occupies 10-12x less space than with encoded=false. Prefetching times are a bit slower with encoded=true, although still well under the forward-backward pass time.

@sguada sguada force-pushed the encoded branch 2 times, most recently from 7c910e8 to 1493661 Compare October 13, 2014 07:29
@sguada (Contributor, Author) commented Oct 13, 2014

@kmatzen Could you try the new version with CPUTimer and see if it still fails for you?

@kloudkl (Contributor) commented Oct 13, 2014

@kmatzen owns the work of unifying the data store usage. He will clean up all the existing interfaces anyway. Why not merge this first?

@sguada sguada force-pushed the encoded branch 2 times, most recently from 4f5714f to 3033da8 Compare October 13, 2014 17:59
sguada added a commit that referenced this pull request Oct 16, 2014
Allow using encoded images in Datum, LevelDB, LMDB
@sguada sguada merged commit e354478 into BVLC:dev Oct 16, 2014
@sguada sguada deleted the encoded branch October 16, 2014 00:17
@longjon (Contributor) commented Oct 18, 2014

I'm confused about what happened here; @jeffdonahue suggested that ForwardBackward not be invaded with TIMING ifdefs, and it seems @sguada "remove[d] the TIMING flag", but the changes to ForwardBackward are merged in dev.

Is that what both parties intended? I'm in agreement with @jeffdonahue's concern here (and in fact it doesn't seem like there is Makefile support for TIMING anyway).

Also, just glancing at the code, it seems that the data transformations that we worked so carefully to de-duplicate have doubled again, once for Datum and once for cv::Mat. What's going on here?

As with #1238, this seems like a lot of code merged pretty quickly, and my general concerns at #1238 (comment) apply.

@sguada (Contributor, Author) commented Oct 18, 2014

@longjon sorry, I thought I had removed all the TIMING code, but it seems I forgot one place. Done in #1320.

Yeah, @jeffdonahue and I agreed to remove all the TIMING code and instead improve caffe time, which I did.

@sguada (Contributor, Author) commented Oct 18, 2014

@longjon Regarding the de-duplication of the Transform code, depending on whether it is a Datum, cv::Mat, or Blob that you are transforming, I couldn't find a way to do it while staying efficient.
Since Datum and cv::Mat organize the data in memory differently, I couldn't find a way to handle both with a single piece of code. I experimented with a simple way to unify them in https://github.com/BVLC/caffe/pull/1239/files#diff-5 but it was slower due to copying a cv::Mat back into a Datum.

@@ -179,9 +179,102 @@ void DataTransformer<Dtype>::Transform(const vector<Datum> & datum_vector,
template<typename Dtype>
void DataTransformer<Dtype>::Transform(const cv::Mat& cv_img,
Blob<Dtype>* transformed_blob) {
Datum datum;
A Member commented on the diff:

I liked this better as 3 lines. Is the cv::Mat pipeline speed so crucial / does it make such a difference to first transform to Datum?

@longjon (Contributor) commented Oct 18, 2014

@sguada re: timing: Good. I haven't had a chance to try it out yet, but the improvements to caffe time seem right to me in spirit.

Re: the transformation code (I guess there are actually three copies of it right now...): the differences I see right now between the three cases are: Datum and Blob have different types, and Blob has an extra dimension; cv::Mat has a different axis ordering.

There's no reason we can't abstract over those things (without introducing any copying). For dealing with the different memory layout, one specifies strides along each axis; this is how cuDNN works, how numpy works internally, and something Eigen could probably do for us (though I am not suggesting that we introduce it here).

@sguada (Contributor, Author) commented Oct 18, 2014

It's a bit more complicated than that; a cv::Mat can be discontinuous in memory, and there are different types to cast. But I guess it is possible to unify it all with a lot of ifs.

Sergio

@longjon (Contributor) commented Oct 18, 2014

@sguada From the documentation at http://docs.opencv.org/modules/core/doc/basic_structures.html#mat, it seems pretty clear that cv::Mat uses a strided format just like a numpy ndarray, or Blob, or Datum. (Yes, it can be discontinuous, but it still uses a strided addressing scheme (rather than, say, storing offsets, which would actually need different indexing code (though note that the indexing can (and should?) be abstracted away separately from the transformation itself...)).)

So I don't see any need for if casing, really, just a function (template) that takes strides as input, with a type parameter for uint8 vs Dtype storage. If it's not clear, I'm happy to add the unification to my own agenda, but I won't get to it for a while.
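
For what it's worth, here is a sketch of the stride-based unification being discussed (names and signature are illustrative, not a proposal for the actual interface): a single copy routine parameterized by per-axis strides can walk either a CHW Datum/Blob or an HWC cv::Mat with the same indexing code.

// Copies channels x height x width elements from src into dst in CHW order,
// addressing src through the given per-axis strides and casting element types.
template <typename SrcType, typename Dtype>
void StridedCopy(const SrcType* src, Dtype* dst,
                 int channels, int height, int width,
                 int c_stride, int h_stride, int w_stride) {
  for (int c = 0; c < channels; ++c) {
    for (int h = 0; h < height; ++h) {
      for (int w = 0; w < width; ++w) {
        *dst++ = static_cast<Dtype>(
            src[c * c_stride + h * h_stride + w * w_stride]);
      }
    }
  }
}
// Datum (CHW, uint8 data): c_stride = height * width, h_stride = width, w_stride = 1.
// Continuous cv::Mat (HWC): c_stride = 1, h_stride = width * channels, w_stride = channels.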

@sguada (Contributor, Author) commented Oct 18, 2014

@longjon thanks, I think I understand what you mean now, so I will give it a try and ask you if I have doubts. :)

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014
Allow using encoded images in Datum, LevelDB, LMDB
@ih4cku ih4cku mentioned this pull request May 9, 2016