I want to run Caffe with a small stride on big images and ran into memory issues. I tried out PR #520, but even on fairly small images (480x640) and a moderately sized model (params and blobs take about 2 GB on that image), CPU memory consumption is at ~12 GB. I assume the difference comes from the col buffers, since that's where I get the out-of-memory error. My understanding of FFT-based convolutions is that they won't solve my memory problems either.
What do you think about adding a convolution implementation that doesn't use additional memory? cuda-convnet [1] seems to be quite fast, judging from the benchmarks at [2]. The convolution code doesn't exactly look simple, so adding it to Caffe doesn't look like a no-brainer to me. Does it make sense?
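For anyone wondering where the extra memory goes: the im2col ("col buffer") scheme unrolls every receptive field into a column so the convolution becomes a matrix multiply, which multiplies the per-layer activation footprint by roughly kernel². A rough back-of-the-envelope sketch (the layer parameters below are hypothetical, not taken from the report above):

```python
# Rough estimate of the im2col ("col buffer") memory for one convolution layer.
# Parameters here are illustrative only, not the actual model from this issue.

def col_buffer_bytes(channels, kernel, stride, pad, height, width, dtype_bytes=4):
    # Output spatial size for a standard convolution.
    h_out = (height + 2 * pad - kernel) // stride + 1
    w_out = (width + 2 * pad - kernel) // stride + 1
    # im2col unrolls each receptive field into one column:
    # (channels * kernel * kernel) rows x (h_out * w_out) columns.
    return channels * kernel * kernel * h_out * w_out * dtype_bytes

# Example: 3-channel 480x640 input, 11x11 kernel, stride 1, no padding.
input_mib = 3 * 480 * 640 * 4 / (1024 ** 2)
col_mib = col_buffer_bytes(3, 11, 1, 0, 480, 640) / (1024 ** 2)
print(f"input: {input_mib:.1f} MiB, col buffer: {col_mib:.1f} MiB")
# The col buffer is ~kernel^2 (here ~121x) larger than the input it covers,
# which is why small strides on big images blow up so quickly.
```

With many channels and multiple such layers, this overhead easily dwarfs the 2 GB of params and blobs.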
It is a little non-trivial, and cuda-convnet actually uses a different order. We are exploring alternate approaches which may achieve the same (or better) goal, so incorporating cuda-convnet may not be on our radar (at least for now).
Closing as duplicate of #830 to focus the conversation now that the memory aspect has been noted there. While we expect our alternative approach to address memory usage and speed, you are welcome to try integrating cuda-convnet2 convolution for comparison.
[1] https://code.google.com/p/cuda-convnet/
[2] https://github.com/soumith/convnet-benchmarks