forked from BVLC/caffe
Reducing memory usage during inference #498
Hi @venkai, wait a minute. We do use a global workspace for all cuDNN Convolution layers:
cuDNN flow is clean, closing.
Hi,
I am working with a very deep fully convolutional architecture that currently takes ~12 GB of memory for a single 600 x 800 image during inference. To reduce memory, I modified net.cpp to allow duplicate top blobs during inference (gist here), and rewrote the inference prototxt to use as few unique activation blobs as possible. While this works and lowers memory usage (the net can now handle a 1280 x 720 image), the reduction is far smaller than I anticipated.
For instance, consider a simple feed-forward network without any branches: A1->A2->A3->A4->....->An
Ex:
Conv->BN->ReLU->Conv->BN->ReLU->...
If all activations from A1 to An are of the same size, then we should be able to do inference by storing only 2 activations in memory and juggling computation between the two, like X->Y->X->Y->X->...
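To make the idea concrete, here is a sketch of what such a minimal-activation inference prototxt could look like (layer names and convolution parameters are illustrative; vanilla Caffe rejects reusing a blob name as the top of a later non-in-place layer, which is exactly what the net.cpp patch mentioned above permits):

```
# Ping-pong between two activation blobs X and Y.
# Illustrative only: requires the duplicate-top patch at inference time.
layer { name: "conv1" type: "Convolution" bottom: "data" top: "X"
        convolution_param { num_output: 64 kernel_size: 3 pad: 1 } }
layer { name: "bn1"   type: "BatchNorm"   bottom: "X" top: "X" }  # in-place
layer { name: "relu1" type: "ReLU"        bottom: "X" top: "X" }  # in-place
layer { name: "conv2" type: "Convolution" bottom: "X" top: "Y"
        convolution_param { num_output: 64 kernel_size: 3 pad: 1 } }
layer { name: "bn2"   type: "BatchNorm"   bottom: "Y" top: "Y" }  # in-place
layer { name: "relu2" type: "ReLU"        bottom: "Y" top: "Y" }  # in-place
layer { name: "conv3" type: "Convolution" bottom: "Y" top: "X"    # reuses X
        convolution_param { num_output: 64 kernel_size: 3 pad: 1 } }
```

The BN and ReLU layers are already in-place even in vanilla Caffe (bottom == top); only the convolutions writing back into a previously used blob need the patch.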
In practice, however, the memory taken by this network is much greater than that of 2 activations.
I suspect this is because of unique internal buffers used in each layer and/or a separate workspace being allocated for each cuDNN convolution layer. In vanilla Caffe, there was a trick to make the internal buffers static (here). This however doesn't work with cuDNN. Is there something similar we can do here with internal buffers? Also, is it possible to use a global workspace for all convolution layers, like in MXNet?
Thanks!