
implemented padding aware im2col and col2im functions #99

Closed

Conversation

mavenlin
Contributor

Implemented padded_im2col and padded_col2im for both the GPU and CPU versions. The convolution and im2col layers are updated to use the padded versions when the pad parameter is nonzero. To use it, just remove the padding layers and add the "pad" parameter to the convolution layer.

In the case of the ImageNet model, removing the padding layers reduces device memory usage from about 3700 MB to 3100 MB, saving roughly 600 MB.

According to the benchmark (#85), the padding layers in the ImageNet model take about 3% of the total execution time in GPU mode, so removing them will not improve speed much. On the other hand, padded_im2col adds operations to the kernel code, so any speedup is expected to be negligible at best. On my Titan card, the code with padding layers runs 50 batches in 70 s; without them it takes 69 s, an improvement of about 1 s.
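For reference, here is a minimal CPU sketch of the idea (illustrative name and argument list, not the actual code in this PR): the kernel window coordinates are shifted by pad, and reads that would fall outside the image are zero-filled, so a padded copy of the input is never materialized.

```cpp
// Minimal sketch of a padding-aware im2col for a single image (CPU).
// Out-of-bounds reads are zero-filled inside the loop instead of being
// served from a separately allocated padded buffer.  The function name
// and argument list are illustrative, not the signatures used in this PR.
void padded_im2col_sketch(const float* data_im, int channels,
                          int height, int width, int ksize, int pad,
                          int stride, float* data_col) {
  const int height_col = (height + 2 * pad - ksize) / stride + 1;
  const int width_col  = (width + 2 * pad - ksize) / stride + 1;
  const int channels_col = channels * ksize * ksize;
  for (int c = 0; c < channels_col; ++c) {
    const int w_offset = c % ksize;
    const int h_offset = (c / ksize) % ksize;
    const int c_im = c / ksize / ksize;
    for (int h = 0; h < height_col; ++h) {
      for (int w = 0; w < width_col; ++w) {
        // Coordinates in the (virtually) padded image, shifted back
        // into the original image's coordinate system.
        const int h_im = h * stride + h_offset - pad;
        const int w_im = w * stride + w_offset - pad;
        const bool inside = h_im >= 0 && h_im < height &&
                            w_im >= 0 && w_im < width;
        data_col[(c * height_col + h) * width_col + w] =
            inside ? data_im[(c_im * height + h_im) * width + w_im] : 0.f;
      }
    }
  }
}
```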

@tdomhan
Contributor

tdomhan commented Feb 12, 2014

Nice! I don't see the need for having one version with padding and another without, though; you can just set the padding to zero and it should still work.

@Yangqing
Member

Would you mind writing a short test to compare the direct version vs the
padding + original im2col version? After that we can safely move to always
use the padding-aware im2col and col2im as suggested by Tobias.


@sguada
Contributor

sguada commented Feb 14, 2014

@shelhamer DON'T MERGE until it is clean

@mavenlin the commit for from_other filler should be in another Pull Request.

It is a nice idea, although the name is a bit confusing, since it is not loading from another filler but from a blob within a snapshot.
Maybe it would also be good to define a from_file filler that loads the values from a proto file containing just the needed blob.

@mavenlin
Contributor Author

@sguada I did not mean to put it in this pull request; it seems my new commits to my fork are automatically added. Is there any way to prevent this?

@shelhamer
Member

@mavenlin this is a strong reason you should develop in feature branches and not master. Any commits you push to the PR source branch will be added to the PR.

To fix this do

# on master branch
git checkout -b dev # make a dev branch that keeps all your commits
git checkout master # go back to master before rewriting it
git rebase -i BVLC/master # where BVLC is the remote name for BVLC/caffe

and delete any commits you do not want to be part of this PR, then

# on master: push your rebased master with only the commits
# you want to PR to your fork, updating this PR
git push -f origin master

then wait for your PR to be merged and don't do any further development on master. You can pick up your work in dev.

To make this clearer, I will write a short tutorial on contributing to Caffe soon to spell out these issues and save everyone trouble.

@mavenlin
Contributor Author

@shelhamer Thanks for the explanation, updated. I'm writing the test code to confirm.

@mavenlin
Contributor Author

Test code added, comparing padding-aware convolution against padding + convolution.
Both the forward and backward passes are tested.
There is no difference.

Run the test code as below:
./build/examples/padded_im2col_test.bin GPU 50 16 4 4 32 1 10

which shows:
Using GPU
loss0: 3.92048 loss1: 3.92048 euclidean distance: 0
diff_loss0: 0.101842 diff_loss1: 0.101842 diff euclidean distance: 0
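The check is essentially a Euclidean distance between the outputs (and the backward diffs) of the two paths. A hypothetical helper along these lines (not the code in padded_im2col_test.bin) illustrates it:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not the code in padded_im2col_test.bin: after
// running the padding-aware path and the padding layer + convolution
// path on the same input, compare their outputs (and their gradients)
// element-wise.  A distance of 0 means the two paths agree exactly.
float euclidean_distance(const float* a, const float* b, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    sum += d * d;
  }
  return static_cast<float>(std::sqrt(sum));
}
```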

@kloudkl
Contributor

kloudkl commented Feb 14, 2014

The major benefit seems to be a significant reduction in memory usage. Would you like to add detailed statistics in the spirit of #83?

@mavenlin
Contributor Author

@kloudkl I don't want to do that, since it is not slowed down.

@mavenlin
Contributor Author

The following is a test of the time cost with different parameters, chosen roughly according to the ImageNet model.
The calculation is repeated 50 times and averaged so that the timing is more accurate.
Times are in ms.

pa = padding aware

| input | channel | pad | kernel | num_output | stride | pa forward | pad+forward | pa backward | pad+backward |
|-------|---------|-----|--------|------------|--------|------------|-------------|-------------|--------------|
| 200   | 3       | 5   | 11     | 96         | 4      | 44.89      | 46.03       | 114.908     | 116.172      |
| 50    | 96      | 2   | 5      | 256        | 1      | 272.40     | 284.724     | 640.489     | 654.98       |
| 27    | 256     | 1   | 3      | 384        | 1      | 135.497    | 141.488     | 268.397     | 285.513      |
| 13    | 384     | 1   | 3      | 384        | 1      | 82.7817    | 83.9382     | 124.094     | 125.941      |
| 13    | 384     | 1   | 3      | 256        | 1      | 61.5992    | 62.4341     | 84.8114     | 87.7916      |

Consistently, padding-aware convolution is faster than padding + convolution.
The backward case is easy to understand: the number of instructions in the kernel function changes very little.
In the forward case, comparison operations are added to every thread and divergence is introduced in the warps whose threads cross the image border, but it turns out that more time is saved on memory reads and writes than is lost to the extra instructions.

Another test compares padding-aware im2col with the original im2col when the padding size is 0.
Since the number of instructions in the col2im kernel changes very little, col2im is not tested.

| input | channel | pad | kernel | num_output | stride | pa im2col | original im2col |
|-------|---------|-----|--------|------------|--------|-----------|-----------------|
| 200   | 3       | 0   | 11     | 96         | 4      | 11.3499   | 10.4495         |
| 50    | 96      | 0   | 5      | 256        | 1      | 24.6174   | 23.6767         |
| 27    | 256     | 0   | 3      | 384        | 1      | 9.07456   | 8.52179         |
| 13    | 384     | 0   | 3      | 384        | 1      | 3.7848    | 3.67993         |
| 13    | 384     | 0   | 3      | 256        | 1      | 3.82074   | 3.59396         |

I think it would do no harm, and bring some benefit, to stick to the original im2col when pad = 0 (a dispatch along these lines is sketched after the list below).
The two benefits are:

  1. the original im2col is faster when pad = 0;
  2. backward compatibility: if a model definition file uses a padding layer, it won't lose performance.
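A hypothetical dispatch (stand-in declarations; the real functions live in Caffe's im2col utilities, so the exact signatures are assumptions) would look like this:

```cpp
// Stand-in declarations for the two im2col variants discussed above;
// the real functions live in Caffe's util/im2col code, so the exact
// signatures here are assumptions for the sake of the sketch.
void im2col_cpu(const float* data_im, int channels, int height, int width,
                int ksize, int stride, float* data_col);
void padded_im2col_cpu(const float* data_im, int channels, int height,
                       int width, int ksize, int pad, int stride,
                       float* data_col);

// Keep the original kernel on the fast path when there is no padding,
// so existing models that still rely on explicit padding layers
// (and therefore pass pad = 0 to the convolution) lose no performance.
void im2col_dispatch(const float* data_im, int channels, int height,
                     int width, int ksize, int pad, int stride,
                     float* data_col) {
  if (pad == 0) {
    im2col_cpu(data_im, channels, height, width, ksize, stride, data_col);
  } else {
    padded_im2col_cpu(data_im, channels, height, width, ksize, pad,
                      stride, data_col);
  }
}
```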

@tdomhan @Yangqing What do you think?

@shelhamer I'll rebase on dev and submit a new PR

@mavenlin mavenlin closed this Feb 18, 2014