
implemented padding aware im2col and col2im functions #99

Closed

Conversation

mavenlin
Contributor

Implemented padded_im2col and padded_col2im for both the GPU and CPU versions. The convolution and im2col layers are updated to use the padded versions when the pad parameter is nonzero. To use it, just remove the padding layers and add the "pad" parameter to the convolution layer.

In the case of the ImageNet model, removing the padding layers reduces device memory usage from about 3700 MB to 3100 MB, saving roughly 600 MB.

According to the benchmark (#85), the padding layers in the ImageNet model take about 3% of the total execution time in GPU mode, so removing them will not improve speed much. On the other hand, padded_im2col adds operations to the kernel code, so any speedup is expected to be negligible at best. On my Titan card, the code with padding layers runs 50 batches in 70 s; without them it takes 69 s, an improvement of about 1 s.
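For reference, here is a minimal CPU sketch of the idea (illustrative name and argument list, not the actual code in this PR): the kernel window coordinates are shifted by pad, and reads that would fall outside the image are zero-filled, so a padded copy of the input is never materialized.

```cpp
// Minimal sketch of a padding-aware im2col for a single image (CPU).
// Out-of-bounds reads are zero-filled inside the loop instead of being
// served from a separately allocated padded buffer.  The function name
// and argument list are illustrative, not the signatures used in this PR.
void padded_im2col_sketch(const float* data_im, int channels,
                          int height, int width, int ksize, int pad,
                          int stride, float* data_col) {
  const int height_col = (height + 2 * pad - ksize) / stride + 1;
  const int width_col  = (width + 2 * pad - ksize) / stride + 1;
  const int channels_col = channels * ksize * ksize;
  for (int c = 0; c < channels_col; ++c) {
    const int w_offset = c % ksize;
    const int h_offset = (c / ksize) % ksize;
    const int c_im = c / ksize / ksize;
    for (int h = 0; h < height_col; ++h) {
      for (int w = 0; w < width_col; ++w) {
        // Coordinates in the (virtually) padded image, shifted back
        // into the original image's coordinate system.
        const int h_im = h * stride + h_offset - pad;
        const int w_im = w * stride + w_offset - pad;
        const bool inside = h_im >= 0 && h_im < height &&
                            w_im >= 0 && w_im < width;
        data_col[(c * height_col + h) * width_col + w] =
            inside ? data_im[(c_im * height + h_im) * width + w_im] : 0.f;
      }
    }
  }
}
```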

@tdomhan
Contributor

tdomhan commented Feb 12, 2014

Nice! I don't see the need for having one version with padding and another without, though; you can just set the padding to zero and it should still work.

@Yangqing
Member

Would you mind writing a short test to compare the direct version vs the
padding + original im2col version? After that we can safely move to always
use the padding-aware im2col and col2im as suggested by Tobias.


@sguada
Contributor

sguada commented Feb 14, 2014

@shelhamer DON'T MERGE until it is clean

@mavenlin the commit for from_other filler should be in another Pull Request.

It is a nice idea, although the name is a bit confusing, since it is not loading from another filler but from a blob within a snapshot.
Maybe it would also be good to define a from_file filler that loads the values from a proto file containing just the needed blob.

@mavenlin
Contributor Author

@sguada I did not mean to put it in this pull request; it seems my new commits to my fork are automatically added. Is there any way to prevent this?

@shelhamer
Member

@mavenlin this is a strong reason you should develop in feature branches and not master. Any commits you push to the PR source branch will be added to the PR.

To fix this do

# on master branch
git checkout -b dev # make a dev branch that keeps all your commits
git checkout master # go back to master before rewriting it
git rebase -i BVLC/master # where BVLC is the remote name for BVLC/caffe

and delete any commits you do not want to be part of this PR, then

# on master: push your rebased master with only the commits
# you want to PR to your fork, updating this PR
git push -f origin master

then wait for your PR to be merged and don't do any further development on master. You can pick up your work in dev.

To make this clearer, I will write a short tutorial on contributing to Caffe soon to spell out these issues and save everyone trouble.

@mavenlin
Contributor Author

@shelhamer Thanks for the explanation, updated. I'm writing the test code to confirm.

@mavenlin
Contributor Author

Test code added, comparing padding-aware convolution against padding + convolution.
Both the forward and backward passes are tested.
There is no difference.

Run the test code as below:
./build/examples/padded_im2col_test.bin GPU 50 16 4 4 32 1 10

which shows:
Using GPU
loss0: 3.92048 loss1: 3.92048 euclidean distance: 0
diff_loss0: 0.101842 diff_loss1: 0.101842 diff euclidean distance: 0
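The check is essentially a Euclidean distance between the outputs (and the backward diffs) of the two paths. A hypothetical helper along these lines (not the code in padded_im2col_test.bin) illustrates it:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not the code in padded_im2col_test.bin: after
// running the padding-aware path and the padding layer + convolution
// path on the same input, compare their outputs (and their gradients)
// element-wise.  A distance of 0 means the two paths agree exactly.
float euclidean_distance(const float* a, const float* b, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    sum += d * d;
  }
  return static_cast<float>(std::sqrt(sum));
}
```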

@kloudkl
Contributor

kloudkl commented Feb 14, 2014

The major benefit seems to be a significant reduction in memory usage. Would you like to add detailed statistics in the spirit of #83?

@mavenlin
Contributor Author

@kloudkl I don't want to do that, since it is not slowed down.

@mavenlin
Contributor Author

The following is a test of the time cost with different parameters, chosen roughly according to the ImageNet model.
The calculation is repeated 50 times and averaged so that the timing is more accurate.
Times are in ms.

pa = padding aware

| input | channel | pad | kernel | num_output | stride | pa forward | pad+forward | pa backward | pad+backward |
|-------|---------|-----|--------|------------|--------|------------|-------------|-------------|--------------|
| 200   | 3       | 5   | 11     | 96         | 4      | 44.89      | 46.03       | 114.908     | 116.172      |
| 50    | 96      | 2   | 5      | 256        | 1      | 272.40     | 284.724     | 640.489     | 654.98       |
| 27    | 256     | 1   | 3      | 384        | 1      | 135.497    | 141.488     | 268.397     | 285.513      |
| 13    | 384     | 1   | 3      | 384        | 1      | 82.7817    | 83.9382     | 124.094     | 125.941      |
| 13    | 384     | 1   | 3      | 256        | 1      | 61.5992    | 62.4341     | 84.8114     | 87.7916      |

Consistently, padding-aware convolution is faster than padding + convolution.
The backward case is easy to understand: the number of instructions in the kernel function changes very little.
In the forward case, comparison operations are added to every thread and divergence is introduced in the warps whose threads cross the image border, but it turns out that more time is saved on memory reads and writes than is lost to the extra instructions.

Another test compares padding-aware im2col with the original im2col when the padding size is 0.
Since the number of instructions in the col2im kernel changes very little, col2im is not tested.

| input | channel | pad | kernel | num_output | stride | pa im2col | original im2col |
|-------|---------|-----|--------|------------|--------|-----------|-----------------|
| 200   | 3       | 0   | 11     | 96         | 4      | 11.3499   | 10.4495         |
| 50    | 96      | 0   | 5      | 256        | 1      | 24.6174   | 23.6767         |
| 27    | 256     | 0   | 3      | 384        | 1      | 9.07456   | 8.52179         |
| 13    | 384     | 0   | 3      | 384        | 1      | 3.7848    | 3.67993         |
| 13    | 384     | 0   | 3      | 256        | 1      | 3.82074   | 3.59396         |

I think it would do no harm, and bring some benefit, to stick to the original im2col when pad = 0 (a dispatch along these lines is sketched after the list below).
The two benefits are:

  1. the original im2col is faster when pad = 0;
  2. backward compatibility: if a model definition file uses a padding layer, it won't lose performance.
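A hypothetical dispatch (stand-in declarations; the real functions live in Caffe's im2col utilities, so the exact signatures are assumptions) would look like this:

```cpp
// Stand-in declarations for the two im2col variants discussed above;
// the real functions live in Caffe's util/im2col code, so the exact
// signatures here are assumptions for the sake of the sketch.
void im2col_cpu(const float* data_im, int channels, int height, int width,
                int ksize, int stride, float* data_col);
void padded_im2col_cpu(const float* data_im, int channels, int height,
                       int width, int ksize, int pad, int stride,
                       float* data_col);

// Keep the original kernel on the fast path when there is no padding,
// so existing models that still rely on explicit padding layers
// (and therefore pass pad = 0 to the convolution) lose no performance.
void im2col_dispatch(const float* data_im, int channels, int height,
                     int width, int ksize, int pad, int stride,
                     float* data_col) {
  if (pad == 0) {
    im2col_cpu(data_im, channels, height, width, ksize, stride, data_col);
  } else {
    padded_im2col_cpu(data_im, channels, height, width, ksize, pad,
                      stride, data_col);
  }
}
```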

@tdomhan @Yangqing What do you think?

@shelhamer I'll rebase on dev and submit a new PR

@mavenlin mavenlin closed this Feb 18, 2014