
Refactor convolution layer and add deconvolution layer #1615

Merged (6 commits, Feb 1, 2015)

Conversation

@longjon (Contributor) commented Dec 22, 2014:

This PR adds a DeconvolutionLayer that flips the forward and backward passes of ConvolutionLayer. (The resulting operation is still convolution, but the sense of all the parameters is reversed, so that, in particular, strided deconvolution results in upsampling whereas strided convolution results in downsampling.)
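The size relationship described above can be sketched with the standard output-size formulas (illustrative helper names, not Caffe API): deconvolution uses the inverse of the convolution formula, so its parameters act in the reversed sense.

```python
# Output-size arithmetic for convolution vs. deconvolution, assuming the
# usual formulas; with input size i, kernel k, stride s, padding p:
def conv_out_size(i, k, s, p):
    # standard convolution: stride s > 1 downsamples
    return (i + 2 * p - k) // s + 1

def deconv_out_size(i, k, s, p):
    # deconvolution inverts the formula above: stride s > 1 upsamples,
    # and padding is subtracted from (i.e. removed from) the output
    return s * (i - 1) + k - 2 * p

# A stride-2 conv shrinks 64 -> 32; the matching deconv restores 32 -> 64.
assert conv_out_size(64, 4, 2, 1) == 32
assert deconv_out_size(32, 4, 2, 1) == 64
```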

Rather than duplicate all the ConvolutionLayer code, common sections are factored out into a parent class, BaseConvolutionLayer. The tricky GEMM parameters and the column buffer are hidden from the forward and backward implementations, which I hope you agree are much more readable.

Positives

  • Readable implementations of convolution and deconvolution.
  • Deconvolution supports all the functionality of convolution, including padding (note that padding is removed from the output rather than added to the input!), groups, rectangular kernels, and biases.
  • Whereas ConvolutionLayer needs to do an im2col and a col2im in the backward pass, both operations in DeconvolutionLayer's backward pass require im2col, so a special flag is added to avoid doing this twice.

Reservations

  • The diff is pretty heavy; it was not straightforward to make the changes in a gradual way, so the history is constructed post-hoc. However, the code should be pretty understandable to the few of you intimately familiar with ConvolutionLayer. Also note that the diff size is really half what it appears to be, because of the point below.
  • Almost all the code is duplicated via s/cpu/gpu/. This PR is not the place to address this, but I hope we can get (some part of?) Device Abstraction #610 merged soon, because this is getting silly.
  • Is "deconvolution" really the right name? This layer does not have to be used in the context of undoing a convolution (although it could be). It seems likely that this name will stick one way or another.

Design choices

  • The original idea was to add convolution helper functions under util/. However, these end up requiring a large number of arguments (each needs most of the convolution parameters). So, one could wrap all the common arguments in a struct, like cuDNN... but then that struct may as well be the layer class itself, so we have the current design.
  • Rather than use a special flag to skip the extra im2col in deconv backward, we could leave the column buffer out of the wrapper functions. However, this leads to more duplicated code, and cuts down significantly on readability.
  • The forward helper and weight diff helper are both im2cols followed by gemms, so in theory we could unify them. However, then we need additional arguments and logic to figure out when to transpose the column buffer. I think this could still be done, but it's a bit of a wash.
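The "im2col followed by gemm" structure the helpers wrap can be sketched in NumPy (a minimal single-image, one-group, stride-1, no-padding sketch; the `im2col`/`conv_forward` names are hypothetical, not Caffe code):

```python
import numpy as np

def im2col(x, kh, kw):
    # Lay out every kh-x-kw patch of x (channels, height, width) as one
    # column of the buffer; stride 1, no padding for simplicity.
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + oh, j:j + ow]
    return cols.reshape(c * kh * kw, oh * ow)

def conv_forward(x, weights):
    # weights: (num_output, channels, kh, kw). After im2col, the whole
    # forward pass is a single GEMM against the column buffer.
    n, c, kh, kw = weights.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return (weights.reshape(n, -1) @ im2col(x, kh, kw)).reshape(n, oh, ow)

# A 2x2 kernel of ones over an all-ones 3x3 image sums 4 pixels per position.
y = conv_forward(np.ones((1, 3, 3)), np.ones((1, 1, 2, 2)))
assert y.shape == (1, 2, 2) and np.allclose(y, 4.0)
```

The deconvolution forward is then the transpose of this: a gemm followed by col2im, which is exactly the convolution backward pass.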

Forthcoming

  • Doc comments for DeconvolutionLayer.
  • Tests specific to DeconvolutionLayer. Note that most of the functionality is exercised by the ConvolutionLayer tests. I've only briefly checked the forward/backward passes for upsampling/downsampling, so there could be a bug or two left.
  • cuDNN deconvolution should be straightforward to implement as well, since cuDNN already provides the same kind of abstraction as this refactoring. However, I won't be able to do that right away; PRs are welcome!

@longjon mentioned this pull request on Dec 22, 2014
@longjon force-pushed the deconv-layer branch 3 times, most recently from 0957e39 to 7990bbe on December 22, 2014 at 07:41
}
// Special case: im2col is the identity for 1x1 convolution with stride 1
// and no padding, so flag for skipping the buffer and transformation.
is_1x1_ = kernel_w_ == 1 && kernel_h_ == 1
    && stride_h_ == 1 && stride_w_ == 1 && pad_h_ == 0 && pad_w_ == 0;
A contributor commented:
Please add the case for full convolution too

@longjon (Author) replied:

There's still no additional testing for the general expression, and anyway that's an orthogonal concern; how about I follow up with another PR that includes those tests? (I realized recently those tests don't even have to be done with ConvolutionLayer, they just need to check im2col, which should be easier...)
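The kind of direct im2col check mentioned in the reply above can be sketched as follows (hypothetical NumPy helper, not the actual Caffe test): for a 1x1 kernel with stride 1 and no padding, im2col is the identity up to a reshape, which is the `is_1x1_` special case in the snippet earlier in the thread.

```python
import numpy as np

def im2col(x, kh, kw):
    # stride 1, no padding; x has shape (channels, height, width)
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + oh, j:j + ow]
    return cols.reshape(c * kh * kw, oh * ow)

x = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
# 1x1 kernel, stride 1, no padding: each column is just one input pixel,
# so the column buffer equals the flattened input.
assert np.array_equal(im2col(x, 1, 1), x.reshape(2, 9))
```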

@shelhamer (Member) commented:

Thanks for bringing the convolution code to order and delivering full-fledged deconvolution at the same time, Jon!

> the few of you intimately familiar with ConvolutionLayer

All rise, secret order of the Caffe convolution...

> Is "deconvolution" really the right name?

While you and I can keep saying "backward convolution," vision parlance seems to be converging on "deconvolution." I think all its usages will come under the "deconvolution" umbrella sooner or later so we might as well name it what everyone will look for.

> Almost all the code is duplicated via s/cpu/gpu/

Right, #610 deserves attention once the fires are out. (Where the fires are data, Net owning phase and device, and double-checking the thread leak in dev.)

> cuDNN deconvolution should be straightforward to implement as well

I could take a look at this since I did the original integration, and should warm-up for hacking cuDNN R2 as well. More likely a follow-up PR instead of pushing it here.

// wrap im2col/col2im so we don't have to remember the (long) argument lists
inline void conv_im2col_cpu(const Dtype* data, Dtype* col_buff) {
  im2col_cpu(data, conv_in_channels_, conv_in_height_, conv_in_width_,
      kernel_h_, kernel_w_, pad_h_, pad_w_, stride_h_, stride_w_, col_buff);
}
A contributor commented:

Would it be slightly more elegant to give BaseConvolutionLayer a private im2col_layer_ which {Dec,C}onvolutionLayer call Forward and Backward on, instead of these functions? (Possibly also saving some duplicated setup logic & private variables.)

Relatedly, I always kind of thought we should have a separate BiasLayer internally called by both InnerProductLayer and ConvolutionLayer, factoring out that little bit of logic, and allowing one to use biases without multiplicative weights, for whatever that's worth.

Just a minor thought -- definitely doesn't need to be done here as this is nice cleanup regardless, and of course deconvolution layer will be a welcome feature.

This looks good to merge to me if you think it's ready.

@longjon (Author) replied:

Yes, I think the im2col layer would probably be a bit better, and I agree with factoring out the bias, although I'd rather save those things for later PRs.

Other than that, I think I ought to add some tests that at least call deconv forward and backward, and then it'll be ready.

@jyegerlehner (Contributor) commented:

FWIW I've been running these changes and all seems to be working well.

@longjon (Author) commented Jan 27, 2015:

Thanks for the datapoint @jyegerlehner. I've added basic doc comments and tests, so this should be ready to go pending any further comments.

shelhamer added a commit that referenced this pull request Feb 1, 2015
Add deconvolution layer with refactoring of convolution layer to share code
@shelhamer merged commit 9767b99 into BVLC:dev on Feb 1, 2015
@shelhamer (Member) commented:

Thanks for adding deconvolution while making convolution look the most sane it ever has, Jon!

@JiaxuZhu commented:

Can this layer be used to project a single activation in a given feature map back down to the image pixels?

@ducha-aiki (Contributor) commented:

@JiaxuZhu that is exactly the purpose of this layer.

@JiaxuZhu commented:

@ducha-aiki Thanks!
But I cannot find any tutorial on how to use this layer. For example, I don't know how to set all feature maps to zero except the top 9 activations.

@yyyek commented Jun 17, 2015:

Is there any tutorial for this layer? I don't know how to use it.

@Tgaaly commented Jun 26, 2015:

When doing a convolution followed by a deconvolution to reconstruct the input, setting the num_output parameter (i.e. the number of filters) to the same value in both layers causes the error below. If the num_output of only the deconvolution layer is set to 1, it works. Is the num_output parameter different for convolution and deconvolution? Shouldn't the number of kernels be equal in both layers? My input data here is 5x5 in batches of 100. Below is my prototxt script.

Check failed: bottom[0]->count() == bottom[1]->count() (40000 vs. 2500) SIGMOID_CROSS_ENTROPY_LOSS layer inputs must have the same count.

# CONVOLUTION PART

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 16
    kernel_size: 4
    kernel_size: 4
    stride: 1
  }
}
layer {
  name: "sigm"
  type: "Sigmoid"
  bottom: "conv1"
  top: "conv1"
}

# DECONVOLUTION PART

layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "conv1"
  top: "deconv1"
  convolution_param {
    num_output: 16 # if set to 1 - it works
    kernel_size: 4
    kernel_size: 4
    stride: 1
  }
}
layer {
  name: "sigm"
  type: "Sigmoid"
  bottom: "deconv1"
  top: "deconv1"
}
...
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "deconv1"
  bottom: "flatdata"
  top: "cross_entropy_loss"
  loss_weight: 1
}

@jyegerlehner (Contributor) commented:

The two bottom blobs of the loss layer have to have the same shape. Otherwise it can't compute the differences between their elements. Assuming "data" is the same blob as "flatdata", the bottom blob of the "conv1" layer, the message implies it has one channel. Therefore the top blob of "deconv1" also must have 1 channel, which means num_output needs to be 1.
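Working through the counts in the error message above makes this concrete (plain arithmetic, using the shapes from the prototxt: batch 100, single-channel 5x5 input, 4x4 kernels, stride 1):

```python
batch, in_h = 100, 5
conv_out = in_h - 4 + 1              # conv1: 5 -> 2 (kernel 4, stride 1)
deconv_out = (conv_out - 1) + 4      # deconv1: 2 -> 5 (stride 1 restores it)
assert deconv_out == in_h            # spatial size matches again...

# ...but the channel count is num_output, so the blob counts disagree:
assert batch * 16 * deconv_out**2 == 40000  # deconv1 with num_output: 16
assert batch * 1 * deconv_out**2 == 2500    # flatdata (1 channel)
# hence num_output must be 1 to match the single-channel target.
```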

@shelhamer (Member) commented:

@jyegerlehner right. This is not a limitation of the conv/deconv layers in any way but a requirement of the SigmoidCrossEntropyLoss layer. Thanks for commenting.

@Tgaaly commented Jun 26, 2015:

Thanks @jyegerlehner and @shelhamer for the responses! Is there a workaround for this, so that the deconvolution can use the same number of kernels as the convolution?

@Tgaaly commented Jun 27, 2015:

Every time I train this autoencoder-like network (shown above) with the deconvolution layer, the training loss decreases once and then stays at the same value (6412.21, shown below) for as long as I run epochs. Have you seen similar behavior?

I0626 22:42:00.732342 13649 solver.cpp:223] Learning Rate Policy: inv
I0626 22:42:00.732354 13649 solver.cpp:266] Iteration 0, Testing net (#0)
I0626 22:42:16.881786 13649 solver.cpp:315]     Test net output #0: cross_entropy_loss = 8846.19 (* 1 = 8846.19 loss)
I0626 22:42:16.881834 13649 solver.cpp:315]     Test net output #1: l2_error = 1751.28 (* 1 = 1751.28 loss)
I0626 22:42:17.051527 13649 solver.cpp:189] Iteration 0, loss = 10631.6
I0626 22:42:17.051576 13649 solver.cpp:204]     Train net output #0: cross_entropy_loss = 8873.56 (* 1 = 8873.56 loss)
I0626 22:42:17.051595 13649 solver.cpp:204]     Train net output #1: l2_error = 1758.01 (* 1 = 1758.01 loss)
I0626 22:42:17.051619 13649 solver.cpp:464] Iteration 0, lr = 0.01
I0626 22:43:20.310369 13649 solver.cpp:189] Iteration 100, loss = 7569.84
I0626 22:43:20.310458 13649 solver.cpp:204]     Train net output #0: cross_entropy_loss = 6412.21 (* 1 = 6412.21 loss)
I0626 22:43:20.310482 13649 solver.cpp:204]     Train net output #1: l2_error = 1157.63 (* 1 = 1157.63 loss)
I0626 22:43:20.310499 13649 solver.cpp:464] Iteration 100, lr = 0.00992565
I0626 22:44:23.551571 13649 solver.cpp:189] Iteration 200, loss = 7569.84
I0626 22:44:23.551831 13649 solver.cpp:204]     Train net output #0: cross_entropy_loss = 6412.21 (* 1 = 6412.21 loss)
I0626 22:44:23.551865 13649 solver.cpp:204]     Train net output #1: l2_error = 1157.63 (* 1 = 1157.63 loss)
I0626 22:44:23.551882 13649 solver.cpp:464] Iteration 200, lr = 0.00985258

@jyegerlehner (Contributor) commented:

@Tgaaly No, I haven't encountered a problem like that.

One thing that looks suspicious is that the l2 error is so large, when the sigmoids can only give you a response in [0,1] (or [-1,1], I forget which). It looks as if the data hasn't been scaled to a range the sigmoids can produce.

By the way in my particular case I'm using PReLUs not sigmoids. You might try the way-overcomplete approach with lots of deconv output channels, followed by perhaps PReLUs and then 1x1 convs to bring the number of channels down to what you need (1 channel ?). Those are just a few random thoughts. Hard to say more without knowing more about the problem you are solving and what you are doing.

@engr3os commented Nov 9, 2015:

@Tgaaly You don't have to make a flatdata layer, since you are dealing with color images (I suppose). You can set the group parameter of the convolution parameters to 3 in both the conv and deconv layers; this should take care of your problem. You can also use a dropout layer if you want to build a dropout autoencoder.
