
Refactor convolution layer and add deconvolution layer #1615

Merged (6 commits, Feb 1, 2015)

Conversation

@longjon (Contributor) commented Dec 22, 2014:

This PR adds a DeconvolutionLayer that flips the forward and backward passes of ConvolutionLayer. (The resulting operation is still convolution, but the sense of all the parameters is reversed, so that, in particular, strided deconvolution results in upsampling whereas strided convolution results in downsampling.)
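The size relationship described above can be sketched with the standard output-size formulas (illustrative helper names, not Caffe API): deconvolution uses the inverse of the convolution formula, so its parameters act in the reversed sense.

```python
# Output-size arithmetic for convolution vs. deconvolution, assuming the
# usual formulas; with input size i, kernel k, stride s, padding p:
def conv_out_size(i, k, s, p):
    # standard convolution: stride s > 1 downsamples
    return (i + 2 * p - k) // s + 1

def deconv_out_size(i, k, s, p):
    # deconvolution inverts the formula above: stride s > 1 upsamples,
    # and padding is subtracted from (i.e. removed from) the output
    return s * (i - 1) + k - 2 * p

# A stride-2 conv shrinks 64 -> 32; the matching deconv restores 32 -> 64.
assert conv_out_size(64, 4, 2, 1) == 32
assert deconv_out_size(32, 4, 2, 1) == 64
```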

Rather than duplicate all the ConvolutionLayer code, common sections are factored out into a parent class, BaseConvolutionLayer. The tricky GEMM parameters and the column buffer are hidden from the forward and backward implementations, which I hope you agree are much more readable.

Positives

  • Readable implementations of convolution and deconvolution.
  • Deconvolution supports all the functionality of convolution, including padding (note that padding is removed from the output rather than added to the input!), groups, rectangular kernels, and biases.
  • Whereas ConvolutionLayer needs to do an im2col and a col2im in the backward pass, both operations in DeconvolutionLayer's backward pass require im2col, so a special flag is added to avoid doing this twice.

Reservations

  • The diff is pretty heavy; it was not straightforward to make the changes in a gradual way, so the history is constructed post-hoc. However, the code should be pretty understandable to the few of you intimately familiar with ConvolutionLayer. Also note that the diff size is really half what it appears to be, because of the point below.
  • Almost all the code is duplicated via s/cpu/gpu/. This PR is not the place to address this, but I hope we can get (some part of?) Device Abstraction #610 merged soon, because this is getting silly.
  • Is "deconvolution" really the right name? This layer does not have to be used in the context of undoing a convolution (although it could be). It seems likely that this name will stick one way or another.

Design choices

  • The original idea was to add convolution helper functions under util/. However, these end up requiring a large number of arguments (each needs most of the convolution parameters). So, one could wrap all the common arguments in a struct, like cuDNN... but then that struct may as well be the layer class itself, so we have the current design.
  • Rather than use a special flag to skip the extra im2col in deconv backward, we could leave the column buffer out of the wrapper functions. However, this leads to more duplicated code, and cuts down significantly on readability.
  • The forward helper and weight diff helper are both im2cols followed by gemms, so in theory we could unify them. However, then we need additional arguments and logic to figure out when to transpose the column buffer. I think this could still be done, but it's a bit of a wash.
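The "im2col followed by gemm" structure the helpers wrap can be sketched in NumPy (a minimal single-image, one-group, stride-1, no-padding sketch; the `im2col`/`conv_forward` names are hypothetical, not Caffe code):

```python
import numpy as np

def im2col(x, kh, kw):
    # Lay out every kh-x-kw patch of x (channels, height, width) as one
    # column of the buffer; stride 1, no padding for simplicity.
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + oh, j:j + ow]
    return cols.reshape(c * kh * kw, oh * ow)

def conv_forward(x, weights):
    # weights: (num_output, channels, kh, kw). After im2col, the whole
    # forward pass is a single GEMM against the column buffer.
    n, c, kh, kw = weights.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return (weights.reshape(n, -1) @ im2col(x, kh, kw)).reshape(n, oh, ow)

# A 2x2 kernel of ones over an all-ones 3x3 image sums 4 pixels per position.
y = conv_forward(np.ones((1, 3, 3)), np.ones((1, 1, 2, 2)))
assert y.shape == (1, 2, 2) and np.allclose(y, 4.0)
```

The deconvolution forward is then the transpose of this: a gemm followed by col2im, which is exactly the convolution backward pass.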

Forthcoming

  • Doc comments for DeconvolutionLayer.
  • Tests specific to DeconvolutionLayer. Note that most of the functionality is exercised by the ConvolutionLayer tests. I've only briefly checked the forward/backward passes for upsampling/downsampling, so there could be a bug or two left.
  • cuDNN deconvolution should be straightforward to implement as well, since cuDNN already provides the same kind of abstraction as this refactoring. However, I won't be able to do that right away; PRs are welcome!

@longjon mentioned this pull request on Dec 22, 2014
@longjon force-pushed the deconv-layer branch 3 times, most recently from 0957e39 to 7990bbe on December 22, 2014 at 07:41
}
// Special case: im2col is the identity for 1x1 convolution with stride 1
// and no padding, so flag for skipping the buffer and transformation.
is_1x1_ = kernel_w_ == 1 && kernel_h_ == 1
    && stride_h_ == 1 && stride_w_ == 1 && pad_h_ == 0 && pad_w_ == 0;
A contributor commented:
Please add the case for full convolution too

@longjon (Author) replied:

There's still no additional testing for the general expression, and anyway that's an orthogonal concern; how about I follow up with another PR that includes those tests? (I realized recently those tests don't even have to be done with ConvolutionLayer, they just need to check im2col, which should be easier...)
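The kind of direct im2col check mentioned in the reply above can be sketched as follows (hypothetical NumPy helper, not the actual Caffe test): for a 1x1 kernel with stride 1 and no padding, im2col is the identity up to a reshape, which is the `is_1x1_` special case in the snippet earlier in the thread.

```python
import numpy as np

def im2col(x, kh, kw):
    # stride 1, no padding; x has shape (channels, height, width)
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + oh, j:j + ow]
    return cols.reshape(c * kh * kw, oh * ow)

x = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
# 1x1 kernel, stride 1, no padding: each column is just one input pixel,
# so the column buffer equals the flattened input.
assert np.array_equal(im2col(x, 1, 1), x.reshape(2, 9))
```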

@shelhamer (Member) commented:

Thanks for bringing the convolution code to order and delivering full-fledged deconvolution at the same time, Jon!

> the few of you intimately familiar with ConvolutionLayer

All rise, secret order of the Caffe convolution...

> Is "deconvolution" really the right name?

While you and I can keep saying "backward convolution," vision parlance seems to be converging on "deconvolution." I think all its usages will come under the "deconvolution" umbrella sooner or later so we might as well name it what everyone will look for.

> Almost all the code is duplicated via s/cpu/gpu/

Right, #610 deserves attention once the fires are out. (Where the fires are data, Net owning phase and device, and double-checking the thread leak in dev.)

> cuDNN deconvolution should be straightforward to implement as well

I could take a look at this since I did the original integration, and should warm-up for hacking cuDNN R2 as well. More likely a follow-up PR instead of pushing it here.

// wrap im2col/col2im so we don't have to remember the (long) argument lists
inline void conv_im2col_cpu(const Dtype* data, Dtype* col_buff) {
  im2col_cpu(data, conv_in_channels_, conv_in_height_, conv_in_width_,
      kernel_h_, kernel_w_, pad_h_, pad_w_, stride_h_, stride_w_, col_buff);
}
A contributor commented:

Would it be slightly more elegant to give BaseConvolutionLayer a private im2col_layer_ which {Dec,C}onvolutionLayer call Forward and Backward on, instead of these functions? (Possibly also saving some duplicated setup logic & private variables.)

Relatedly, I always kind of thought we should have a separate BiasLayer internally called by both InnerProductLayer and ConvolutionLayer, factoring out that little bit of logic, and allowing one to use biases without multiplicative weights, for whatever that's worth.

Just a minor thought -- definitely doesn't need to be done here as this is nice cleanup regardless, and of course deconvolution layer will be a welcome feature.

This looks good to merge to me if you think it's ready.

@longjon (Author) replied:

Yes, I think the im2col layer would probably be a bit better, and I agree with factoring out the bias, although I'd rather save those things for later PRs.

Other than that, I think I ought to add some tests that at least call deconv forward and backward, and then it'll be ready.

@jyegerlehner (Contributor) commented:

FWIW I've been running these changes and all seems to be working well.

@longjon (Author) commented Jan 27, 2015:

Thanks for the datapoint @jyegerlehner. I've added basic doc comments and tests, so this should be ready to go pending any further comments.

shelhamer added a commit that referenced this pull request Feb 1, 2015
Add deconvolution layer with refactoring of convolution layer to share code
@shelhamer merged commit 9767b99 into BVLC:dev on Feb 1, 2015
@shelhamer (Member) commented:

Thanks for adding deconvolution while making convolution look the most sane it ever has, Jon!

@JiaxuZhu commented:

Can this layer be used to project a single activation in a given feature map back down to the image pixels?

@ducha-aiki (Contributor) commented:

@JiaxuZhu that is exactly the purpose of this layer.

@JiaxuZhu commented:

@ducha-aiki Thanks!
But I cannot find any tutorial on how to use this layer. For example, I don't know how to set all feature maps to zero except the top 9 activations.

@yyyek commented Jun 17, 2015:

Is there any tutorial for this layer? I don't know how to use it.

@Tgaaly commented Jun 26, 2015:

When doing a convolution followed by a deconvolution to reconstruct the input, setting the num_output parameter (i.e. the number of filters) to the same value in both layers causes the error below. If the num_output of only the deconvolution layer is set to 1, it works. Is the num_output parameter different for convolution and deconvolution? Shouldn't the number of kernels be equal in both layers? My input data here is 5x5 in batches of 100. Below is my prototxt script.

Check failed: bottom[0]->count() == bottom[1]->count() (40000 vs. 2500) SIGMOID_CROSS_ENTROPY_LOSS layer inputs must have the same count.

# CONVOLUTION PART

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 16
    kernel_size: 4
    kernel_size: 4
    stride: 1
  }
}
layer {
  name: "sigm"
  type: "Sigmoid"
  bottom: "conv1"
  top: "conv1"
}

# DECONVOLUTION PART

layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "conv1"
  top: "deconv1"
  convolution_param {
    num_output: 16 # if set to 1 - it works
    kernel_size: 4
    kernel_size: 4
    stride: 1
  }
}
layer {
  name: "sigm"
  type: "Sigmoid"
  bottom: "deconv1"
  top: "deconv1"
}
...
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "deconv1"
  bottom: "flatdata"
  top: "cross_entropy_loss"
  loss_weight: 1
}

@jyegerlehner (Contributor) commented:

The two bottom blobs of the loss layer have to have the same shape. Otherwise it can't compute the differences between their elements. Assuming "data" is the same blob as "flatdata", the bottom blob of the "conv1" layer, the message implies it has one channel. Therefore the top blob of "deconv1" also must have 1 channel, which means num_output needs to be 1.
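Working through the counts in the error message above makes this concrete (plain arithmetic, using the shapes from the prototxt: batch 100, single-channel 5x5 input, 4x4 kernels, stride 1):

```python
batch, in_h = 100, 5
conv_out = in_h - 4 + 1              # conv1: 5 -> 2 (kernel 4, stride 1)
deconv_out = (conv_out - 1) + 4      # deconv1: 2 -> 5 (stride 1 restores it)
assert deconv_out == in_h            # spatial size matches again...

# ...but the channel count is num_output, so the blob counts disagree:
assert batch * 16 * deconv_out**2 == 40000  # deconv1 with num_output: 16
assert batch * 1 * deconv_out**2 == 2500    # flatdata (1 channel)
# hence num_output must be 1 to match the single-channel target.
```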

@shelhamer (Member) commented:

@jyegerlehner right. This is not a limitation of the conv/deconv layers in any way but a requirement of the SigmoidCrossEntropyLoss layer. Thanks for commenting.

@Tgaaly commented Jun 26, 2015:

Thanks @jyegerlehner and @shelhamer for the responses! Is there a workaround for this, so that the deconvolution can use the same number of kernels as the convolution?

@Tgaaly commented Jun 27, 2015:

Every time I train this autoencoder-like network (shown above) with the deconvolution layer, the training loss decreases once and then stays at the same value (6412.21, shown below) for as long as I run epochs. Have you seen similar behavior?

I0626 22:42:00.732342 13649 solver.cpp:223] Learning Rate Policy: inv
I0626 22:42:00.732354 13649 solver.cpp:266] Iteration 0, Testing net (#0)
I0626 22:42:16.881786 13649 solver.cpp:315]     Test net output #0: cross_entropy_loss = 8846.19 (* 1 = 8846.19 loss)
I0626 22:42:16.881834 13649 solver.cpp:315]     Test net output #1: l2_error = 1751.28 (* 1 = 1751.28 loss)
I0626 22:42:17.051527 13649 solver.cpp:189] Iteration 0, loss = 10631.6
I0626 22:42:17.051576 13649 solver.cpp:204]     Train net output #0: cross_entropy_loss = 8873.56 (* 1 = 8873.56 loss)
I0626 22:42:17.051595 13649 solver.cpp:204]     Train net output #1: l2_error = 1758.01 (* 1 = 1758.01 loss)
I0626 22:42:17.051619 13649 solver.cpp:464] Iteration 0, lr = 0.01
I0626 22:43:20.310369 13649 solver.cpp:189] Iteration 100, loss = 7569.84
I0626 22:43:20.310458 13649 solver.cpp:204]     Train net output #0: cross_entropy_loss = 6412.21 (* 1 = 6412.21 loss)
I0626 22:43:20.310482 13649 solver.cpp:204]     Train net output #1: l2_error = 1157.63 (* 1 = 1157.63 loss)
I0626 22:43:20.310499 13649 solver.cpp:464] Iteration 100, lr = 0.00992565
I0626 22:44:23.551571 13649 solver.cpp:189] Iteration 200, loss = 7569.84
I0626 22:44:23.551831 13649 solver.cpp:204]     Train net output #0: cross_entropy_loss = 6412.21 (* 1 = 6412.21 loss)
I0626 22:44:23.551865 13649 solver.cpp:204]     Train net output #1: l2_error = 1157.63 (* 1 = 1157.63 loss)
I0626 22:44:23.551882 13649 solver.cpp:464] Iteration 200, lr = 0.00985258

@jyegerlehner (Contributor) commented:

@Tgaaly No, I haven't encountered a problem like that.

One thing that looks suspicious is that the l2 error is so large, when the sigmoids can only give you a response in [0,1] (or [-1,1], I forget which). It looks as if the data hasn't been scaled to a range the sigmoids can produce.

By the way in my particular case I'm using PReLUs not sigmoids. You might try the way-overcomplete approach with lots of deconv output channels, followed by perhaps PReLUs and then 1x1 convs to bring the number of channels down to what you need (1 channel ?). Those are just a few random thoughts. Hard to say more without knowing more about the problem you are solving and what you are doing.

@engr3os commented Nov 9, 2015:

@Tgaaly You don't have to make a flatdata layer, since you are dealing with color images (I suppose). You can set the group parameter of the convolution parameters to 3 in both the conv and deconv layers; this should take care of your problem. You can also use a dropout layer if you want to build a dropout autoencoder.
