BasicVSR structure: why not use an encoder-decoder framework? #770
-
Hi, in BasicVSR it seems that you use 30 residual blocks (ResidualBlocksWithInputConv) to encode the features that come from the warped feature maps. Wouldn't it be more convenient to encode this information in an encoder-decoder fashion? What is your motivation for keeping a constant number of channels throughout the 30 residual blocks?
-
By "encoder-decoder" fashion do you mean gradually downsample with increased channel, followed by an upsampling? In the task of super-resolution, we aim at restoring the missing details. We usually will not further downsample the image/feature as it is already low-resolution. While increasing the channel and decreasing the spatial resolution is conceptually similar, we empirically found that such an encoder-decoder design does not improve the performance. In addition, it incurs a huge number of parameters. Therefore, a common practice is that we stay at the input resolution, and upsample the feature at the end of the network. Nevertheless, I will not rule out the possibility that such a design could work, with a more sophisticated components :) |