Conv_3 bank? #100

Open
jbohnslav opened this issue Jun 5, 2019 · 0 comments
Hi there,

Thanks for the great work. I asked this question over at DeepLabCut, but I figured I should ask it again here.

I'm not very familiar with TensorFlow, so this may just be my inability to read tf code, but I'm confused about the prediction layers in the model. In both the DeeperCut paper and the DeepLabCut paper, the authors describe using a ResNet base followed by 2x upsampling with deconvolution layers. Then, the authors "connect the final output to the output of the conv3 bank."

In the code, features are extracted with the net_funcs imported from tf.slim: resnet_v1.resnet_v1_50 and resnet_v1.resnet_v1_101. Due to the use of atrous convolutions and the lack of global average pooling (etc.), I think the features should be of shape (N, H/16, W/16, 2048).
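To make the shape claim concrete, here is a minimal sketch of the arithmetic (hypothetical helper, not code from the repository): with atrous convolutions, the ResNet keeps an output stride of 16 instead of the usual 32, so the final feature map of resnet_v1_50/101 has 2048 channels at 1/16 the input resolution.

```python
def resnet_feature_shape(n, h, w, output_stride=16, channels=2048):
    """Expected feature shape for resnet_v1_50/101 with atrous convolutions.

    With output stride 16, spatial dimensions shrink by 16x while the
    final block still produces 2048 channels (no global average pooling).
    """
    return (n, h // output_stride, w // output_stride, channels)

# A 512x512 input image should yield a 32x32x2048 feature map.
print(resnet_feature_shape(1, 512, 512))  # (1, 32, 32, 2048)
```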

They are then, I believe, passed to the following prediction layer:

def prediction_layer(cfg, input, name, num_outputs):
    with slim.arg_scope([slim.conv2d, slim.conv2d_transpose], padding='SAME',
                        activation_fn=None, normalizer_fn=None,
                        weights_regularizer=slim.l2_regularizer(cfg.weight_decay)):
        with tf.variable_scope(name):
            pred = slim.conv2d_transpose(input, num_outputs,
                                         kernel_size=[3, 3], stride=2,
                                         scope='block4')
            return pred
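Shape-wise, all this layer does (as far as I can tell) is the following: a single stride-2 transposed convolution with SAME padding doubles the spatial resolution, taking the H/16 backbone features to H/8 heatmaps, with no conv3 skip anywhere. A hedged sketch of that arithmetic (hypothetical helper and joint count, not the project's code):

```python
def conv2d_transpose_shape(n, h, w, num_outputs, stride=2):
    """Output shape of a conv2d_transpose with padding='SAME'.

    With SAME padding, the spatial dimensions are simply multiplied by
    the stride, and the channel count becomes num_outputs.
    """
    return (n, h * stride, w * stride, num_outputs)

# e.g. 32x32 backbone features -> 64x64 heatmaps for 17 (hypothetical) joints
print(conv2d_transpose_shape(1, 32, 32, num_outputs=17))  # (1, 64, 64, 17)
```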

This just means that the output of the ResNet is passed into a deconvolution layer, without any connection to conv3. Did I miss it somewhere?

Both papers use the phrasing "connected to", so I'm not sure whether that means concatenation followed by a conv2d or element-wise addition, and whether the connection happens on the upsampled features or the original features. I expected the prediction layer to look something like this (in pseudocode):

upsampled_features = conv2d_transpose(features)
outputs = conv2d(concatenate(upsampled_features, conv3))
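For concreteness, here is a NumPy sketch of the skip connection I expected (this is hypothetical, not what the repository implements, and the conv3 channel count is an assumption): upsample the backbone features, then concatenate the conv3 bank output along the channel axis before a final convolution.

```python
import numpy as np

def concat_skip(upsampled, conv3):
    """Channel-wise concatenation of upsampled features with a conv3 bank.

    Both inputs must share (N, H, W); channels are stacked along axis -1,
    after which a conv2d would produce the final predictions.
    """
    assert upsampled.shape[:3] == conv3.shape[:3], "spatial dims must match"
    return np.concatenate([upsampled, conv3], axis=-1)

up = np.zeros((1, 64, 64, 17))   # upsampled prediction features (hypothetical)
c3 = np.zeros((1, 64, 64, 512))  # hypothetical conv3 bank output
print(concat_skip(up, c3).shape)  # (1, 64, 64, 529)
```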