Hello, I also process CNN and Transformer features in my network. Is it possible to fuse them with your deform_fusion module? If so: in deform_fusion you set "self.conv_offset = nn.Conv2d(in_channels, 2*3*3, 3, 1, 1)", and I don't quite understand the meaning of 2*3*3. There are also input and output channels such as "in_channels=768*5, cnn_channels=256*3, out_channels=256*3"; why are you multiplying by 5 and 3 respectively?
In deform_fusion, self.conv_offset computes the offsets for the deformable convolution. For each pixel in the input feature map we predict 2*3*3 offsets: the 2 corresponds to the x and y directions, and 3*3 to the sampling positions of the convolution kernel. In short, the output channels hold an (x, y) offset for every position of a 3*3 kernel. For more details, please refer to: https://pytorch.org/vision/main/generated/torchvision.ops.deform_conv2d.html. As for the multiplied channel counts, they come from concatenating features extracted from three CNN layers and five Transformer layers, i.e. cnn_channels = 256*3 and in_channels = 768*5.
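To make the channel arithmetic concrete, here is a minimal sketch of how such an offset head plugs into torchvision's deform_conv2d. It is not the repo's exact code; the channel sizes and tensor shapes are illustrative assumptions, and the real module may compute offsets from a different feature stream.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

# Hypothetical channel sizes, mirroring the question:
# five Transformer layers (768 each) and three CNN layers (256 each).
in_channels = 768 * 5
out_channels = 256 * 3

# A plain conv predicts, for every output location, an (x, y) offset
# for each of the 3*3 sampling points: 2 * 3 * 3 = 18 channels.
conv_offset = nn.Conv2d(in_channels, 2 * 3 * 3, 3, 1, 1)

# Weight of the deformable 3x3 convolution itself.
weight = nn.Parameter(torch.randn(out_channels, in_channels, 3, 3))

x = torch.randn(1, in_channels, 28, 28)   # dummy fused feature map
offset = conv_offset(x)                    # shape (1, 18, 28, 28)
out = deform_conv2d(x, offset, weight, padding=1)
print(out.shape)                           # torch.Size([1, 768, 28, 28])
```

Note that deform_conv2d infers the number of offset groups from the offset tensor: with one group and a 3x3 kernel, exactly 18 offset channels are expected, which is why conv_offset outputs 2*3*3.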