How should I fix the input size during testing? #238
Comments
Same question. I resize the images in the forward function during inference, but it is not elegant :(
I use HUST's ViM as the backbone (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py), in which PatchEmbed specifies a fixed input size. Following the Swin Transformer, I added a padding operation so that non-fixed input sizes can be used. Fortunately, neither ViM nor Mask2Former's pixel decoder has strict requirements on the input size. You can try modifying PatchEmbed in this way.
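For reference, a minimal sketch of what such a Swin-style padded PatchEmbed can look like. This is an illustration, not the upstream ViM code: the argument names (patch_size, in_chans, embed_dim) are conventional defaults, and if the backbone uses pretrained position embeddings they may additionally need interpolation once the padded grid differs from the pretraining grid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbed(nn.Module):
    """2D image-to-patch embedding that pads instead of asserting a fixed img_size."""

    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        _, _, H, W = x.shape
        # Pad right/bottom so H and W become multiples of patch_size,
        # rather than requiring the input to match a fixed size.
        pad_h = (self.patch_size - H % self.patch_size) % self.patch_size
        pad_w = (self.patch_size - W % self.patch_size) % self.patch_size
        if pad_h or pad_w:
            x = F.pad(x, (0, pad_w, 0, pad_h))
        x = self.proj(x)                    # (B, C, H', W')
        x = x.flatten(2).transpose(1, 2)    # (B, H'*W', C)
        return x
```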
Thanks!
I have modified the backbone of Mask2Former to Vmamba, which requires the input size of my model to be fixed, for example 640x640. This is not an issue during training, because the train_dataloader outputs cropped images and I just need to specify the crop parameters. However, I ran into a problem during testing. I am not sure how the test_dataloader operates (I am not very familiar with the detectron2 framework and could not find the relevant code). During testing, the width and height of the images are not equal, with one of them being 640. My question is: which part of the code should I modify so that the input images to the model are 640x640 during testing? I do not need any other data augmentation. I would greatly appreciate it if someone could provide an answer.
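For context on where that behavior comes from: in detectron2, the default test-time transform is ResizeShortestEdge, which DatasetMapper builds from cfg.INPUT.MIN_SIZE_TEST and cfg.INPUT.MAX_SIZE_TEST; that is why one side ends up at 640 while the other varies. One way to force a fixed 640x640 input is to build the test loader with a custom mapper. A minimal sketch using standard detectron2 APIs, where "my_dataset_test" is a placeholder dataset name:

```python
from detectron2.data import DatasetMapper, build_detection_test_loader
import detectron2.data.transforms as T

def build_fixed_size_test_loader(cfg, dataset_name="my_dataset_test"):
    # Replace the default ResizeShortestEdge with a fixed (h, w) resize.
    mapper = DatasetMapper(
        cfg,
        is_train=False,
        augmentations=[T.Resize((640, 640))],
    )
    return build_detection_test_loader(cfg, dataset_name, mapper=mapper)
```

Note that T.Resize((640, 640)) does not preserve aspect ratio; detectron2 models typically rescale predictions back to the original height/width stored in each dataset dict, but the distortion can still affect accuracy, so padding to 640x640 in the backbone (as suggested above) may be preferable.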