Currently, the ViT used in the ViTDet project doesn't have a `cls_token`, and `SimpleFeaturePyramid` doesn't support one either: it expects patch-level features of shape `(batch, hidden_dim, H/patch_size, W/patch_size)`. If I wanted to try a pretrained model like DINOv2, which has a `cls_token`, what would be the correct way to do so?
Two options come to mind:

1. Ignore the `cls_token` from the start, i.e. drop it after the positional embedding and before the transformer blocks.
2. Ignore the `cls_token` after the transformer blocks and before the final layer norm.
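For what it's worth, option 2 can be sketched in plain PyTorch (this is not the detectron2 API; the function name and shapes here are illustrative): keep all tokens through the blocks, then drop the `cls_token` and reshape the remaining patch tokens into the `(B, C, Hp, Wp)` feature-map layout that `SimpleFeaturePyramid` consumes.

```python
import torch

def tokens_to_feature_map(tokens: torch.Tensor, grid_hw: tuple) -> torch.Tensor:
    """Convert ViT token output to a spatial feature map.

    tokens: (B, 1 + Hp*Wp, C), with the cls_token at index 0.
    grid_hw: (Hp, Wp), the patch grid size (image size / patch size).
    Returns: (B, C, Hp, Wp).
    """
    hp, wp = grid_hw
    patch_tokens = tokens[:, 1:, :]          # drop the cls_token
    b, n, c = patch_tokens.shape
    assert n == hp * wp, "token count must match the patch grid"
    # (B, N, C) -> (B, Hp, Wp, C) -> (B, C, Hp, Wp)
    return patch_tokens.reshape(b, hp, wp, c).permute(0, 3, 1, 2)

# Example: a 224x224 image with 16x16 patches gives a 14x14 grid.
x = torch.randn(2, 1 + 14 * 14, 768)
fmap = tokens_to_feature_map(x, (14, 14))
print(fmap.shape)  # torch.Size([2, 768, 14, 14])
```

Option 1 would instead slice off the token right after adding the positional embedding, before the blocks run, so the attention layers never see it.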
Thanks :^)