Currently, the ViT used in the ViTDet project doesn't have a `cls_token`, and `SimpleFeaturePyramid` doesn't support one either: it expects patch-level features of shape `(batch, hidden_dim, H/patch_size, W/patch_size)`. If I wanted to try a pretrained model like DINOv2, which has a `cls_token`, what would be the correct way to do so?
Two options come to mind:

1. Ignore the `cls_token` from the start, i.e. drop it after the positional embedding and before the transformer blocks.
2. Ignore the `cls_token` after the transformer blocks and before the final layer norm.
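For what it's worth, option 2 can be sketched in plain PyTorch (this is not the detectron2 API; the function name and shapes here are illustrative): keep all tokens through the blocks, then drop the `cls_token` and reshape the remaining patch tokens into the `(B, C, Hp, Wp)` feature-map layout that `SimpleFeaturePyramid` consumes.

```python
import torch

def tokens_to_feature_map(tokens: torch.Tensor, grid_hw: tuple) -> torch.Tensor:
    """Convert ViT token output to a spatial feature map.

    tokens: (B, 1 + Hp*Wp, C), with the cls_token at index 0.
    grid_hw: (Hp, Wp), the patch grid size (image size / patch size).
    Returns: (B, C, Hp, Wp).
    """
    hp, wp = grid_hw
    patch_tokens = tokens[:, 1:, :]          # drop the cls_token
    b, n, c = patch_tokens.shape
    assert n == hp * wp, "token count must match the patch grid"
    # (B, N, C) -> (B, Hp, Wp, C) -> (B, C, Hp, Wp)
    return patch_tokens.reshape(b, hp, wp, c).permute(0, 3, 1, 2)

# Example: a 224x224 image with 16x16 patches gives a 14x14 grid.
x = torch.randn(2, 1 + 14 * 14, 768)
fmap = tokens_to_feature_map(x, (14, 14))
print(fmap.shape)  # torch.Size([2, 768, 14, 14])
```

Option 1 would instead slice off the token right after adding the positional embedding, before the blocks run, so the attention layers never see it.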
Thanks :^)