Support Timm ViTs #776
Conversation
Hi, thanks a lot for the PR!
Right now this relies on the
        **kwargs,
    )
    if name.startswith("vit"):
        depth = 4
How is this variable used?
        pretrained=weights is not None,
        **kwargs,
    )
    if name.startswith("vit"):
Could you please go through the other archs in timm, besides ViT, and check whether they are supported?
If some other archs are supported, we should probably change the logic here.
Ping @isaaccorley, would love to see this completed.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This PR was closed because it has been stalled for 15 days with no activity.
I can't seem to reopen this PR myself, but I would like to revisit it. As a first step, I searched for all the model classes in timm that have a [
'deit3_base_patch16_224',
'deit3_base_patch16_384',
'deit3_huge_patch14_224',
'deit3_large_patch16_224',
'deit3_large_patch16_384',
'deit3_medium_patch16_224',
'deit3_small_patch16_224',
'deit3_small_patch16_384',
'deit_base_distilled_patch16_224',
'deit_base_distilled_patch16_384',
'deit_base_patch16_224',
'deit_base_patch16_384',
'deit_small_distilled_patch16_224',
'deit_small_patch16_224',
'deit_tiny_distilled_patch16_224',
'deit_tiny_patch16_224',
'eva_large_patch14_196',
'eva_large_patch14_336',
'flexivit_base',
'flexivit_large',
'flexivit_small',
'vit_base_patch8_224',
'vit_base_patch14_dinov2',
'vit_base_patch14_reg4_dinov2',
'vit_base_patch16_18x2_224',
'vit_base_patch16_224',
'vit_base_patch16_224_miil',
'vit_base_patch16_384',
'vit_base_patch16_clip_224',
'vit_base_patch16_clip_384',
'vit_base_patch16_clip_quickgelu_224',
'vit_base_patch16_gap_224',
'vit_base_patch16_plus_240',
'vit_base_patch16_reg8_gap_256',
'vit_base_patch16_rpn_224',
'vit_base_patch16_siglip_224',
'vit_base_patch16_siglip_256',
'vit_base_patch16_siglip_384',
'vit_base_patch16_siglip_512',
'vit_base_patch16_xp_224',
'vit_base_patch32_224',
'vit_base_patch32_384',
'vit_base_patch32_clip_224',
'vit_base_patch32_clip_256',
'vit_base_patch32_clip_384',
'vit_base_patch32_clip_448',
'vit_base_patch32_clip_quickgelu_224',
'vit_base_patch32_plus_256',
'vit_base_r26_s32_224',
'vit_base_r50_s16_224',
'vit_base_r50_s16_384',
'vit_base_resnet26d_224',
'vit_base_resnet50d_224',
'vit_giant_patch14_224',
'vit_giant_patch14_clip_224',
'vit_giant_patch14_dinov2',
'vit_giant_patch14_reg4_dinov2',
'vit_giant_patch16_gap_224',
'vit_gigantic_patch14_224',
'vit_gigantic_patch14_clip_224',
'vit_huge_patch14_224',
'vit_huge_patch14_clip_224',
'vit_huge_patch14_clip_336',
'vit_huge_patch14_clip_378',
'vit_huge_patch14_clip_quickgelu_224',
'vit_huge_patch14_clip_quickgelu_378',
'vit_huge_patch14_gap_224',
'vit_huge_patch14_xp_224',
'vit_huge_patch16_gap_448',
'vit_large_patch14_224',
'vit_large_patch14_clip_224',
'vit_large_patch14_clip_336',
'vit_large_patch14_clip_quickgelu_224',
'vit_large_patch14_clip_quickgelu_336',
'vit_large_patch14_dinov2',
'vit_large_patch14_reg4_dinov2',
'vit_large_patch14_xp_224',
'vit_large_patch16_224',
'vit_large_patch16_384',
'vit_large_patch16_siglip_256',
'vit_large_patch16_siglip_384',
'vit_large_patch32_224',
'vit_large_patch32_384',
'vit_large_r50_s32_224',
'vit_large_r50_s32_384',
'vit_medium_patch16_gap_240',
'vit_medium_patch16_gap_256',
'vit_medium_patch16_gap_384',
'vit_medium_patch16_reg4_256',
'vit_medium_patch16_reg4_gap_256',
'vit_small_patch8_224',
'vit_small_patch14_dinov2',
'vit_small_patch14_reg4_dinov2',
'vit_small_patch16_18x2_224',
'vit_small_patch16_36x1_224',
'vit_small_patch16_224',
'vit_small_patch16_384',
'vit_small_patch32_224',
'vit_small_patch32_384',
'vit_small_r26_s32_224',
'vit_small_r26_s32_384',
'vit_small_resnet26d_224',
'vit_small_resnet50d_s16_224',
'vit_so400m_patch14_siglip_224',
'vit_so400m_patch14_siglip_384',
'vit_tiny_patch16_224',
'vit_tiny_patch16_384',
'vit_tiny_r_s16_p8_224',
'vit_tiny_r_s16_p8_384'
]
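The family prefixes in the list above (`vit_`, `deit_`, `deit3_`, `eva_`, `flexivit_`) suggest that the `name.startswith("vit")` check in the diff could be generalized to a prefix-set check. A minimal sketch of that idea follows; the helper name and prefix tuple are mine (derived from the search results above), not code from the PR:

```python
# Hypothetical helper, not from the PR: the prefixes are taken from the
# families that appeared in the model-name search above.
VIT_FAMILY_PREFIXES = ("vit_", "deit_", "deit3_", "eva_", "flexivit_")

def is_vit_family(model_name: str) -> bool:
    """Return True if a timm model name belongs to one of the
    ViT-style families found in the search above."""
    # str.startswith accepts a tuple of prefixes.
    return model_name.startswith(VIT_FAMILY_PREFIXES)
```

This keeps the dispatch logic in one place, so adding another supported family later means extending the tuple rather than chaining more `startswith` calls.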
@qubvel is it possible to get this reopened?
@csaroff I made a fork of the repo that I'll be adding this support to. It can be installed using
This PR implements a TimmUniversalViTEncoder, loosely based on Benchmarking Detection Transfer Learning with Vision Transformers. This is essentially a naive method to upsample features from intermediate transformer encoder blocks, obtained via timm's VisionTransformer.get_intermediate_layers method, so that they can be connected to any of the SMP decoders. I'm not sold that the upsampling is the right way to support ViT encoders, so I'm definitely open to improving this PR.
@adamjstewart @calebrob6
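The upsampling idea described above can be sketched with plain PyTorch. This is a hedged illustration, not the PR's actual code: the module name and scale factors are mine, and it assumes the intermediate features have already been reshaped to (B, C, H, W), as timm's get_intermediate_layers can return with reshape=True:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveViTFeaturePyramid(nn.Module):
    """Hypothetical sketch (names and scales are illustrative): resample
    the equal-resolution feature maps taken from intermediate ViT blocks
    to a pyramid of scales, so a CNN-style decoder that expects
    progressively downsampled stages can consume them.
    """

    def __init__(self, scales=(4.0, 2.0, 1.0, 0.5)):
        super().__init__()
        self.scales = scales

    def forward(self, feats):
        # One (B, C, H, W) feature map per scale, shallow to deep.
        assert len(feats) == len(self.scales)
        return [
            f if s == 1.0
            else F.interpolate(f, scale_factor=s, mode="bilinear",
                               align_corners=False)
            for f, s in zip(feats, self.scales)
        ]
```

For a ViT with 14x14 patch tokens, this yields stages at 56x56, 28x28, 14x14, and 7x7, mimicking the stride pattern of a CNN backbone. The limitation the PR author acknowledges applies here too: bilinear resampling invents no new spatial detail, it only changes resolution.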