
Support Timm ViTs #776

Closed · wants to merge 2 commits into from

Conversation

isaaccorley (Author)

This PR implements a TimmUniversalViTEncoder, loosely based on "Benchmarking Detection Transfer Learning with Vision Transformers". It is essentially a naive method that upsamples features from intermediate transformer encoder blocks, retrieved with timm's VisionTransformer.get_intermediate_layers method, so that they can be connected to any of the SMP decoders.

I'm not sold that upsampling is the right way to support ViT encoders, so I'm definitely open to improving this PR. A sketch of the idea follows.
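A minimal sketch of the approach (not the PR's exact code; it assumes a recent timm release whose VisionTransformer.get_intermediate_layers accepts reshape=True):

```python
import timm
import torch
import torch.nn.functional as F

model = timm.create_model("vit_small_patch16_224", pretrained=False)
x = torch.randn(2, 3, 224, 224)

# Grab 4 intermediate feature maps; all come out at the patch-grid
# resolution (stride 16 for this model), shaped (B, C, 14, 14).
feats = model.get_intermediate_layers(x, n=4, reshape=True)

# Naively resample each map to strides 4/8/16/32 so the outputs mimic the
# multi-scale pyramid a CNN encoder would feed an SMP decoder.
scale_factors = (4.0, 2.0, 1.0, 0.5)
pyramid = [
    F.interpolate(f, scale_factor=s, mode="bilinear", align_corners=False)
    for f, s in zip(feats, scale_factors)
]
print([tuple(p.shape) for p in pyramid])
# [(2, 384, 56, 56), (2, 384, 28, 28), (2, 384, 14, 14), (2, 384, 7, 7)]
```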

@adamjstewart @calebrob6

qubvel (Collaborator) commented Jun 4, 2023

Hi, thanks a lot for the PR!
Can it be more universal and support other transformer architectures from Timm?

isaaccorley (Author)

Right now this relies on the timm.models.VisionTransformer.get_intermediate_layers method. Any other transformer architecture that inherits from VisionTransformer should therefore be supported as well. Some other, more custom transformer architectures, such as crossvit and davit, are plain nn.Modules, so they would not be supported.
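For example, a hypothetical check of whether a given timm model inherits from VisionTransformer (and therefore exposes get_intermediate_layers):

```python
import timm
from timm.models.vision_transformer import VisionTransformer

for name in ("vit_base_patch16_224", "deit_small_patch16_224", "crossvit_base_240"):
    model = timm.create_model(name, pretrained=False)
    print(name, isinstance(model, VisionTransformer))
# vit_* and deit_* models pass; crossvit (a plain nn.Module subclass) does not.
```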

    **kwargs,
)
if name.startswith("vit"):
    depth = 4
Collaborator

How is this variable used?

    pretrained=weights is not None,
    **kwargs,
)
if name.startswith("vit"):
Collaborator

Could you please go through the other architectures in timm besides vit and check whether they are supported? If some other architectures are supported, we should probably change the logic here.

adamjstewart (Collaborator)

Ping @isaaccorley, would love to see this completed

github-actions bot commented Oct 3, 2023

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions bot added the Stale label on Oct 3, 2023
github-actions bot

This PR was closed because it has been stalled for 15 days with no activity.

github-actions bot closed this on Oct 18, 2023
isaaccorley (Author)

I can't seem to reopen this PR myself, but I would like to revisit it.

As a first step, I searched all of the model classes in timm for a get_intermediate_layers method (a sketch of the search is below). 109 of 1019 models, roughly 10%, support it.
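A hypothetical sketch of that search; it instantiates every model, so it is slow and memory hungry, and is illustrative rather than something you'd run in CI:

```python
import timm

# Enumerate every timm model name and keep those whose instantiated model
# class provides a get_intermediate_layers method.
names = timm.list_models()
supported = [
    name
    for name in names
    if hasattr(timm.create_model(name, pretrained=False), "get_intermediate_layers")
]
print(f"{len(supported)}/{len(names)} models")  # 109/1019 reported above
```

The resulting model names: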

[
 'deit3_base_patch16_224',
 'deit3_base_patch16_384',
 'deit3_huge_patch14_224',
 'deit3_large_patch16_224',
 'deit3_large_patch16_384',
 'deit3_medium_patch16_224',
 'deit3_small_patch16_224',
 'deit3_small_patch16_384',
 'deit_base_distilled_patch16_224',
 'deit_base_distilled_patch16_384',
 'deit_base_patch16_224',
 'deit_base_patch16_384',
 'deit_small_distilled_patch16_224',
 'deit_small_patch16_224',
 'deit_tiny_distilled_patch16_224',
 'deit_tiny_patch16_224',
 'eva_large_patch14_196',
 'eva_large_patch14_336',
 'flexivit_base',
 'flexivit_large',
 'flexivit_small',
 'vit_base_patch8_224',
 'vit_base_patch14_dinov2',
 'vit_base_patch14_reg4_dinov2',
 'vit_base_patch16_18x2_224',
 'vit_base_patch16_224',
 'vit_base_patch16_224_miil',
 'vit_base_patch16_384',
 'vit_base_patch16_clip_224',
 'vit_base_patch16_clip_384',
 'vit_base_patch16_clip_quickgelu_224',
 'vit_base_patch16_gap_224',
 'vit_base_patch16_plus_240',
 'vit_base_patch16_reg8_gap_256',
 'vit_base_patch16_rpn_224',
 'vit_base_patch16_siglip_224',
 'vit_base_patch16_siglip_256',
 'vit_base_patch16_siglip_384',
 'vit_base_patch16_siglip_512',
 'vit_base_patch16_xp_224',
 'vit_base_patch32_224',
 'vit_base_patch32_384',
 'vit_base_patch32_clip_224',
 'vit_base_patch32_clip_256',
 'vit_base_patch32_clip_384',
 'vit_base_patch32_clip_448',
 'vit_base_patch32_clip_quickgelu_224',
 'vit_base_patch32_plus_256',
 'vit_base_r26_s32_224',
 'vit_base_r50_s16_224',
 'vit_base_r50_s16_384',
 'vit_base_resnet26d_224',
 'vit_base_resnet50d_224',
 'vit_giant_patch14_224',
 'vit_giant_patch14_clip_224',
 'vit_giant_patch14_dinov2',
 'vit_giant_patch14_reg4_dinov2',
 'vit_giant_patch16_gap_224',
 'vit_gigantic_patch14_224',
 'vit_gigantic_patch14_clip_224',
 'vit_huge_patch14_224',
 'vit_huge_patch14_clip_224',
 'vit_huge_patch14_clip_336',
 'vit_huge_patch14_clip_378',
 'vit_huge_patch14_clip_quickgelu_224',
 'vit_huge_patch14_clip_quickgelu_378',
 'vit_huge_patch14_gap_224',
 'vit_huge_patch14_xp_224',
 'vit_huge_patch16_gap_448',
 'vit_large_patch14_224',
 'vit_large_patch14_clip_224',
 'vit_large_patch14_clip_336',
 'vit_large_patch14_clip_quickgelu_224',
 'vit_large_patch14_clip_quickgelu_336',
 'vit_large_patch14_dinov2',
 'vit_large_patch14_reg4_dinov2',
 'vit_large_patch14_xp_224',
 'vit_large_patch16_224',
 'vit_large_patch16_384',
 'vit_large_patch16_siglip_256',
 'vit_large_patch16_siglip_384',
 'vit_large_patch32_224',
 'vit_large_patch32_384',
 'vit_large_r50_s32_224',
 'vit_large_r50_s32_384',
 'vit_medium_patch16_gap_240',
 'vit_medium_patch16_gap_256',
 'vit_medium_patch16_gap_384',
 'vit_medium_patch16_reg4_256',
 'vit_medium_patch16_reg4_gap_256',
 'vit_small_patch8_224',
 'vit_small_patch14_dinov2',
 'vit_small_patch14_reg4_dinov2',
 'vit_small_patch16_18x2_224',
 'vit_small_patch16_36x1_224',
 'vit_small_patch16_224',
 'vit_small_patch16_384',
 'vit_small_patch32_224',
 'vit_small_patch32_384',
 'vit_small_r26_s32_224',
 'vit_small_r26_s32_384',
 'vit_small_resnet26d_224',
 'vit_small_resnet50d_s16_224',
 'vit_so400m_patch14_siglip_224',
 'vit_so400m_patch14_siglip_384',
 'vit_tiny_patch16_224',
 'vit_tiny_patch16_384',
 'vit_tiny_r_s16_p8_224',
 'vit_tiny_r_s16_p8_384'
]

csaroff commented Nov 17, 2023

@qubvel is it possible to get this reopened?

isaaccorley (Author)

@csaroff I made a fork of the repo where I'll be adding this support. It can be installed with pip install torchseg: https://github.com/isaaccorley/torchseg
