Support Timm ViTs #776
Conversation
Hi, thanks a lot for the PR!
Right now this relies on the
        **kwargs,
    )
    if name.startswith("vit"):
        depth = 4
How is this variable used?
        pretrained=weights is not None,
        **kwargs,
    )
    if name.startswith("vit"):
Could you please go through the other archs in timm, besides ViT, and check whether they are supported?
If some other archs are supported, we should probably change the logic here.
Ping @isaaccorley, would love to see this completed.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This PR was closed because it has been stalled for 15 days with no activity.
I can't seem to reopen this PR myself, but I would like to revisit it. As a first step, I searched for all the model classes in timm that have a [
'deit3_base_patch16_224',
'deit3_base_patch16_384',
'deit3_huge_patch14_224',
'deit3_large_patch16_224',
'deit3_large_patch16_384',
'deit3_medium_patch16_224',
'deit3_small_patch16_224',
'deit3_small_patch16_384',
'deit_base_distilled_patch16_224',
'deit_base_distilled_patch16_384',
'deit_base_patch16_224',
'deit_base_patch16_384',
'deit_small_distilled_patch16_224',
'deit_small_patch16_224',
'deit_tiny_distilled_patch16_224',
'deit_tiny_patch16_224',
'eva_large_patch14_196',
'eva_large_patch14_336',
'flexivit_base',
'flexivit_large',
'flexivit_small',
'vit_base_patch8_224',
'vit_base_patch14_dinov2',
'vit_base_patch14_reg4_dinov2',
'vit_base_patch16_18x2_224',
'vit_base_patch16_224',
'vit_base_patch16_224_miil',
'vit_base_patch16_384',
'vit_base_patch16_clip_224',
'vit_base_patch16_clip_384',
'vit_base_patch16_clip_quickgelu_224',
'vit_base_patch16_gap_224',
'vit_base_patch16_plus_240',
'vit_base_patch16_reg8_gap_256',
'vit_base_patch16_rpn_224',
'vit_base_patch16_siglip_224',
'vit_base_patch16_siglip_256',
'vit_base_patch16_siglip_384',
'vit_base_patch16_siglip_512',
'vit_base_patch16_xp_224',
'vit_base_patch32_224',
'vit_base_patch32_384',
'vit_base_patch32_clip_224',
'vit_base_patch32_clip_256',
'vit_base_patch32_clip_384',
'vit_base_patch32_clip_448',
'vit_base_patch32_clip_quickgelu_224',
'vit_base_patch32_plus_256',
'vit_base_r26_s32_224',
'vit_base_r50_s16_224',
'vit_base_r50_s16_384',
'vit_base_resnet26d_224',
'vit_base_resnet50d_224',
'vit_giant_patch14_224',
'vit_giant_patch14_clip_224',
'vit_giant_patch14_dinov2',
'vit_giant_patch14_reg4_dinov2',
'vit_giant_patch16_gap_224',
'vit_gigantic_patch14_224',
'vit_gigantic_patch14_clip_224',
'vit_huge_patch14_224',
'vit_huge_patch14_clip_224',
'vit_huge_patch14_clip_336',
'vit_huge_patch14_clip_378',
'vit_huge_patch14_clip_quickgelu_224',
'vit_huge_patch14_clip_quickgelu_378',
'vit_huge_patch14_gap_224',
'vit_huge_patch14_xp_224',
'vit_huge_patch16_gap_448',
'vit_large_patch14_224',
'vit_large_patch14_clip_224',
'vit_large_patch14_clip_336',
'vit_large_patch14_clip_quickgelu_224',
'vit_large_patch14_clip_quickgelu_336',
'vit_large_patch14_dinov2',
'vit_large_patch14_reg4_dinov2',
'vit_large_patch14_xp_224',
'vit_large_patch16_224',
'vit_large_patch16_384',
'vit_large_patch16_siglip_256',
'vit_large_patch16_siglip_384',
'vit_large_patch32_224',
'vit_large_patch32_384',
'vit_large_r50_s32_224',
'vit_large_r50_s32_384',
'vit_medium_patch16_gap_240',
'vit_medium_patch16_gap_256',
'vit_medium_patch16_gap_384',
'vit_medium_patch16_reg4_256',
'vit_medium_patch16_reg4_gap_256',
'vit_small_patch8_224',
'vit_small_patch14_dinov2',
'vit_small_patch14_reg4_dinov2',
'vit_small_patch16_18x2_224',
'vit_small_patch16_36x1_224',
'vit_small_patch16_224',
'vit_small_patch16_384',
'vit_small_patch32_224',
'vit_small_patch32_384',
'vit_small_r26_s32_224',
'vit_small_r26_s32_384',
'vit_small_resnet26d_224',
'vit_small_resnet50d_s16_224',
'vit_so400m_patch14_siglip_224',
'vit_so400m_patch14_siglip_384',
'vit_tiny_patch16_224',
'vit_tiny_patch16_384',
'vit_tiny_r_s16_p8_224',
'vit_tiny_r_s16_p8_384'
]
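The family prefixes in the list above (`vit_`, `deit_`, `deit3_`, `eva_`, `flexivit_`) suggest that the `name.startswith("vit")` check in the diff could be generalized to a prefix-set check. A minimal sketch of that idea follows; the helper name and prefix tuple are mine (derived from the search results above), not code from the PR:

```python
# Hypothetical helper, not from the PR: the prefixes are taken from the
# families that appeared in the model-name search above.
VIT_FAMILY_PREFIXES = ("vit_", "deit_", "deit3_", "eva_", "flexivit_")

def is_vit_family(model_name: str) -> bool:
    """Return True if a timm model name belongs to one of the
    ViT-style families found in the search above."""
    # str.startswith accepts a tuple of prefixes.
    return model_name.startswith(VIT_FAMILY_PREFIXES)
```

This keeps the dispatch logic in one place, so adding another supported family later means extending the tuple rather than chaining more `startswith` calls.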
@qubvel is it possible to get this reopened?
@csaroff I made a fork of the repo that I'll be adding this support to. It can be installed using
This PR implements a TimmUniversalViTEncoder, loosely based on Benchmarking Detection Transfer Learning with Vision Transformers. This is essentially a naive method to upsample features from intermediate transformer encoder blocks, obtained via timm's VisionTransformer.get_intermediate_layers method, so that they can be connected to any of the SMP decoders. I'm not sold that the upsampling is the right way to support ViT encoders, so I'm definitely open to improving this PR.
@adamjstewart @calebrob6
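The upsampling idea described above can be sketched with plain PyTorch. This is a hedged illustration, not the PR's actual code: the module name and scale factors are mine, and it assumes the intermediate features have already been reshaped to (B, C, H, W), as timm's get_intermediate_layers can return with reshape=True:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveViTFeaturePyramid(nn.Module):
    """Hypothetical sketch (names and scales are illustrative): resample
    the equal-resolution feature maps taken from intermediate ViT blocks
    to a pyramid of scales, so a CNN-style decoder that expects
    progressively downsampled stages can consume them.
    """

    def __init__(self, scales=(4.0, 2.0, 1.0, 0.5)):
        super().__init__()
        self.scales = scales

    def forward(self, feats):
        # One (B, C, H, W) feature map per scale, shallow to deep.
        assert len(feats) == len(self.scales)
        return [
            f if s == 1.0
            else F.interpolate(f, scale_factor=s, mode="bilinear",
                               align_corners=False)
            for f, s in zip(feats, self.scales)
        ]
```

For a ViT with 14x14 patch tokens, this yields stages at 56x56, 28x28, 14x14, and 7x7, mimicking the stride pattern of a CNN backbone. The limitation the PR author acknowledges applies here too: bilinear resampling invents no new spatial detail, it only changes resolution.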