Releases · huggingface/pytorch-image-models
Release v0.9.1
The first non pre-release since Oct 2022, with a long list of changes from the 0.6.x releases...
May 12, 2023
- Fix Python 3.7 import error related to `Final[]` typing annotation
May 11, 2023
`timm` 0.9 released, transition from 0.8.x dev releases
May 10, 2023
- Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
- DINOv2 vit feature backbone weights added thanks to Leng Yue
- FB MAE vit feature backbone weights added
- OpenCLIP DataComp-XL L/14 feat backbone weights added
- MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
- Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by the DINO impl). This is WIP and may change significantly... feedback welcome (see the sketch after this list)
- Model creation throws an error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
- Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
- bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, i.e. `bnbadam8bit`
- Misc cleanup and fixes
- Final testing before switching to 0.9 and bringing `timm` out of pre-release state
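A minimal sketch of a few of the items above, assuming `timm` >= 0.9 (Hub downloads need `huggingface_hub`; `bnbadam8bit` additionally needs the `bitsandbytes` package and a CUDA device; `get_intermediate_layers` is experimental and its signature may change):

```python
import torch
import timm
from timm.optim import create_optimizer_v2

# Pretrained weights now come from the Hugging Face Hub by default.
# pretrained=True raises an error if no weights exist for the model/tag.
model = timm.create_model('vit_base_patch14_dinov2.lvd142m', pretrained=True, num_classes=0)

# Experimental DINO-style hidden-state access (WIP, may change).
x = torch.randn(1, 3, 518, 518)
hidden_states = model.get_intermediate_layers(x, n=4)  # outputs of the last 4 blocks

# bitsandbytes optimizers are reached through the optimizer factory via the bnb prefix.
optimizer = create_optimizer_v2(model, opt='bnbadam8bit', lr=1e-4)
```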
April 27, 2023
- 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
- Minor cleanup and refactoring of another batch of models as multi-weight support was added. More fused_attn (`F.sdpa`) and `features_only` support, and torchscript fixes.
April 21, 2023
- Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks Taeksang Kim
- More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
- Added `--head-init-scale` and `--head-init-bias` to train.py to scale the classifier head and set a fixed bias for fine-tuning (see the sketch after this list)
- Removed all InplaceABN (`inplace_abn`) use, replaced it in tresnet with standard BatchNorm (modified weights accordingly).
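Roughly what the new head flags do at fine-tune start, as a Python sketch (the values here are illustrative, not defaults):

```python
import torch
import timm

head_init_scale = 0.001   # would come from --head-init-scale
head_init_bias = -10.0    # would come from --head-init-bias

model = timm.create_model('resnet50.a1_in1k', pretrained=True, num_classes=100)
with torch.no_grad():
    head = model.get_classifier()
    head.weight.mul_(head_init_scale)  # shrink the freshly initialized classifier weights
    head.bias.fill_(head_init_bias)    # fixed bias, e.g. for BCE-style fine-tuning
```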
April 12, 2023
- Add ONNX export script, validate script, and helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + PyTorch.
- Refactor dropout args for vit and vit-like models; separate `drop_rate` into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding dropout), and `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models (see the sketch after this list).
- Fused `F.scaled_dot_product_attention` support added to more vit models; add env var (`TIMM_FUSED_ATTN`) to control it, and a config interface to enable/disable
- Add EVA-CLIP backbones w/ image tower weights, all the way up to the 4B param 'enormous' model, and the 336x336 OpenAI ViT model that was missed.
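A sketch of the new dropout args and the fused-attention controls; `set_fused_attn` is assumed here to be the config hook in `timm.layers`, and `patch_drop_rate` the patch-dropout arg name:

```python
import os
os.environ.setdefault('TIMM_FUSED_ATTN', '1')  # env var is read when timm is imported

import timm
from timm.layers import set_fused_attn  # config interface (name assumed)

set_fused_attn(False)  # e.g. disable F.scaled_dot_product_attention for export/debugging

model = timm.create_model(
    'vit_base_patch16_224.augreg_in21k_ft_in1k',
    pretrained=True,
    drop_rate=0.1,        # classifier dropout
    proj_drop_rate=0.1,   # block mlp / out projection dropout
    pos_drop_rate=0.05,   # position embedding dropout
    attn_drop_rate=0.0,   # attention dropout
    patch_drop_rate=0.3,  # FLIP-style patch dropout (arg name assumed)
)
```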
April 5, 2023
- ALL ResNet models pushed to Hugging Face Hub with multi-weight support
- New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models:
  - `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
  - `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
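With multi-weight support, a specific weight is selected by its `model.tag` name; a sketch (assuming `timm.list_pretrained` is available in this release):

```python
import timm

# Enumerate available pretrained tags for the anti-aliased SE-ResNeXt family.
print(timm.list_pretrained('seresnextaa101d_32x8d*'))

# Select the ImageNet-12k -> ImageNet-1k fine-tune explicitly by tag.
model = timm.create_model('seresnextaa101d_32x8d.sw_in12k_ft_in1k', pretrained=True)
```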
March 31, 2023
- Add first ConvNeXt-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
| model | top1 | top5 | img_size | param_count | gmacs | macts |
|---|---|---|---|---|---|---|
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 |
- Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM and CLIP w/ many model, dataset, and train recipe tweaks.
| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
| eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
| eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
| eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
| eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
| eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
| eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
| eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
| eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
| eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
- Multi-weight and HF hub for DeiT and MLP-Mixer based models
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs (see the sketch after this list).
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights expanded with the HF hub push: SWAG, SEER, and torchvision v2 weights added. SEER is pretty poor w.r.t. performance for model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compared to SWAG PT + 1k FT this is the same accuracy BUT at much lower res; blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
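A sketch of the Swin feature extraction noted above; the channel layout depends on the family (NHWC here for `swin_*`, NCHW for `swinv2_cr_*`):

```python
import torch
import timm

model = timm.create_model(
    'swin_tiny_patch4_window7_224', pretrained=True,
    features_only=True, out_indices=(0, 1, 2, 3),
)
features = model(torch.randn(1, 3, 224, 224))
for f in features:
    print(f.shape)  # NHWC feature maps for swin_* at this release, e.g. (1, 56, 56, 96)
```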
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensors` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
- Add `F.scaled_dot_product_attention` support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
- Gradient checkpointing works with `features_only=True` (see the sketch below)
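A sketch of the combination (assuming the feature-extraction wrapper forwards the standard `set_grad_checkpointing` hook):

```python
import timm

model = timm.create_model(
    'convnext_base.clip_laion2b_augreg_ft_in1k',
    pretrained=True,
    features_only=True,
)
model.set_grad_checkpointing(True)  # checkpointing now also works in features_only mode
```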
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and an initial set of in1k fine-tunes:
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
Release v0.9.0
First non pre-release in a loooong while; the 0.6.x changelog is identical to the v0.9.1 notes above.
Release v0.6.13
Release from the 0.6.x stable branch with a fix for Python 3.11. NOTE: the original 0.6.13 release tag was against the wrong branch.
Release v0.8.17dev0
March 22 and Feb 26, 2023 changelog; identical to the corresponding entries in the v0.9.x release notes above (through 0.8.15dev0).
v0.8.13dev0 Release
Feb 20 and Feb 16, 2023 changelog; identical to the corresponding entries in the v0.9.x release notes above.
v0.8.10dev0 Release
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and an initial set of in1k fine-tunes:
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by Fredo.
- Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
- Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
- Move ImageNet meta-data (synsets, indices) from `/results` to `timm/data/_info`.
- Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm` (see the sketch after this list)
- Update `inference.py` to use them; try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
- Ready for 0.8.10 pypi pre-release (final testing).
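A hedged sketch of the new label-info helpers (the constructor arg and method names here are assumed from the release note, not confirmed):

```python
from timm.data import ImageNetInfo

info = ImageNetInfo('imagenet-1k')    # classifier layout / subset name (assumed)
print(info.num_classes())             # e.g. 1000 for the 1k layout
print(info.index_to_description(1))   # human-readable label for class index 1 (method name assumed)
```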
Jan 20, 2023
- Add two convnext 12k -> 1k fine-tunes at 384x384:
  - `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  - `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
- Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for rw base MaxViT and CoAtNet 1/2 models
v0.8.6dev0 Release
Jan 11, 2023
- Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags):
  - `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  - `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  - `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
Jan 6, 2023
- Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from the cmd line (a Python-side sketch follows this list):
  - `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  - `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
- Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
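The same pass-through done directly in Python; `create_model` forwards extra kwargs to the model class (a sketch):

```python
import timm

# Equivalent of:
#   --model vit_base_patch16_clip_224 --model-kwargs img_size=240 patch_size=12
model = timm.create_model(
    'vit_base_patch16_clip_224',
    pretrained=False,   # changing patch_size is cleanest from scratch
    img_size=240,
    patch_size=12,
)
```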
Jan 5, 2023
- ConvNeXt-V2 models and weights added to existing `convnext.py`
v0.8.2dev0 Release
Part way through the conversion of models to multi-weight support (`model_arch.pretrained_tag`), a module reorg for future building, and lots of new weights and model additions as we go...
This is considered a development release. Please stick to 0.6.x if you need stability. Some of the model names and tags will shift a bit; some old names have already been deprecated and remapping support is not added yet. For code, the 0.6.x branch is considered 'stable': https://github.com/rwightman/pytorch-image-models/tree/0.6.x
Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out the paper at https://arxiv.org/abs/2212.08013)
  - NOTE: currently resizing is static on model creation; on-the-fly dynamic / train patch size sampling is a WIP
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tags and adjustments, some model names changed (working on deprecation translations, consider main branch the DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
  - `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  - `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  - `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  - `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
| model | top1 | param_count | gmac | macts |
|---|---|---|---|---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 |
Dec 6, 2022
- Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain, to `beit.py`.
| model | top1 | param_count | gmac | macts |
|---|---|---|---|---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 |
Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
- vision_transformer, maxvit, convnext are the first three model impl w/ support
- model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
- bugs are likely, but I need feedback so please try it out
- if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
- Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use the `--torchcompile` argument (see the sketch after this list)
- Inference script allows more control over output: select k for top-k class index + prob, with json, csv, or parquet output
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models:

| model | top1 | param_count | gmac | macts |
|---|---|---|---|---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 |
- Port of MaxViT Tensorflow weights from the official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights; possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing differences
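Outside the bundled scripts, the `--torchcompile` flag corresponds to plain PyTorch 2.0 usage; a sketch (assumes a CUDA device is available):

```python
import torch
import timm

model = timm.create_model('vit_base_patch16_clip_224.laion2b_ft_in1k', pretrained=True)
model = torch.compile(model.cuda().eval())  # PyTorch 2.0+; what --torchcompile enables in the scripts
with torch.inference_mode():
    logits = model(torch.randn(8, 3, 224, 224, device='cuda'))
```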
v0.6.12 Release
Minor bug fixes to HF push_to_hub, plus some more MaxVit weights
Oct 10, 2022
- More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
  - `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
  - `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
  - `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
  - `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
  - `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
v0.6.11 Release
Changes Since 0.6.7
Sept 23, 2022
- CLIP LAION-2B pretrained B/32, L/14, H/14, and g/14 image tower weights as vit models (for fine-tune)
Sept 7, 2022
- Hugging Face `timm` docs home now exists, look for more here in the future
- Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
- Add more weights in `maxxvit` series, incl a `pico` (7.5M params, 1.9 GMACs) and two `tiny` variants:
  - `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
  - `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
  - `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)
Aug 29, 2022
- MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
  - `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)
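A sketch of the scaling behaviour: window sizes derive from `img_size` at model creation, so the 256 weights can be instantiated at a larger resolution for eval (kwarg pass-through assumed):

```python
import timm

# Window size follows img_size by default for these MaxViT variants.
model = timm.create_model('maxvit_rmlp_nano_rw_256', pretrained=True, img_size=320)
```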
Aug 26, 2022
Aug 15, 2022
- ConvNeXt atto weights added:
  - `convnext_atto` - 75.7 @ 224, 77.0 @ 288
  - `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288
Aug 5, 2022
- More custom ConvNeXt smaller model defs with weights:
  - `convnext_femto` - 77.5 @ 224, 78.7 @ 288
  - `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
  - `convnext_pico` - 79.5 @ 224, 80.4 @ 288
  - `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
  - `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
- Updated EdgeNeXt to improve ONNX export, added a new base variant and weights from the original impl (https://github.com/mmaaz60/EdgeNeXt)
July 28, 2022
- Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks Hugo Touvron!