Here we provide the pre-trained models and the evaluation/fine-tuning instructions.
These models are also available at Tsinghua Cloud.
| Model | #Param | #FLOPs | Acc@1 | Training Speedup | #Equivalent Epochs | link |
|---|---|---|---|---|---|---|
| ResNet-50 | 26M | 4.1G | 79.7% | ~1.5x | 200 | Google Drive |
| ConvNeXt-Tiny | 29M | 4.5G | 82.2% | ~1.5x | 200 | Google Drive |
| ConvNeXt-Small | 50M | 8.7G | 83.2% | ~1.5x | 200 | Google Drive |
| ConvNeXt-Base | 89M | 15.4G | 83.8% | ~1.5x | 200 | Google Drive |
| DeiT-Tiny | 5M | 1.3G | 72.5% | ~3.0x | 100 | Google Drive |
| | | | 73.4% | ~2.0x | 150 | Google Drive |
| | | | 73.8% | ~1.5x | 200 | Google Drive |
| | | | 74.4% | ~1.0x | 300 | Google Drive |
| DeiT-Small | 22M | 4.6G | 79.9% | ~3.0x | 100 | Google Drive |
| | | | 80.6% | ~2.0x | 150 | Google Drive |
| | | | 81.0% | ~1.5x | 200 | Google Drive |
| | | | 81.4% | ~1.0x | 300 | Google Drive |
| Swin-Tiny | 28M | 4.5G | 80.9% | ~3.0x | 100 | Google Drive |
| | | | 81.4% | ~2.0x | 150 | Google Drive |
| | | | 81.6% | ~1.5x | 200 | Google Drive |
| Swin-Small | 50M | 8.7G | 82.8% | ~3.0x | 100 | Google Drive |
| | | | 83.1% | ~2.0x | 150 | Google Drive |
| | | | 83.2% | ~1.5x | 200 | Google Drive |
| Swin-Base | 88M | 15.4G | 83.3% | ~3.0x | 100 | Google Drive |
| | | | 83.5% | ~2.0x | 150 | Google Drive |
| | | | 83.6% | ~1.5x | 200 | Google Drive |
| CSWin-Tiny | 23M | 4.3G | 82.9% | ~1.5x | 200 | Google Drive |
| CSWin-Small | 35M | 6.9G | 83.6% | ~1.5x | 200 | Google Drive |
| CSWin-Base | 78M | 15.0G | 84.3% | ~1.5x | 200 | Google Drive |
| CAFormer-S18 | 26M | 4.1G | 83.4% | ~1.5x | 200 | Google Drive |
| CAFormer-S36 | 39M | 8.0G | 84.3% | ~1.5x | 200 | Google Drive |
| CAFormer-M36 | 56M | 13.2G | 85.0% | ~1.5x | 200 | Google Drive |
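For reference, the "#Equivalent Epochs" column is the training cost expressed in standard epochs: within the table itself, a ~1.0x speedup corresponds to the usual 300-epoch budget (see the DeiT rows), so each entry is simply 300 divided by the speedup. A quick sanity check of this arithmetic:

```python
def equivalent_epochs(speedup, baseline_epochs=300):
    """Training cost in standard epochs: the baseline budget divided by the speedup.

    The 300-epoch baseline is read off the table above, where a ~1.0x
    speedup corresponds to 300 equivalent epochs.
    """
    return round(baseline_epochs / speedup)

for s in (3.0, 2.0, 1.5, 1.0):
    print(f"~{s}x speedup -> {equivalent_epochs(s)} equivalent epochs")
```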
These models are also available at Tsinghua Cloud.
| Model | #Param | #FLOPs | Acc@1 | Pre-training Speedup | link |
|---|---|---|---|---|---|
| CSWin-Base-224 | 78M | 15.0G | 86.1% | ~3.0x | Google Drive |
| | | | 86.3% | ~2.0x | Google Drive |
| CSWin-Base-384 | 78M | 47.0G | 87.1% | ~3.0x | Google Drive |
| | | | 87.4% | ~2.0x | Google Drive |
| CSWin-Large-224 | 173M | 31.5G | 86.9% | ~3.0x | Google Drive |
| | | | 87.1% | ~2.0x | Google Drive |
| CSWin-Large-384 | 173M | 96.8G | 87.9% | ~3.0x | Google Drive |
| | | | 88.1% | ~2.0x | Google Drive |
We give an example command for evaluating `Swin-Tiny`:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --use_env --nproc_per_node=8 --master_port=12345 main_buffer.py \
--model swin_tiny --drop_path 0.0 \
--eval true --batch_size 128 --input_size 224 \
--data_path /path/to/imagenet-1k \
--resume /path/to/checkpoint/ET_pp_200ep_swinT.pth
```
This should yield:

```
* Acc@1 81.626 Acc@5 95.694 loss 0.785
```
- For other models, please change `--model`, `--resume`, and `--input_size` accordingly. You can get the pre-trained models from the tables above.
- Setting a model-specific `--drop_path` is not required for evaluation, as the `DropPath` module in `timm` behaves the same at evaluation time regardless of the configured rate, but it is required for training.
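To make the per-model substitutions explicit, here is a small illustrative Python helper (not part of this repo) that assembles the evaluation command above as an argv list, varying only `--model`, `--resume`, and `--input_size`. The model name and checkpoint path in the example are placeholders; please check the repo's model registry and the table links for the actual values.

```python
def eval_command(model, checkpoint, input_size=224, n_gpus=8):
    # Assembles the README's evaluation command; only --model, --resume,
    # and --input_size need to change between models.
    return [
        "python", "-m", "torch.distributed.launch", "--use_env",
        f"--nproc_per_node={n_gpus}", "--master_port=12345", "main_buffer.py",
        "--model", model, "--drop_path", "0.0",
        "--eval", "true", "--batch_size", "128",
        "--input_size", str(input_size),
        "--data_path", "/path/to/imagenet-1k",
        "--resume", checkpoint,
    ]

# Hypothetical example: evaluating a DeiT-Small checkpoint instead of Swin-Tiny.
cmd = eval_command("deit_small", "/path/to/checkpoint/deit_small.pth")
print(" ".join(cmd))
```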
These models are also available at Tsinghua Cloud.
| Model | #Param | #FLOPs | Pre-training Speedup | link |
|---|---|---|---|---|
| CSWin-Base-224 | 78M | 15.0G | ~3.0x | Google Drive |
| | | 15.0G | ~2.0x | Google Drive |
| CSWin-Large-224 | 173M | 31.5G | ~3.0x | Google Drive |
| | | 31.5G | ~2.0x | Google Drive |
We give an example command for fine-tuning an ImageNet-22K pre-trained `CSWin-Base-224` model on ImageNet-1K:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --use_env --nproc_per_node=8 --master_port=12345 main_buffer.py \
--model CSWin_96_24322_base_224 --drop_path 0.2 --weight_decay 1e-8 \
--batch_size 64 --lr 5e-5 --update_freq 1 \
--warmup_epochs 0 --epochs 30 --end_epoch 30 \
--cutmix 0 --mixup 0 --layer_decay 0.9 --input_size 224 \
--use_amp true \
--model_ema true --model_ema_eval true --model_ema_decay 0.9998 \
--data_path /path/to/imagenet-1k \
--output_dir /path/to/save/results \
--finetune /path/to/checkpoint/ET_pp_in22k_pre_trained_speedup2x_cswinB.pth
```
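Note that the global batch size is the product of the per-GPU `--batch_size`, the number of GPUs, and `--update_freq` (assuming the common convention that `--batch_size` is per GPU, as in ConvNeXt-style training scripts; this is an assumption about `main_buffer.py`):

```python
def effective_batch_size(per_gpu_batch, n_gpus, update_freq=1):
    # Global batch size under the per-GPU batch-size convention (an assumption
    # about main_buffer.py, matching common ConvNeXt-style training scripts).
    return per_gpu_batch * n_gpus * update_freq

# The fine-tuning command above: 64 per GPU x 8 GPUs x update_freq 1.
print(effective_batch_size(64, 8, 1))  # -> 512
```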
- For other models, please change `--model`, `--finetune`, and `--input_size` accordingly. You can get the pre-trained models from the table above.
- For better performance, `--drop_path`, `--layer_decay`, and `--model_ema_decay` can be adjusted. In our paper, we tuned these hyper-parameters on the baseline models and directly reused the resulting configurations when fine-tuning our ImageNet-22K pre-trained models.
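For intuition on two of these hyper-parameters: `--layer_decay 0.9` scales each layer's learning rate by a power of the decay factor, so shallower (earlier) layers train more slowly, and `--model_ema_decay 0.9998` controls an exponential moving average of the model weights. A minimal sketch of both, assuming the common BEiT/ConvNeXt-style conventions (the exact layer indexing in this repo may differ):

```python
def layer_lr_scales(num_layers, layer_decay=0.9):
    # Layer-wise lr decay: layer i (0 = embedding, num_layers = head)
    # gets lr * layer_decay ** (num_layers - i), so deeper layers train faster.
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

def ema_update(ema_param, param, decay=0.9998):
    # One EMA step per training iteration, as used by --model_ema.
    return decay * ema_param + (1.0 - decay) * param

# Toy example with 4 layers: the head gets the full lr (scale 1.0).
print([round(s, 4) for s in layer_lr_scales(4, 0.9)])  # -> [0.6561, 0.729, 0.81, 0.9, 1.0]
```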