# Pre-trained Models & Evaluation & Fine-tuning

Here we provide the pre-trained models and the evaluation/fine-tuning instructions.

## ImageNet-1K trained models

These models are also available at Tsinghua Cloud.

| Model | #Param | #FLOPs | Acc@1 | Training Speedup | #Equivalent Epochs | Link |
|---|---|---|---|---|---|---|
| ResNet-50 | 26M | 4.1G | 79.7% | ~1.5x | 200 | Google Drive |
| ConvNeXt-Tiny | 29M | 4.5G | 82.2% | ~1.5x | 200 | Google Drive |
| ConvNeXt-Small | 50M | 8.7G | 83.2% | ~1.5x | 200 | Google Drive |
| ConvNeXt-Base | 89M | 15.4G | 83.8% | ~1.5x | 200 | Google Drive |
| DeiT-Tiny | 5M | 1.3G | 72.5% | ~3.0x | 100 | Google Drive |
| | | | 73.4% | ~2.0x | 150 | Google Drive |
| | | | 73.8% | ~1.5x | 200 | Google Drive |
| | | | 74.4% | ~1.0x | 300 | Google Drive |
| DeiT-Small | 22M | 4.6G | 79.9% | ~3.0x | 100 | Google Drive |
| | | | 80.6% | ~2.0x | 150 | Google Drive |
| | | | 81.0% | ~1.5x | 200 | Google Drive |
| | | | 81.4% | ~1.0x | 300 | Google Drive |
| Swin-Tiny | 28M | 4.5G | 80.9% | ~3.0x | 100 | Google Drive |
| | | | 81.4% | ~2.0x | 150 | Google Drive |
| | | | 81.6% | ~1.5x | 200 | Google Drive |
| Swin-Small | 50M | 8.7G | 82.8% | ~3.0x | 100 | Google Drive |
| | | | 83.1% | ~2.0x | 150 | Google Drive |
| | | | 83.2% | ~1.5x | 200 | Google Drive |
| Swin-Base | 88M | 15.4G | 83.3% | ~3.0x | 100 | Google Drive |
| | | | 83.5% | ~2.0x | 150 | Google Drive |
| | | | 83.6% | ~1.5x | 200 | Google Drive |
| CSWin-Tiny | 23M | 4.3G | 82.9% | ~1.5x | 200 | Google Drive |
| CSWin-Small | 35M | 6.9G | 83.6% | ~1.5x | 200 | Google Drive |
| CSWin-Base | 78M | 15.0G | 84.3% | ~1.5x | 200 | Google Drive |
| CAFormer-S18 | 26M | 4.1G | 83.4% | ~1.5x | 200 | Google Drive |
| CAFormer-S36 | 39M | 8.0G | 84.3% | ~1.5x | 200 | Google Drive |
| CAFormer-M36 | 56M | 13.2G | 85.0% | ~1.5x | 200 | Google Drive |

## ImageNet-22K -> ImageNet-1K fine-tuned models

These models are also available at Tsinghua Cloud.

| Model | #Param | #FLOPs | Acc@1 | Pre-training Speedup | Link |
|---|---|---|---|---|---|
| CSWin-Base-224 | 78M | 15.0G | 86.1% | ~3.0x | Google Drive |
| | | | 86.3% | ~2.0x | Google Drive |
| CSWin-Base-384 | 78M | 47.0G | 87.1% | ~3.0x | Google Drive |
| | | | 87.4% | ~2.0x | Google Drive |
| CSWin-Large-224 | 173M | 31.5G | 86.9% | ~3.0x | Google Drive |
| | | | 87.1% | ~2.0x | Google Drive |
| CSWin-Large-384 | 173M | 96.8G | 87.9% | ~3.0x | Google Drive |
| | | | 88.1% | ~2.0x | Google Drive |

## Evaluation

We give an example command for evaluating Swin-Tiny:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    python -m torch.distributed.launch --use_env --nproc_per_node=8 --master_port=12345 main_buffer.py \
    --model swin_tiny --drop_path 0.0 \
    --eval true --batch_size 128 --input_size 224 \
    --data_path /path/to/imagenet-1k \
    --resume /path/to/checkpoint/ET_pp_200ep_swinT.pth
```

This should yield:

```
* Acc@1 81.626 Acc@5 95.694 loss 0.785
```

- For other models, change `--model`, `--resume`, and `--input_size` accordingly. The pre-trained models can be obtained from the tables above.
- A model-specific `--drop_path` is not required for evaluation, since timm's `DropPath` module reduces to an identity function in eval mode. It is required for training.
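Why `--drop_path` is irrelevant at eval time can be seen from how stochastic depth typically works; the following is a minimal NumPy sketch of the behavior (an illustration, not timm's actual implementation):

```python
import numpy as np

def drop_path(x, drop_prob, training, rng=None):
    """Stochastic depth (sketch): during training, drop the residual branch
    for whole samples with probability drop_prob and rescale the survivors;
    during evaluation it is exactly the identity."""
    if drop_prob == 0.0 or not training:
        return x  # identity: drop_prob has no effect at eval time
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over all remaining dims.
    mask = rng.binomial(1, keep_prob, size=(x.shape[0],) + (1,) * (x.ndim - 1))
    return x * mask / keep_prob

x = np.random.randn(4, 8)
assert np.array_equal(drop_path(x, 0.2, training=False), x)
```

Because the eval branch returns the input unchanged regardless of `drop_prob`, any `--drop_path` value gives identical evaluation results.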

## ImageNet-22K pre-trained models

These models are also available at Tsinghua Cloud.

| Model | #Param | #FLOPs | Pre-training Speedup | Link |
|---|---|---|---|---|
| CSWin-Base-224 | 78M | 15.0G | ~3.0x | Google Drive |
| | | 15.0G | ~2.0x | Google Drive |
| CSWin-Large-224 | 173M | 31.5G | ~3.0x | Google Drive |
| | | 31.5G | ~2.0x | Google Drive |

## Fine-tuning ImageNet-22K pre-trained models

We give an example command for fine-tuning an ImageNet-22K pre-trained CSWin-Base-224 model on ImageNet-1K:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    python -m torch.distributed.launch --use_env --nproc_per_node=8 --master_port=12345 main_buffer.py \
    --model CSWin_96_24322_base_224 --drop_path 0.2 --weight_decay 1e-8 \
    --batch_size 64 --lr 5e-5 --update_freq 1 \
    --warmup_epochs 0 --epochs 30 --end_epoch 30 \
    --cutmix 0 --mixup 0 --layer_decay 0.9 --input_size 224 \
    --use_amp true \
    --model_ema true --model_ema_eval true --model_ema_decay 0.9998 \
    --data_path /path/to/imagenet-1k \
    --output_dir /path/to/save/results \
    --finetune /path/to/checkpoint/ET_pp_in22k_pre_trained_speedup2x_cswinB.pth
```
- For other models, change `--model`, `--finetune`, and `--input_size` accordingly. The pre-trained models can be obtained from the table above.
- For better performance, `--drop_path`, `--layer_decay`, and `--model_ema_decay` can be adjusted. In our paper, we tune these hyper-parameters on the baseline models and directly reuse the resulting configurations when fine-tuning our ImageNet-22K pre-trained models.
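For intuition on `--layer_decay`, layer-wise learning-rate decay is commonly implemented by scaling the base lr per layer: the scale for layer `i` out of `L` is `layer_decay ** (L - i)`, so the head trains at the full lr while earlier layers move more conservatively. A minimal sketch (function name and layer count are illustrative; the actual grouping of parameters into layers in `main_buffer.py` may differ):

```python
def layer_lr_scales(num_layers, layer_decay):
    """Per-layer lr multipliers for layer-wise lr decay (sketch).
    Index 0 (e.g. the patch embedding) gets the smallest scale;
    index num_layers (the classification head) gets scale 1.0."""
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

# With --layer_decay 0.9 and a hypothetical 4-layer grouping:
scales = layer_lr_scales(4, 0.9)
# scales increase monotonically toward the head, which trains at the full base lr
```

Multiplying the base `--lr` (5e-5 above) by each scale gives the effective per-layer learning rate fed to the optimizer's parameter groups.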