This directory contains the configs and results of Swin Transformer. Most configs and results are based on the official repository.
Please consider using mmdet's configs when training new models.
Backbone | Pretrain | Lr schd | box AP | config | model |
---|---|---|---|---|---|
Swin-T | ImageNet-1K | 1x | 43.7 | config | github |
Backbone | Pretrain | Lr schd | box AP | mask AP | #params | FLOPs | config | log | model |
---|---|---|---|---|---|---|---|---|---|
Swin-T | ImageNet-1K | 1x | 43.7 | 39.8 | 48M | 267G | config | github/baidu | github/baidu |
Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu |
Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu |
Backbone | Pretrain | Lr schd | box AP | mask AP | #params | FLOPs | config | log | model |
---|---|---|---|---|---|---|---|---|---|
Swin-T | ImageNet-1K | 1x | 48.1 | 41.7 | 86M | 745G | config | github/baidu | github/baidu |
Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu |
Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu |
Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu |
Notes:
- Pre-trained models can be downloaded from Swin Transformer for ImageNet Classification.
- Access code for `baidu` is `swin`.
# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm
# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
To train a detector with pre-trained models, run:
# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]
# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]
For example, to train a Cascade Mask R-CNN model with a Swin-T backbone on 8 GPUs, run:
tools/dist_train.sh configs/swin_original/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL>
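The `--cfg-options` flag in the commands above takes dotted `key=value` pairs and merges them into the nested config. The sketch below only illustrates that idea in plain Python; it is not MMCV's implementation. The helper name `apply_dotted_overrides`, the checkpoint filename `swin_tiny.pth`, and the contents of `base_cfg` are all hypothetical placeholders.

```python
import copy


def apply_dotted_overrides(cfg, overrides):
    """Return a copy of cfg with dotted-key overrides merged in.

    Hypothetical helper for illustration; not part of mmdet or MMCV.
    """
    merged = copy.deepcopy(cfg)  # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Walk down the nesting, creating missing levels as empty dicts.
            node = node.setdefault(name, {})
        node[leaf] = value
    return merged


# Placeholder config standing in for a loaded <CONFIG_FILE>.
base_cfg = {"model": {"pretrained": None, "backbone": {"type": "SwinTransformer"}}}

# Equivalent of:
#   --cfg-options model.pretrained=swin_tiny.pth model.backbone.use_checkpoint=True
overrides = {
    "model.pretrained": "swin_tiny.pth",  # placeholder for <PRETRAIN_MODEL>
    "model.backbone.use_checkpoint": True,
}

cfg = apply_dotted_overrides(base_cfg, overrides)
print(cfg["model"]["pretrained"])
print(cfg["model"]["backbone"])
```

Each dotted key addresses one leaf of the config, so an override changes only that value and leaves sibling settings such as the backbone `type` in place.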
Note: `use_checkpoint` enables gradient checkpointing, which saves GPU memory at the cost of extra computation. Please refer to this page for more details.
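The option can also be set in a config file instead of on the command line. The fragment below is a minimal sketch, assuming the key path `model.backbone.use_checkpoint` used in the training commands above. It shows only the override and is meant to be merged into an existing model definition, not used on its own.

```python
# Config fragment (sketch): enable checkpointing in the Swin backbone.
# Only the override is shown; every other model and backbone setting is
# assumed to come from the config file being extended.
model = dict(
    backbone=dict(
        use_checkpoint=True,  # trade extra compute for lower GPU memory
    ),
)
```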
The current configs use mixed precision training via MMCV by default. Please install PyTorch >= 1.6.0 to use torch.cuda.amp.
If you find a performance difference from apex (used by the original authors), please raise an issue. Otherwise, we will clean up the apex-related code.
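Before training, it can help to confirm that the installed PyTorch meets the 1.6.0 minimum for `torch.cuda.amp`. The sketch below compares a version string against that minimum in plain Python. The helper name `supports_native_amp` is hypothetical, and the parsing is deliberately simple: it reads only the leading numeric part and ignores local suffixes such as `+cu113`.

```python
import re


def supports_native_amp(version, minimum=(1, 6, 0)):
    """Return True if a PyTorch version string is at least `minimum`.

    Hypothetical helper for illustration; not part of mmdet or MMCV.
    """
    # Keep only the leading numeric release, e.g. "1.7.1+cu110" -> (1, 7, 1).
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part or 0) for part in match.groups())
    return release >= minimum


print(supports_native_amp("1.5.1"))         # False
print(supports_native_amp("1.6.0"))         # True
print(supports_native_amp("1.10.2+cu113"))  # True
```

In practice the string would come from `torch.__version__`. Comparing integer tuples avoids the pitfall of plain string comparison, where `"1.10"` sorts before `"1.6"`.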
To install apex, run:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Modify configs with the following code:
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
fp16 = None
optimizer_config = dict(
    type='ApexOptimizerHook',
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}
Image Classification: See Swin Transformer for Image Classification.
Semantic Segmentation: See Swin Transformer for Semantic Segmentation.