mindspore-lab · TamirBaydasov · Mar 9, 2023 · Mar 10, 2023
diff --git a/configs/twins/README.md b/configs/twins/README.md
@@ -0,0 +1,92 @@
+
+# Twins
+> [Twins: Revisiting the Design of Spatial Attention in Vision Transformers](https://openreview.net/pdf?id=5kTlVBkzSRx)
+
+## Introduction
+
+Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of the spatial attention and demonstrate that a carefully-devised yet simple spatial attention mechanism performs favourably against the state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely, Twins-PCPVT and Twins-SVT. Our proposed architectures are highly-efficient and easy to implement, only involving matrix multiplications that are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks.
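To make the efficiency argument concrete, the sketch below counts query-key dot products on an H x W token map for three cases: full global self-attention, and the two attention types that Twins-SVT alternates between. These are locally-grouped self-attention (LSA) inside k1 x k2 windows and global sub-sampled attention (GSA). It assumes GSA summarizes each window into a single key, following the paper's description. The function name and the 56 x 56 map with 7 x 7 windows are illustrative and are not taken from the MindCV code. Multiply each count by the head dimension d to get the cost of the score computation.

```python
from typing import Dict


def attention_pairs(h: int, w: int, k1: int, k2: int) -> Dict[str, int]:
    """Count query-key dot products (per head, per layer) on an h x w token map."""
    if h % k1 or w % k2:
        raise ValueError("feature map must divide evenly into k1 x k2 windows")
    n = h * w                            # total number of tokens
    num_windows = (h // k1) * (w // k2)  # number of non-overlapping windows
    return {
        # every token attends to every token
        "full": n * n,
        # LSA: each token attends only to the k1 * k2 tokens of its own window
        "lsa": n * (k1 * k2),
        # GSA (assumed one summarized key per window): each token attends to num_windows keys
        "gsa": n * num_windows,
    }


print(attention_pairs(56, 56, 7, 7))
```

Both LSA and GSA grow linearly in the number of tokens for a fixed window size or window count. Full attention grows quadratically.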
+
+<img width="1285" alt="twins_svt_s" src="https://user-images.githubusercontent.com/41994229/224014703-ed5ee3ed-3e82-46fb-bd34-289519095a7e.png">
+
+Twins-SVT-S architecture (the right side shows the inside of two consecutive Transformer encoders).
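In the LSA encoder of each pair, attention is computed only among tokens that share a non-overlapping window. The sketch below shows that grouping at the index level. It uses pure Python over row-major token indices. The helper name is hypothetical, not part of MindCV, and a real implementation would reshape tensors rather than build index lists.

```python
from typing import List


def window_partition(h: int, w: int, k1: int, k2: int) -> List[List[int]]:
    """Group row-major token indices of an h x w map into k1 x k2 windows."""
    if h % k1 or w % k2:
        raise ValueError("h and w must be divisible by the window size")
    windows = []
    for top in range(0, h, k1):          # first row of each window
        for left in range(0, w, k2):     # first column of each window
            windows.append([r * w + c
                            for r in range(top, top + k1)
                            for c in range(left, left + k2)])
    return windows


# A 4 x 4 map split into four 2 x 2 windows
print(window_partition(4, 4, 2, 2))
```

Self-attention is then run independently inside each index group, which is why LSA alone cannot exchange information across windows. The following GSA layer restores that global communication.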
+
+## Results
+
+**Implementation and configs for training were taken and adjusted from [this repository](https://gitee.com/cvisionlab/models/tree/twins/release/research/cv/Twins), which implements Twins models in MindSpore.**
+
+Model performance on ImageNet-1K, using weights converted from the official PyTorch release, is reported below.
+
+<div align="center">
+
+| Model       | Context                | Top-1 (%) | Top-5 (%) | Params (M) | Recipe                                                                                      | Download                                                                                       |
+|-------------|------------------------|-----------|-----------|------------|---------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
+| svt_small   | Converted from PyTorch | 81        | 95.38     | -          | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/twins/svt_s_gpu.yaml)     | [weights](https://storage.googleapis.com/huawei-mindspore-hk/Twins/converted/svt_s_new.ckpt)   |
+| svt_base    | Converted from PyTorch | 82.63     | 96.17     | -          | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/twins/svt_s_gpu.yaml)     | [weights](https://storage.googleapis.com/huawei-mindspore-hk/Twins/converted/svt_b_new.ckpt)   |
+| svt_large   | Converted from PyTorch | 83.04     | 96.35     | -          | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/twins/svt_s_gpu.yaml)     | [weights](https://storage.googleapis.com/huawei-mindspore-hk/Twins/converted/svt_l_new.ckpt)   |
+| pcpvt_small | Converted from PyTorch | 80.58     | 95.40     | -          | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/twins/pcpvt_l_gpu.yaml)   | [weights](https://storage.googleapis.com/huawei-mindspore-hk/Twins/converted/pcpvt_s_new.ckpt) |
+| pcpvt_base  | Converted from PyTorch | 82.19     | 96.08     | -          | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/twins/pcpvt_l_gpu.yaml)   | [weights](https://storage.googleapis.com/huawei-mindspore-hk/Twins/converted/pcpvt_b_new.ckpt) |
+| pcpvt_large | Converted from PyTorch | 82.51     | 96.37     | -          | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/twins/pcpvt_l_gpu.yaml)   | [weights](https://storage.googleapis.com/huawei-mindspore-hk/Twins/converted/pcpvt_l_new.ckpt) |
+
+</div>
+
+#### Notes
+
+- Context: The weights in the table were taken from the [official repository](https://github.com/Meituan-AutoML/Twins) and converted to MindSpore format.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
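Top-k accuracy counts a prediction as correct when the ground-truth class is among the k highest-scoring classes. The small reference sketch below illustrates the metric. It is not the metric implementation used by `validate.py`, and the scores and labels are toy values.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores for this sample
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return 100.0 * hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 2, 1]
print(topk_accuracy(scores, labels, 1), topk_accuracy(scores, labels, 2))
```

Top-5 is always at least as high as Top-1, because a top-1 hit is also a top-5 hit.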
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instructions](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
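Before launching a run, the dataset folder can be sanity-checked. The sketch below assumes a folder-per-class layout, `<data_dir>/train/<class>/...` and `<data_dir>/val/<class>/...`, with the split folder names matching the `train_split`/`val_split` keys in the configs. That layout is an assumption, and the helper is hypothetical rather than part of MindCV.

```python
from pathlib import Path
from typing import Dict, Iterable


def check_imagenet_layout(data_dir: str, splits: Iterable[str] = ("train", "val")) -> Dict[str, int]:
    """Return the number of class sub-folders found in each split directory."""
    root = Path(data_dir)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # each class is expected to be its own sub-directory
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts
```

On a complete ImageNet-1K copy arranged this way, `check_imagenet_layout("/path/to/imagenet")` should report 1000 class folders for each split.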
+
+### Training
+
+* Distributed Training
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/twins/svt_s_gpu.yaml --data_dir /path/to/imagenet --distribute True
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
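When the launch is scripted, the root-user rule above can be applied automatically. In the sketch below, the helper name is hypothetical. It uses `--distribute`, which is the key name used in the YAML configs; confirm the flag against MindCV's `config.py`.

```python
import os
import shlex
from typing import List


def build_mpirun_cmd(num_devices: int, config: str, data_dir: str, as_root: bool) -> List[str]:
    """Assemble the distributed-training command shown above as an argv list."""
    cmd = ["mpirun", "-n", str(num_devices)]
    if as_root:
        # mpirun refuses to start as root unless explicitly allowed
        cmd.append("--allow-run-as-root")
    cmd += ["python", "train.py", "--config", config,
            "--data_dir", data_dir, "--distribute", "True"]
    return cmd


cmd = build_mpirun_cmd(8, "configs/twins/svt_s_gpu.yaml", "/path/to/imagenet",
                       as_root=(os.geteuid() == 0))  # os.geteuid is POSIX-only
print(shlex.join(cmd))
```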
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For a detailed description of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep it unchanged for reproduction, or to scale the learning rate linearly with the new global batch size.
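The linear scaling rule from the note keeps the ratio `lr / global_batch_size` constant. The worked example below assumes the configured `lr: 0.0001` was tuned for `batch_size: 32` on the 8 devices of the `mpirun` command above. That pairing is an assumption, not something stated in the configs.

```python
def scale_lr(base_lr: float, base_global_batch: int, new_global_batch: int) -> float:
    """Linear scaling rule: keep the ratio lr / global_batch_size constant."""
    return base_lr * new_global_batch / base_global_batch


# Assumed reference setup: lr 0.0001 and batch_size 32 (svt_s_gpu.yaml) on 8 devices.
base_lr, per_device_bs, ref_devices = 0.0001, 32, 8
ref_global = per_device_bs * ref_devices  # 256

# Standalone run on a single device with the same per-device batch size.
print(scale_lr(base_lr, ref_global, per_device_bs * 1))  # 1.25e-05
```

Changing the per-device batch size and the device count together, e.g. 4 devices x 64, keeps the global batch at 256, so the learning rate would stay unchanged.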
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/twins/svt_s_gpu.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/twins/svt_s_gpu.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+Paper - https://openreview.net/pdf?id=5kTlVBkzSRx
+
+Official repo - https://github.com/Meituan-AutoML/Twins
+
+MindSpore implementation - https://gitee.com/cvisionlab/models/tree/twins/release/research/cv/Twins
diff --git a/configs/twins/pcpvt_l_gpu.yaml b/configs/twins/pcpvt_l_gpu.yaml
@@ -0,0 +1,67 @@
+# system
+mode: 0
+distribute: False
+num_parallel_workers: 2
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: 'path/to/imagenet/'
+shuffle: True
+dataset_download: False
+batch_size: 16
+drop_remainder: True
+val_split: val
+train_split: train
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+interpolation: bicubic
+re_prob: 0.24
+re_value: 'random'
+cutmix: 1.0
+mixup: 0.8
+mixup_prob: 1.0
+mixup_mode: 'batch'
+mixup_off_epoch: 0.0
+switch_prob: 0.5
+crop_pct: 0.9
+
+# model
+model: 'pcpvt_large'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+ema: False
+clip_grad: True
+clip_value: 5.0
+drop_rate: 0.0
+drop_path_rate: 0.1
+
+# loss
+loss: 'CE'
+label_smoothing: 0.5
+
+# lr scheduler
+lr_scheduler: 'cosine_decay'
+warmup_epochs: 20
+lr: 0.0001
+warmup_factor: 0.001
+min_lr: 0.00001
+
+# optimizer
+opt: 'adamw'
+eps: 1e-8
+weight_decay: 0.05
+dynamic_loss_scale: True
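The `# lr scheduler` block above combines a warmup phase with cosine decay. The epoch-level sketch below uses the `pcpvt_l_gpu.yaml` values. It assumes that warmup ramps linearly from `warmup_factor * lr` up to `lr`, and that the cosine phase decays to `min_lr` over the remaining epochs. MindCV's own scheduler may differ in detail, for example by updating per step instead of per epoch.

```python
import math


def lr_at_epoch(epoch: float, lr: float = 1e-4, min_lr: float = 1e-5,
                warmup_epochs: int = 20, warmup_factor: float = 0.001,
                total_epochs: int = 300) -> float:
    """Learning rate under linear warmup followed by cosine decay (assumed form)."""
    if epoch < warmup_epochs:
        # linear ramp from warmup_factor * lr up to lr
        start = warmup_factor * lr
        return start + (lr - start) * epoch / warmup_epochs
    # cosine decay from lr down to min_lr over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))


for e in (0, 10, 20, 160, 300):
    print(e, lr_at_epoch(e))
```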
diff --git a/configs/twins/svt_s_gpu.yaml b/configs/twins/svt_s_gpu.yaml
@@ -0,0 +1,67 @@
+# system
+mode: 0
+distribute: False
+num_parallel_workers: 2
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: 'path/to/imagenet/'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+val_split: val
+train_split: train
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+interpolation: bicubic
+re_prob: 0.24
+re_value: 'random'
+cutmix: 1.0
+mixup: 0.8
+mixup_prob: 1.0
+mixup_mode: 'batch'
+mixup_off_epoch: 0.0
+switch_prob: 0.5
+crop_pct: 0.9
+
+# model
+model: 'alt_gvt_small'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+ema: False
+clip_grad: True
+clip_value: 5.0
+drop_rate: 0.0
+drop_path_rate: 0.1
+
+# loss
+loss: 'CE'
+label_smoothing: 0.5
+
+# lr scheduler
+lr_scheduler: 'cosine_decay'
+warmup_epochs: 20
+lr: 0.0001
+warmup_factor: 0.001
+min_lr: 0.00001
+
+# optimizer
+opt: 'adamw'
+eps: 1e-8
+weight_decay: 0.05
+dynamic_loss_scale: True
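With `label_smoothing: 0.5`, the one-hot training target is mixed with a uniform distribution over the classes. The sketch below assumes the common convention `(1 - eps) * one_hot + eps / num_classes`; check MindCV's `CE` loss for its exact form. Under that convention, the true class of an ImageNet-1K sample gets a target of 0.5005 and every other class gets 0.0005.

```python
from typing import List


def smoothed_targets(label: int, num_classes: int, smoothing: float) -> List[float]:
    """Label-smoothed target vector: (1 - smoothing) * one_hot + smoothing / num_classes."""
    off_value = smoothing / num_classes
    target = [off_value] * num_classes
    target[label] = 1.0 - smoothing + off_value
    return target


t = smoothed_targets(3, 1000, 0.5)
print(t[3], t[0], sum(t))
```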
diff --git a/mindcv/models/__init__.py b/mindcv/models/__init__.py
@@ -20,6 +20,7 @@
     mobilenet_v3,
     model_factory,
     nasnet,
+    pcpvt,
     pnasnet,
     poolformer,
     pvt,
@@ -36,6 +37,7 @@
     shufflenetv2,
     sknet,
     squeezenet,
+    svt,
     swin_transformer,
     vgg,
     visformer,
@@ -62,6 +64,7 @@
 from .mobilenet_v3 import *
 from .model_factory import *
 from .nasnet import *
+from .pcpvt import *
 from .pnasnet import *
 from .poolformer import *
 from .pvt import *
@@ -78,6 +81,7 @@
 from .shufflenetv2 import *
 from .sknet import *
 from .squeezenet import *
+from .svt import *
 from .swin_transformer import *
 from .utils import *
 from .vgg import *
@@ -109,6 +113,7 @@
 __all__.extend(model_factory.__all__)
 __all__.extend(["NASNetAMobile", "nasnet"])
 __all__.extend(["Pnasnet", "pnasnet"])
+__all__.extend(pcpvt.__all__)
 __all__.extend(poolformer.__all__)
 __all__.extend(pvt.__all__)
 __all__.extend(pvtv2.__all__)
@@ -124,6 +129,7 @@
 __all__.extend(shufflenetv2.__all__)
 __all__.extend(sknet.__all__)
 __all__.extend(squeezenet.__all__)
+__all__.extend(svt.__all__)
 __all__.extend(swin_transformer.__all__)
 __all__.extend(vgg.__all__)
 __all__.extend(visformer.__all__)

diff --git a/mindcv/models/layers/__init__.py b/mindcv/models/layers/__init__.py
@@ -1,9 +1,21 @@
 """layers init"""
-from . import activation, conv_norm_act, drop_path, identity, pooling, selective_kernel, squeeze_excite
+from . import (
+    activation,
+    conv_norm_act,
+    drop_path,
+    identity,
+    mlp,
+    patch_embed,
+    pooling,
+    selective_kernel,
+    squeeze_excite,
+)
 from .activation import *
 from .conv_norm_act import *
 from .drop_path import *
 from .identity import *
+from .mlp import *
+from .patch_embed import *
 from .pooling import *
 from .selective_kernel import *
 from .squeeze_excite import *
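The new `patch_embed` layer downsamples the token grid at the start of each stage. The arithmetic sketch below assumes the per-stage patch sizes 4, 2, 2, 2 typical of PVT-style pyramids such as Twins; check `patch_embed.py` and the model definitions for the actual values. Under that assumption, a 224 x 224 input yields token grids of 56 x 56, 28 x 28, 14 x 14 and 7 x 7.

```python
from typing import List, Sequence, Tuple


def tokens_per_stage(image_size: int = 224,
                     patch_sizes: Sequence[int] = (4, 2, 2, 2)) -> List[Tuple[int, int]]:
    """Spatial token grid (H, W) after each stage's patch embedding."""
    side = image_size
    grids = []
    for p in patch_sizes:
        if side % p:
            raise ValueError(f"size {side} is not divisible by patch size {p}")
        side //= p  # non-overlapping patches of size p reduce each side by p
        grids.append((side, side))
    return grids


print(tokens_per_stage())
```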