This repo is a PyTorch implementation of applying VAN (Visual Attention Network) to semantic segmentation. The code is based on mmsegmentation.
More details can be found in the Visual Attention Network paper.
@article{guo2022visual,
title={Visual Attention Network},
author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
journal={arXiv preprint arXiv:2202.09741},
year={2022}
}
Notes: Pre-trained models can be found on Tsinghua Cloud.
Method | Backbone | Pretrained | Iters | mIoU (MS) | Params | FLOPs | Config | Download |
---|---|---|---|---|---|---|---|---|
UperNet | VAN-B0 | IN-1K | 160K | 41.1 | 32M | - | config | - |
UperNet | VAN-B1 | IN-1K | 160K | 44.9 | 44M | - | config | - |
UperNet | VAN-B2 | IN-1K | 160K | 50.1 | 57M | 948G | config | Tsinghua Cloud |
UperNet | VAN-B3 | IN-1K | 160K | 50.6 | 75M | 1030G | config | Tsinghua Cloud |
UperNet | VAN-B4 | IN-1K | 160K | 52.2 | 90M | 1098G | config | Tsinghua Cloud |
UperNet | VAN-B4 | IN-22K | 160K | 53.5 | 90M | 1098G | config | Tsinghua Cloud |
UperNet | VAN-B5 | IN-22K | 160K | 53.9 | 117M | 1208G | config | Tsinghua Cloud |
UperNet | VAN-B6 | IN-22K | 160K | 54.7 | 231M | 1658G | config | Tsinghua Cloud |
Notes: We report multi-scale (MS) validation results following Swin Transformer. FLOPs are measured at an input size of 2048 × 512.
Backbone | Iters | mIoU | Config | Download |
---|---|---|---|---|
VAN-Tiny | 40K | 38.5 | config | Google Drive |
VAN-Small | 40K | 42.9 | config | Google Drive |
VAN-Base | 40K | 46.7 | config | Google Drive |
VAN-Large | 40K | 48.1 | config | Google Drive |
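The config entries above point to MMSegmentation-style Python configs. As a rough, hypothetical sketch of how a VAN backbone is paired with an UperNet head in that format (the base files, registered backbone name, stage widths/depths, and pretrained path below are all assumptions; the real settings live in the config files linked above):

```python
# Hypothetical MMSegmentation-style config sketch. Every name and value below
# is illustrative; consult the repo's actual config files for the real settings.
_base_ = [
    '../_base_/models/upernet_van.py',       # assumed base model config
    '../_base_/datasets/ade20k.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_160k.py',
]

model = dict(
    backbone=dict(
        type='VAN',                           # assumed registry name of the backbone
        embed_dims=[64, 128, 320, 512],       # illustrative stage widths
        depths=[3, 3, 12, 3],                 # illustrative stage depths
        init_cfg=dict(type='Pretrained', checkpoint='path/to/van_imagenet.pth'),
    ),
    decode_head=dict(in_channels=[64, 128, 320, 512], num_classes=150),
    auxiliary_head=dict(in_channels=320, num_classes=150),
)
```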
Install MMSegmentation and download ADE20K according to the guidelines in MMSegmentation.
pip install mmsegmentation==0.26.0 (https://github.com/open-mmlab/mmsegmentation/tree/v0.26.0)
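After installing, a quick import check confirms the environment is usable (assuming mmcv-full has also been installed per the MMSegmentation guidelines; the version shown is only what this README pins):

```python
# Sanity check: the core packages import and report their versions.
import mmcv
import mmseg
import torch

print("torch:", torch.__version__)
print("mmcv:", mmcv.__version__)
print("mmseg:", mmseg.__version__)  # expect 0.26.0 per the pin above
```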
We use 8 GPUs for training by default. Run:
./dist_train.sh /path/to/config 8
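The config passed to dist_train.sh can be inspected beforehand with mmcv's Config API (mmcv < 2.0; the path below is a placeholder):

```python
# Load and inspect a training config before launching distributed training.
from mmcv import Config

cfg = Config.fromfile('/path/to/config')   # placeholder path
print(cfg.model.backbone)                  # backbone settings
print(cfg.data.samples_per_gpu)            # per-GPU batch size
print(cfg.runner)                          # e.g. IterBasedRunner with max_iters
```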
To evaluate the model, run:
./dist_test.sh /path/to/config /path/to/checkpoint_file 8 --eval mIoU
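Beyond the distributed test script, MMSegmentation 0.x also exposes a Python API for single-image inference; the config, checkpoint, and image paths below are placeholders:

```python
# Single-image inference with the MMSegmentation 0.x Python API.
from mmseg.apis import inference_segmentor, init_segmentor

config_file = '/path/to/config'               # placeholder
checkpoint_file = '/path/to/checkpoint_file'  # placeholder
img = 'demo.png'                              # placeholder input image

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)      # list with one H x W label map
model.show_result(img, result, out_file='result.png', opacity=0.5)
```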
Install torchprofile using:
pip install torchprofile
To calculate FLOPs for a model, run:
bash tools/flops.sh /path/to/config --shape 512 512
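For reference, torchprofile can also be called directly from Python on any nn.Module; the toy model below is only a stand-in, not a VAN segmentor. Note that MACs and FLOPs differ by roughly a factor of two depending on the convention used.

```python
# Count multiply-accumulates (MACs) for an arbitrary model with torchprofile.
import torch
import torch.nn as nn
from torchprofile import profile_macs

# Toy stand-in network; swap in the segmentor you actually want to profile.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 150, kernel_size=1),
).eval()

dummy_input = torch.randn(1, 3, 512, 512)  # matches the 512x512 shape used above
macs = profile_macs(model, dummy_input)
print(f"MACs: {macs / 1e9:.2f} G")
```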
Our implementation is mainly based on mmsegmentation, Swin-Transformer, PoolFormer, and Enjoy-Hamburger. Thanks to their authors.
This repo is under the Apache-2.0 license. For commercial use, please contact the authors.