|
| 1 | +# SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +<!-- [ALGORITHM] --> |
| 6 | + |
| 7 | +```latex |
| 8 | +@article{xie2021segformer, |
| 9 | +  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers}, |
| 10 | +  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping}, |
| 11 | +  journal={arXiv preprint arXiv:2105.15203}, |
| 12 | +  year={2021} |
| 13 | +} |
| 14 | +``` |
| 15 | + |
| 16 | +## Results and models |
| 17 | + |
| 18 | +### ADE20K |
| 19 | + |
| 20 | +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download | |
| 21 | +| ------ | -------- | --------- | ------: | -------: | -------------- | ---: | ------------- | ------ | -------- | |
| 22 | +| SegFormer | MIT-B0 | 512x512 | 160000 | 2.1 | 51.32 | 37.41 | 38.34 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b0_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b0_512x512_160k_ade20k/segformer_mit-b0_512x512_160k_ade20k_20210726_101530-8ffa8fda.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b0_512x512_160k_ade20k/segformer_mit-b0_512x512_160k_ade20k_20210726_101530.log.json) | |
| 23 | +| SegFormer | MIT-B1 | 512x512 | 160000 | 2.6 | 47.66 | 40.97 | 42.54 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b1_512x512_160k_ade20k/segformer_mit-b1_512x512_160k_ade20k_20210726_112106-d70e859d.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b1_512x512_160k_ade20k/segformer_mit-b1_512x512_160k_ade20k_20210726_112106.log.json) | |
| 24 | +| SegFormer | MIT-B2 | 512x512 | 160000 | 3.6 | 30.88 | 45.58 | 47.03 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b2_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b2_512x512_160k_ade20k/segformer_mit-b2_512x512_160k_ade20k_20210726_112103-cbd414ac.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b2_512x512_160k_ade20k/segformer_mit-b2_512x512_160k_ade20k_20210726_112103.log.json) | |
| 25 | +| SegFormer | MIT-B3 | 512x512 | 160000 | 4.8 | 22.11 | 47.82 | 48.81 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b3_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b3_512x512_160k_ade20k/segformer_mit-b3_512x512_160k_ade20k_20210726_081410-962b98d2.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b3_512x512_160k_ade20k/segformer_mit-b3_512x512_160k_ade20k_20210726_081410.log.json) | |
| 26 | +| SegFormer | MIT-B4 | 512x512 | 160000 | 6.1 | 15.45 | 48.46 | 49.76 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b4_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b4_512x512_160k_ade20k/segformer_mit-b4_512x512_160k_ade20k_20210728_183055-7f509d7d.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b4_512x512_160k_ade20k/segformer_mit-b4_512x512_160k_ade20k_20210728_183055.log.json) | |
| 27 | +| SegFormer | MIT-B5 | 512x512 | 160000 | 7.2 | 11.89 | 49.13 | 50.22 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b5_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_512x512_160k_ade20k/segformer_mit-b5_512x512_160k_ade20k_20210726_145235-94cedf59.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_512x512_160k_ade20k/segformer_mit-b5_512x512_160k_ade20k_20210726_145235.log.json) | |
| 28 | +| SegFormer | MIT-B5 | 640x640 | 160000 | 11.5 | 11.30 | 49.62 | 50.36 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b5_640x640_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_640x640_160k_ade20k/segformer_mit-b5_640x640_160k_ade20k_20210801_121243-41d2845b.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_640x640_160k_ade20k/segformer_mit-b5_640x640_160k_ade20k_20210801_121243.log.json) | |
| 29 | + |
| 30 | +Evaluation with AlignedResize: |
| 31 | + |
| 32 | +| Method | Backbone | Crop Size | Lr schd | mIoU | mIoU(ms+flip) | |
| 33 | +| ------ | -------- | --------- | ------: | ---: | ------------- | |
| 34 | +| SegFormer | MIT-B0 | 512x512 | 160000 | 38.1 | 38.57 | |
| 35 | +| SegFormer | MIT-B1 | 512x512 | 160000 | 41.64 | 42.76 | |
| 36 | +| SegFormer | MIT-B2 | 512x512 | 160000 | 46.53 | 47.49 | |
| 37 | +| SegFormer | MIT-B3 | 512x512 | 160000 | 48.46 | 49.14 | |
| 38 | +| SegFormer | MIT-B4 | 512x512 | 160000 | 49.34 | 50.29 | |
| 39 | +| SegFormer | MIT-B5 | 512x512 | 160000 | 50.08 | 50.72 | |
| 40 | +| SegFormer | MIT-B5 | 640x640 | 160000 | 50.58 | 50.8 | |
| 41 | + |
| 42 | +We replace `AlignedResize` from the original implementation with `Resize + ResizeToMultiple`. To test |
| 43 | +using `AlignedResize`, change the dataset pipeline like this: |
| 44 | + |
| 45 | +```python |
| 46 | +test_pipeline = [ |
| 47 | +    dict(type='LoadImageFromFile'), |
| 48 | +    dict( |
| 49 | +        type='MultiScaleFlipAug', |
| 50 | +        img_scale=(2048, 512), |
| 51 | +        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], |
| 52 | +        flip=False, |
| 53 | +        transforms=[ |
| 54 | +            dict(type='Resize', keep_ratio=True), |
| 55 | +            # Resize the image to a multiple of 32; this improves SegFormer by 0.5-1.0 mIoU. |
| 56 | +            dict(type='ResizeToMultiple', size_divisor=32), |
| 57 | +            dict(type='RandomFlip'), |
| 58 | +            dict(type='Normalize', **img_norm_cfg), |
| 59 | +            dict(type='ImageToTensor', keys=['img']), |
| 60 | +            dict(type='Collect', keys=['img']), |
| 61 | +        ]) |
| 62 | +] |
| 63 | +``` |
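
To make the effect of the `ResizeToMultiple` step concrete, here is a minimal sketch of the size arithmetic only. It assumes each side is rounded up to the nearest multiple of `size_divisor`; the helper name `round_up_to_multiple` is hypothetical and is not part of mmsegmentation.

```python
import math


def round_up_to_multiple(size, divisor=32):
    """Round each side of a (width, height) pair up to a multiple of divisor.

    Hypothetical helper for illustration only; the round-up behaviour
    is an assumption about what ResizeToMultiple does to the image size.
    """
    return tuple(math.ceil(side / divisor) * divisor for side in size)


# A 683x512 image would be resized to 704x512 under this assumption.
print(round_up_to_multiple((683, 512)))
```

Sides that are already multiples of 32, such as 512, are left unchanged.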
| 64 | + |
| 65 | +## How to use the official SegFormer pretrained weights |
| 66 | + |
| 67 | +We convert the backbone weights from the official repo (https://github.com/NVlabs/SegFormer) with `tools/model_converters/mit_convert.py`. |
| 68 | + |
| 69 | +Follow the steps below to prepare for SegFormer training: |
| 70 | + |
| 71 | +1. Download the official SegFormer pretrained weights (we suggest placing them in `pretrain/`). |
| 72 | +2. Run the conversion script on the official pretrained weights: `python tools/model_converters/mit_convert.py pretrain/mit_b0.pth pretrain/mit_b0.pth`. |
| 73 | +3. Set `pretrained` in the SegFormer model config. For example, `pretrained` in `segformer_mit-b0_512x512_160k_ade20k.py` is set to `pretrain/mit_b0.pth`. |
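
Step 3 could look like the config fragment below. This is a sketch only: it assumes the model settings are expressed as a `model` dict with a top-level `pretrained` key, as the step implies, and it omits every other field of the real config.

```python
# Fragment of a SegFormer config (illustrative; all other fields omitted).
# The path matches the output of the conversion command in step 2.
model = dict(pretrained='pretrain/mit_b0.pth')
```

For the other backbones (MIT-B1 to MIT-B5), the same field would point at the corresponding converted checkpoint.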