Commit ee47c41

rstrudel, MengzhangLI, Junjun2016, RockeyCoss, and MeowZheng authored
[Feature] Support Segmenter (open-mmlab#955)
* segmenter: add model
* update
* readme: update
* config: update
* segmenter: update readme
* segmenter: update
* segmenter: update
* segmenter: update
* configs: set checkpoint path to pretrain folder
* segmenter: modify vit-s/lin, remove data config
* rreadme: update
* configs: transfer from _base_ to segmenter
* configs: add 8x1 suffix
* configs: remove redundant lines
* configs: cleanup
* first attempt
* swipe CI error
* Update mmseg/models/decode_heads/__init__.py
  Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
* segmenter_linear: use fcn backbone
* segmenter_mask: update
* models: add segmenter vit
* decoders: yapf+remove unused imports
* apply precommit
* segmenter/linear_head: fix
* segmenter/linear_header: fix
* segmenter: fix mask transformer
* fix error
* segmenter/mask_head: use trunc_normal init
* refactor segmenter head
* Fetch upstream (open-mmlab#1)
* [Feature] Change options to cfg-option (open-mmlab#1129)
* [Feature] Change option to cfg-option
* add expire date and fix the docs
* modify docstring
* [Fix] Add <!-- [ABSTRACT] --> in metafile open-mmlab#1127
* [Fix] Fix correct num_classes of HRNet in LoveDA dataset open-mmlab#1136
* Bump to v0.20.1 (open-mmlab#1138)
* bump version 0.20.1
* bump version 0.20.1
* [Fix] revise --option to --options open-mmlab#1140
  Co-authored-by: Rockey <41846794+RockeyCoss@users.noreply.github.com>
  Co-authored-by: MengzhangLI <mcmong@pku.edu.cn>
* decode_head: switch from linear to fcn
* fix init list formatting
* configs: remove variants, keep only vit-s on ade
* align inference metric of vit-s-mask
* configs: add vit t/b/l
* Update mmseg/models/decode_heads/segmenter_mask_head.py
  Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
* Update mmseg/models/decode_heads/segmenter_mask_head.py
  Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
* Update mmseg/models/decode_heads/segmenter_mask_head.py
  Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
* Update mmseg/models/decode_heads/segmenter_mask_head.py
  Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
* Update mmseg/models/decode_heads/segmenter_mask_head.py
  Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
* model_converters: use torch instead of einops
* setup: remove einops
* segmenter_mask: fix missing imports
* add necessary imported init funtion
* segmenter/seg-l: set resolution to 640
* segmenter/seg-l: fix test size
* fix vitjax2mmseg
* add README and unittest
* fix unittest
* add docstring
* refactor config and add pretrained link
* fix typo
* add paper name in readme
* change segmenter config names
* fix typo in readme
* fix typos in readme
* fix segmenter typo
* fix segmenter typo
* delete redundant comma in config files
* delete redundant comma in config files
* fix convert script
* update lateset master version

Co-authored-by: MengzhangLI <mcmong@pku.edu.cn>
Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
Co-authored-by: Rockey <41846794+RockeyCoss@users.noreply.github.com>
Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
1 parent 80e8504 commit ee47c41

16 files changed (+754, -2 lines)

README.md (+1)

@@ -118,6 +118,7 @@ Supported methods:
  - [x] [STDC (CVPR'2021)](configs/stdc)
  - [x] [SETR (CVPR'2021)](configs/setr)
  - [x] [DPT (ArXiv'2021)](configs/dpt)
+ - [x] [Segmenter (ICCV'2021)](configs/segmenter)
  - [x] [SegFormer (NeurIPS'2021)](configs/segformer)

  Supported datasets:

README_zh-CN.md (+1)

@@ -117,6 +117,7 @@ MMSegmentation 是一个基于 PyTorch 的语义分割开源工具箱。它是 O
  - [x] [STDC (CVPR'2021)](configs/stdc)
  - [x] [SETR (CVPR'2021)](configs/setr)
  - [x] [DPT (ArXiv'2021)](configs/dpt)
+ - [x] [Segmenter (ICCV'2021)](configs/segmenter)
  - [x] [SegFormer (NeurIPS'2021)](configs/segformer)

  已支持的数据集:
@@ -0,0 +1,35 @@
# model settings
backbone_norm_cfg = dict(type='LN', eps=1e-6, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='pretrain/vit_base_p16_384.pth',
    backbone=dict(
        type='VisionTransformer',
        img_size=(512, 512),
        patch_size=16,
        in_channels=3,
        embed_dims=768,
        num_layers=12,
        num_heads=12,
        drop_path_rate=0.1,
        attn_drop_rate=0.0,
        drop_rate=0.0,
        final_norm=True,
        norm_cfg=backbone_norm_cfg,
        with_cls_token=True,
        interpolate_mode='bicubic',
    ),
    decode_head=dict(
        type='SegmenterMaskTransformerHead',
        in_channels=768,
        channels=768,
        num_classes=150,
        num_layers=2,
        num_heads=12,
        embed_dims=768,
        dropout_ratio=0.0,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
    ),
    test_cfg=dict(mode='slide', crop_size=(512, 512), stride=(480, 480)),
)
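The base model config above can be built directly with the MMSegmentation 0.x model-building API. The sketch below is a minimal dry run, assuming an mmcv-full/mmseg environment that already registers `SegmenterMaskTransformerHead` (i.e. this change is installed) and assuming the file lives at `configs/_base_/models/segmenter_vit-b16_mask.py` as referenced by the configs further down; `pretrained` is cleared so no converted checkpoint is needed:

```python
# Minimal sketch (assumed mmseg 0.x with this change installed); the config
# path is an assumption based on the `_base_` references in the configs below.
from mmcv import Config
from mmseg.models import build_segmentor

cfg = Config.fromfile('configs/_base_/models/segmenter_vit-b16_mask.py')
cfg.model.pretrained = None  # skip loading 'pretrain/vit_base_p16_384.pth' for a dry run

model = build_segmentor(cfg.model)
model.init_weights()
n_params = sum(p.numel() for p in model.parameters())
print(f'Segmenter ViT-B/16 mask model: {n_params / 1e6:.1f}M parameters')
```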

configs/segmenter/README.md (+73)

@@ -0,0 +1,73 @@
# Segmenter

[Segmenter: Transformer for Semantic Segmentation](https://arxiv.org/abs/2105.05633)

## Introduction

<!-- [ALGORITHM] -->

<a href="https://github.com/rstrudel/segmenter">Official Repo</a>

<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.21.0/mmseg/models/decode_heads/segmenter_mask_head.py#L15">Code Snippet</a>

## Abstract

<!-- [ABSTRACT] -->

Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to convolution-based methods, our approach allows to model global context already at the first layer and throughout the network. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. To do so, we rely on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder. We leverage models pre-trained for image classification and show that we can fine-tune them on moderate sized datasets available for semantic segmentation. The linear decoder allows to obtain excellent results already, but the performance can be further improved by a mask transformer generating class masks. We conduct an extensive ablation study to show the impact of the different parameters, in particular the performance is better for large models and small patch sizes. Segmenter attains excellent results for semantic segmentation. It outperforms the state of the art on both ADE20K and Pascal Context datasets and is competitive on Cityscapes.

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/148507554-87eb80bd-02c7-4c31-b102-c6141e231ec8.png" width="70%"/>
</div>

```bibtex
@article{strudel2021Segmenter,
  title={Segmenter: Transformer for Semantic Segmentation},
  author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
  journal={arXiv preprint arXiv:2105.05633},
  year={2021}
}
```
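The mask-transformer decoder described in the abstract can be sketched in a few lines of PyTorch: learnable class embeddings are processed jointly with the patch embeddings by a small transformer, and class masks are the scaled dot products between patch and class embeddings. This is a simplified illustration of the idea, not the `SegmenterMaskTransformerHead` implementation:

```python
# Simplified sketch of Segmenter's mask-transformer decoding idea
# (illustrative only, not the MMSegmentation implementation).
import torch
import torch.nn as nn


class MaskTransformerSketch(nn.Module):

    def __init__(self, embed_dims=768, num_layers=2, num_heads=12, num_classes=150):
        super().__init__()
        self.num_classes = num_classes
        self.cls_emb = nn.Parameter(torch.randn(1, num_classes, embed_dims))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dims, nhead=num_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.patch_proj = nn.Linear(embed_dims, embed_dims)
        self.cls_proj = nn.Linear(embed_dims, embed_dims)
        self.scale = embed_dims**-0.5

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, C) ViT outputs for N = (H/16) * (W/16) patches
        b = patch_tokens.size(0)
        x = torch.cat([patch_tokens, self.cls_emb.expand(b, -1, -1)], dim=1)
        x = self.blocks(x)
        patches = self.patch_proj(x[:, :-self.num_classes])
        cls_tokens = self.cls_proj(x[:, -self.num_classes:])
        # class masks: similarity between every patch and every class embedding
        return patches @ cls_tokens.transpose(1, 2) * self.scale  # (B, N, num_classes)


# e.g. the 32x32 patch grid of a 512x512 image with 16x16 patches
masks = MaskTransformerSketch()(torch.randn(2, 32 * 32, 768))
print(masks.shape)  # torch.Size([2, 1024, 150])
```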
## Usage

To use a pre-trained ViT model from [Segmenter](https://github.com/rstrudel/segmenter), its checkpoint keys need to be converted first.

We provide a script [`vitjax2mmseg.py`](../../tools/model_converters/vitjax2mmseg.py) in the tools directory to convert the keys of [ViT-AugReg](https://github.com/rwightman/pytorch-image-models/blob/f55c22bebf9d8afc449d317a723231ef72e0d662/timm/models/vision_transformer.py#L54-L106) models to MMSegmentation style.

```shell
python tools/model_converters/vitjax2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}
```

For example:

```shell
python tools/model_converters/vitjax2mmseg.py \
    Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz \
    pretrain/vit_tiny_p16_384.pth
```

This script converts the model from `PRETRAIN_PATH` and stores the converted model in `STORE_PATH`.

In our default setting, the pretrained models and their corresponding [ViT-AugReg](https://github.com/rwightman/pytorch-image-models/blob/f55c22bebf9d8afc449d317a723231ef72e0d662/timm/models/vision_transformer.py#L54-L106) models are listed below:

| pretrained models | original models |
| ----------------- | --------------- |
| vit_tiny_p16_384.pth | ['vit_tiny_patch16_384'](https://storage.googleapis.com/vit_models/augreg/Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz) |
| vit_small_p16_384.pth | ['vit_small_patch16_384'](https://storage.googleapis.com/vit_models/augreg/S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz) |
| vit_base_p16_384.pth | ['vit_base_patch16_384'](https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz) |
| vit_large_p16_384.pth | ['vit_large_patch16_384'](https://storage.googleapis.com/vit_models/augreg/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz) |
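Conceptually, the conversion loads the JAX/Flax `.npz` checkpoint, renames every parameter to the MMSegmentation ViT naming scheme, adjusts tensor layouts where needed, and saves a regular PyTorch `state_dict`. The loop below only illustrates that flow; the key names and the renaming rule are assumptions made for the example, and the actual mapping lives in `tools/model_converters/vitjax2mmseg.py`:

```python
# Illustrative sketch only; the real conversion is tools/model_converters/vitjax2mmseg.py.
# The renaming rule below is an assumed example, not the exact mapping used by the script.
import numpy as np
import torch


def convert_vit_npz_to_pth(npz_path, pth_path):
    jax_weights = np.load(npz_path)
    state_dict = {}
    for name in jax_weights.files:
        tensor = torch.from_numpy(np.array(jax_weights[name]))
        # Flax stores 2-D linear kernels as (in, out); PyTorch expects (out, in).
        if name.endswith('kernel') and tensor.ndim == 2:
            tensor = tensor.t()
        # Hypothetical renaming step: map JAX names to MMSegmentation-style keys.
        new_name = name.replace('Transformer/encoderblock_', 'layers.').replace('/', '.')
        state_dict[new_name] = tensor
    torch.save(state_dict, pth_path)


# convert_vit_npz_to_pth('Ti_16-...-res_384.npz', 'pretrain/vit_tiny_p16_384.pth')
```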
## Results and models

### ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ---- | ------------- | ------ | -------- |
| Segmenter-Mask | ViT-T_16 | 512x512 | 160000 | 1.21 | 27.98 | 39.99 | 40.83 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k/segmenter_vit-t_mask_8x1_512x512_160k_ade20k_20220105_151706-ffcf7509.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k/segmenter_vit-t_mask_8x1_512x512_160k_ade20k_20220105_151706.log.json) |
| Segmenter-Linear | ViT-S_16 | 512x512 | 160000 | 1.78 | 28.07 | 45.75 | 46.82 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k/segmenter_vit-s_linear_8x1_512x512_160k_ade20k_20220105_151713-39658c46.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k/segmenter_vit-s_linear_8x1_512x512_160k_ade20k_20220105_151713.log.json) |
| Segmenter-Mask | ViT-S_16 | 512x512 | 160000 | 2.03 | 24.80 | 46.19 | 47.85 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k/segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706-511bb103.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k/segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706.log.json) |
| Segmenter-Mask | ViT-B_16 | 512x512 | 160000 | 4.20 | 13.20 | 49.60 | 51.07 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k/segmenter_vit-b_mask_8x1_512x512_160k_ade20k_20220105_151706-bc533b08.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k/segmenter_vit-b_mask_8x1_512x512_160k_ade20k_20220105_151706.log.json) |
| Segmenter-Mask | ViT-L_16 | 640x640 | 160000 | 16.56 | 2.62 | 52.16 | 53.65 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k/segmenter_vit-l_mask_8x1_512x512_160k_ade20k_20220105_162750-7ef345be.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k/segmenter_vit-l_mask_8x1_512x512_160k_ade20k_20220105_162750.log.json) |
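Any of the released checkpoints can be tried with the standard MMSegmentation 0.x inference API; a minimal sketch, assuming the config and the downloaded checkpoint are available locally at the paths shown:

```python
# Minimal inference sketch (mmseg 0.x API); file paths are local placeholders.
from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py'
checkpoint_file = 'segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706-511bb103.pth'

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')  # list with one (H, W) array of ADE20K class ids
print(result[0].shape)
```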

configs/segmenter/segmenter.yml (+125)

@@ -0,0 +1,125 @@
Collections:
- Name: segmenter
  Metadata:
    Training Data:
    - ADE20K
  Paper:
    URL: https://arxiv.org/abs/2105.05633
    Title: 'Segmenter: Transformer for Semantic Segmentation'
  README: configs/segmenter/README.md
  Code:
    URL: https://github.com/open-mmlab/mmsegmentation/blob/v0.21.0/mmseg/models/decode_heads/segmenter_mask_head.py#L15
    Version: v0.21.0
  Converted From:
    Code: https://github.com/rstrudel/segmenter
Models:
- Name: segmenter_vit-t_mask_8x1_512x512_160k_ade20k
  In Collection: segmenter
  Metadata:
    backbone: ViT-T_16
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 35.74
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 1.21
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 39.99
      mIoU(ms+flip): 40.83
  Config: configs/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k/segmenter_vit-t_mask_8x1_512x512_160k_ade20k_20220105_151706-ffcf7509.pth
- Name: segmenter_vit-s_linear_8x1_512x512_160k_ade20k
  In Collection: segmenter
  Metadata:
    backbone: ViT-S_16
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 35.63
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 1.78
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 45.75
      mIoU(ms+flip): 46.82
  Config: configs/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k/segmenter_vit-s_linear_8x1_512x512_160k_ade20k_20220105_151713-39658c46.pth
- Name: segmenter_vit-s_mask_8x1_512x512_160k_ade20k
  In Collection: segmenter
  Metadata:
    backbone: ViT-S_16
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 40.32
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 2.03
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 46.19
      mIoU(ms+flip): 47.85
  Config: configs/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k/segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706-511bb103.pth
- Name: segmenter_vit-b_mask_8x1_512x512_160k_ade20k
  In Collection: segmenter
  Metadata:
    backbone: ViT-B_16
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 75.76
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 4.2
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 49.6
      mIoU(ms+flip): 51.07
  Config: configs/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k/segmenter_vit-b_mask_8x1_512x512_160k_ade20k_20220105_151706-bc533b08.pth
- Name: segmenter_vit-l_mask_8x1_512x512_160k_ade20k
  In Collection: segmenter
  Metadata:
    backbone: ViT-L_16
    crop size: (640,640)
    lr schd: 160000
    inference time (ms/im):
    - value: 381.68
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (640,640)
    Training Memory (GB): 16.56
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 52.16
      mIoU(ms+flip): 53.65
  Config: configs/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k/segmenter_vit-l_mask_8x1_512x512_160k_ade20k_20220105_162750-7ef345be.pth
@@ -0,0 +1,43 @@
_base_ = [
    '../_base_/models/segmenter_vit-b16_mask.py',
    '../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_160k.py'
]
optimizer = dict(lr=0.001, weight_decay=0.0)

img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    # num_gpus: 8 -> batch_size: 8
    samples_per_gpu=1,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
@@ -0,0 +1,60 @@
_base_ = [
    '../_base_/models/segmenter_vit-b16_mask.py',
    '../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_160k.py'
]

model = dict(
    pretrained='pretrain/vit_large_p16_384.pth',
    backbone=dict(
        type='VisionTransformer',
        img_size=(640, 640),
        embed_dims=1024,
        num_layers=24,
        num_heads=16),
    decode_head=dict(
        type='SegmenterMaskTransformerHead',
        in_channels=1024,
        channels=1024,
        num_heads=16,
        embed_dims=1024),
    test_cfg=dict(mode='slide', crop_size=(640, 640), stride=(608, 608)))

optimizer = dict(lr=0.001, weight_decay=0.0)

img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
crop_size = (640, 640)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2048, 640), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 640),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    # num_gpus: 8 -> batch_size: 8
    samples_per_gpu=1,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
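With `test_cfg=dict(mode='slide', crop_size=(640, 640), stride=(608, 608))`, test images are processed as overlapping 640x640 windows placed every 608 pixels. A rough sketch of the resulting window grid, using the usual ceil-style computation for sliding-window inference:

```python
# Rough window-count sketch for the slide test_cfg above (crop 640, stride 608).
import math


def num_windows(size, crop=640, stride=608):
    return max(math.ceil((size - crop) / stride), 0) + 1


h, w = 640, 2048  # e.g. an image resized to the (2048, 640) test scale
print(num_windows(h), 'x', num_windows(w), 'windows')  # 1 x 4 windows
```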
@@ -0,0 +1,14 @@
_base_ = './segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py'

model = dict(
    decode_head=dict(
        _delete_=True,
        type='FCNHead',
        in_channels=384,
        channels=384,
        num_convs=0,
        dropout_ratio=0.0,
        concat_input=False,
        num_classes=150,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)))
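This config is the "linear" variant: deleting the mask-transformer head and using `FCNHead` with `num_convs=0` leaves only the 1x1 classification convolution, i.e. a point-wise linear decoder over the ViT-S patch features. A conceptual PyTorch sketch of that decoder (illustrative only, not the mmseg `FCNHead` code):

```python
# Illustrative sketch of the point-wise linear decoder behind FCNHead(num_convs=0).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearDecoderSketch(nn.Module):

    def __init__(self, in_channels=384, num_classes=150, dropout_ratio=0.0):
        super().__init__()
        self.dropout = nn.Dropout2d(dropout_ratio)
        # a 1x1 conv is a per-patch linear classifier
        self.cls_seg = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feats, out_size):
        # feats: (B, C, H/16, W/16) ViT-S patch features arranged on a 2-D grid
        logits = self.cls_seg(self.dropout(feats))
        return F.interpolate(logits, size=out_size, mode='bilinear', align_corners=False)


logits = LinearDecoderSketch()(torch.randn(1, 384, 32, 32), out_size=(512, 512))
print(logits.shape)  # torch.Size([1, 150, 512, 512])
```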
