[Feature] Official implementation of SETR #531
Merged
Changes from 50 commits
88 commits
da9573b
Adjust vision transformer backbone architectures;
f9c8420
Merge Master
2f580e9
Fix some parameters loss bug;
e1d59cd
* Store intermediate token features and impose no processes on them;
e4a11e7
Fix some doc error
d5644c1
Add an arg for VisionTransformer backbone to control if input class to…
70cde52
Add stochastic depth decay rule for DropPath;
97af059
* Fix output bug when input_cls_token=False;
16e21ca
Re-implement of SETR
53e7e80
* Modify some docs of heads of SETR;
995a728
* Modify some arg of setr heads;
7ff896d
Merge branch 'master' into setr
b3d5258
Merge Master
8d18d86
* Add 768x768 cityscapes dataset config;
2c93446
* Fix the low code coverage of unit test about heads of setr;
efe6913
* Add pascal context dataset & ade20k dataset config;
4b2fd5e
Modify folder structure.
0ae504d
add setr
CuttlefishXuan 1377131
modify vit
CuttlefishXuan d47e10c
Fix the test_cfg arg position;
27d1479
Fix some learning schedule bug;
f70b315
optimize setr code
be0d2fb
Add arg: final_reshape to control if converting output feature inform…
49130ce
Fix the default value of final_reshape;
b83992d
Merge branch 'vit_final_reshape' into setr
f7052f9
Modify arg: final_reshape to arg: out_shape;
c5858f5
Fix some unit test bug;
c8ea16a
Merge branch 'vit_final_reshape' into setr
0599c71
Add MLA neck;
7eda50b
Merge pr #526
0d92194
Remove some redundant files.
7040840
* Fix the code style bug;
851366a
Ignoring CityscapesCoarseDataset and MapillaryDataset.
c424a35
Fix the activation function loss bug;
a40ed61
Fix the img_size bug of SETR_PUP_ADE20K
ad2ca50
Merge Master
8629d60
Merge Master
4f24d57
Merge branch 'setr_official' of github.com:sennnnn/mmsegmentation int…
5ab2b35
* Fix the lint bug of transformers.py;
a70621c
Convert vit of setr out shape from NLC to NCHW.
4161634
* Modify Resize action of data pipeline;
a5f8c1f
Remove arg: find_unused_parameters which is False by default.
0636d9e
Error auxiliary head of PUP deit
125c1ee
Remove the minimal restrict of slide inference.
760d0c5
Modify doc string of Resize
45f3df3
Separate this part of code to a new PR #544
16c5fe5
* Remove some redundant code;
81410a1
Fix the tuple in_channels of mla_deit.
1d92146
Merge branch 'master' into setr_official
cdc6d30
Modify code style
3d39112
Modify implementation of SETR Heads
9b80384
non-square input support for setr heads
03eb097
Modify config argument for above commits
cb50538
Remove norm_layer argument of SETRMLAHead
2115d66
Add mla_align_corners for MLAModule interpolate
9ec4c9e
[Refactor]Refactor of SETRMLAHead
104ff0c
[Refactor]MLA Neck
d2b0107
Fix config bug
949cb65
[Refactor]SETR Naive Head and SETR PUP Head
3962975
[Fix]Fix the lack of arg: act_cfg and arg: norm_cfg
3d1bef5
Merge branch 'master' into setr_official
90b9b3d
Fix config error
5a3b376
Refactor of SETR MLA, Naive, PUP heads.
8f7c141
Modify some attribute name of SETR Heads.
8bfc651
Merge Master
c9c4284
Modify setr configs to adapt new vit code.
f45ca2d
Fix trunc_normal_ bug
e8fd36b
Parameters init adjustment.
1090fb3
Remove redundant doc string of SETRUPHead
39c6070
Fix pretrained bug
07ffcd2
[Fix] Fix vit init bug
4c22dff
Add some vit unit tests
bc0bcdd
Modify module import
e33a1c5
Remove norm from PatchEmbed
16f1fab
Fix pretrain weights bug
9acbf5a
Modify pretrained judge
1f4b5b8
Update vit init
bb294e0
Fix some gradient backward bugs.
bf71f60
Add some unit tests to improve code cov
37e65b3
Merge branch 'vit_init_refactor' into setr_official
a19b5d8
Fix init_weights of setr up head
a8f609f
Merge master
8409843
Add DropPath in FFN
07e38fa
Finish benchmark of SETR
399e053
Remove DropPath implementation and use DropPath from mmcv.
5ff95f1
Modify out_indices arg
23f734a
Fix out_indices bug.
82d4455
Remove cityscapes base dataset config.
@@ -0,0 +1,35 @@
_base_ = './cityscapes.py'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (768, 768)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2049, 1025), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2049, 1025),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
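The training pipeline above pairs a base `img_scale` of (2049, 1025) with `ratio_range=(0.5, 2.0)`. As a rough sketch of what that implies, assuming the target scale is obtained by drawing one ratio uniformly from the range and multiplying both sides of `img_scale` by it (the helpers `sample_scale` and `scale_bounds` are hypothetical names for illustration, not mmsegmentation functions):

```python
import random


def sample_scale(img_scale, ratio_range, rng=random):
    """Draw one ratio uniformly from ratio_range and scale both sides.

    Hypothetical helper for illustration; not the library implementation.
    """
    min_ratio, max_ratio = ratio_range
    assert 0 < min_ratio <= max_ratio
    ratio = rng.uniform(min_ratio, max_ratio)
    return int(img_scale[0] * ratio), int(img_scale[1] * ratio)


def scale_bounds(img_scale, ratio_range):
    """Smallest and largest scales reachable for the given ratio range."""
    lo, hi = ratio_range
    smallest = (int(img_scale[0] * lo), int(img_scale[1] * lo))
    largest = (int(img_scale[0] * hi), int(img_scale[1] * hi))
    return smallest, largest


if __name__ == '__main__':
    # Values taken from the config above.
    print(scale_bounds((2049, 1025), (0.5, 2.0)))
    print(sample_scale((2049, 1025), (0.5, 2.0), random.Random(0)))
```

Under this assumption the short side can drop to 512, which is below the 768 crop; that is presumably why the pipeline pads to `crop_size` after cropping.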
@@ -0,0 +1,90 @@
# model settings
backbone_norm_cfg = dict(type='LN', eps=1e-6, requires_grad=True)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=\
    'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_p16_384-b3be5167.pth', # noqa
    backbone=dict(
        type='VisionTransformer',
        img_size=(768, 768),
        patch_size=16,
        in_channels=3,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        out_indices=(5, 11, 17, 23),
        drop_rate=0.1,
        norm_cfg=backbone_norm_cfg,
        out_shape='NCHW',
        with_cls_token=False,
        interpolate_mode='bilinear',
    ),
    neck=dict(
        type='MLA',
        in_channels=[1024, 1024, 1024, 1024],
        out_channels=256,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='ReLU'),
    ),
    decode_head=dict(
        type='SETRMLAHead',
        in_channels=(1024, 1024, 1024, 1024),
        channels=512,
        in_index=(0, 1, 2, 3),
        img_size=(768, 768),
        mla_channels=256,
        mlahead_channels=128,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=[
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=0,
            img_size=(768, 768),
            mla_channels=256,
            num_classes=19,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=1,
            img_size=(768, 768),
            mla_channels=256,
            num_classes=19,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=2,
            img_size=(768, 768),
            mla_channels=256,
            num_classes=19,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=3,
            img_size=(768, 768),
            mla_channels=256,
            num_classes=19,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))
    ],
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
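With `img_size=(768, 768)` and `patch_size=16`, the backbone works on a 48 x 48 grid of patch tokens, and `out_shape='NCHW'` asks for those tokens as a spatial feature map. The following is a minimal pure-Python sketch of that bookkeeping for a single image, assuming tokens are stored in row-major order and that a class token, when present, sits at index 0. The helper names `patch_grid`, `evenly_spaced_indices` and `nlc_to_chw` are hypothetical and not part of the PR.

```python
def patch_grid(img_size, patch_size):
    """Number of patches along height and width."""
    h, w = img_size
    assert h % patch_size == 0 and w % patch_size == 0
    return h // patch_size, w // patch_size


def evenly_spaced_indices(depth, num_outputs):
    """Last layer index of each of num_outputs equal groups of layers."""
    assert depth % num_outputs == 0
    step = depth // num_outputs
    return tuple(step * (i + 1) - 1 for i in range(num_outputs))


def nlc_to_chw(tokens, grid, has_cls_token=False):
    """Rearrange L token vectors (each of length C) into a C x H x W map.

    Assumes row-major token order; illustration only.
    """
    if has_cls_token:
        tokens = tokens[1:]  # drop the class token before reshaping
    h, w = grid
    assert len(tokens) == h * w
    channels = len(tokens[0])
    return [[[tokens[i * w + j][c] for j in range(w)] for i in range(h)]
            for c in range(channels)]


if __name__ == '__main__':
    print(patch_grid((768, 768), 16))    # grid for the config above
    print(evenly_spaced_indices(24, 4))  # matches out_indices in the MLA config
    # Tiny example: 2 x 3 grid, 2 channels, token k holds [k, -k].
    toy = [[k, -k] for k in range(6)]
    print(nlc_to_chw(toy, (2, 3)))
```

The MLA config's `out_indices=(5, 11, 17, 23)` coincide with this even spacing; the other SETR configs in this PR use `(9, 14, 19, 23)`, which does not follow it.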
@@ -0,0 +1,90 @@
# model settings
backbone_norm_cfg = dict(type='LN', eps=1e-6, requires_grad=True)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=\
    'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_p16_384-b3be5167.pth', # noqa
    backbone=dict(
        type='VisionTransformer',
        img_size=(768, 768),
        patch_size=16,
        in_channels=3,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        out_indices=(9, 14, 19, 23),
        drop_rate=0.1,
        norm_cfg=backbone_norm_cfg,
        out_shape='NCHW',
        with_cls_token=True,
        interpolate_mode='bilinear',
    ),
    decode_head=dict(
        type='SETRUPHead',
        in_channels=1024,
        channels=512,
        in_index=3,
        img_size=(768, 768),
        embed_dim=1024,
        num_classes=19,
        norm_cfg=norm_cfg,
        num_convs=2,
        up_mode='bilinear',
        num_up_layer=1,
        conv3x3_conv1x1=False,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=[
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=0,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=1,
            conv3x3_conv1x1=False,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=1,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=1,
            conv3x3_conv1x1=False,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=2,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=1,
            conv3x3_conv1x1=False,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))
    ],
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
@@ -0,0 +1,106 @@
# model settings
backbone_norm_cfg = dict(type='LN', eps=1e-6, requires_grad=True)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=\
    'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_p16_384-b3be5167.pth', # noqa
    backbone=dict(
        type='VisionTransformer',
        img_size=(768, 768),
        patch_size=16,
        in_channels=3,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        out_indices=(9, 14, 19, 23),
        drop_rate=0.1,
        norm_cfg=backbone_norm_cfg,
        out_shape='NCHW',
        with_cls_token=True,
        interpolate_mode='bilinear',
    ),
    decode_head=dict(
        type='SETRUPHead',
        in_channels=1024,
        channels=512,
        in_index=3,
        img_size=(768, 768),
        embed_dim=1024,
        num_classes=19,
        norm_cfg=norm_cfg,
        num_convs=4,
        up_mode='bilinear',
        num_up_layer=4,
        conv3x3_conv1x1=True,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=[
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=0,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=2,
            conv3x3_conv1x1=True,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=1,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=2,
            conv3x3_conv1x1=True,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=2,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=2,
            conv3x3_conv1x1=True,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRUPHead',
            in_channels=1024,
            channels=512,
            in_index=3,
            img_size=(768, 768),
            embed_dim=1024,
            num_classes=19,
            norm_cfg=norm_cfg,
            num_convs=2,
            up_mode='bilinear',
            num_up_layer=2,
            conv3x3_conv1x1=True,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))
    ],
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
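The heads above have to take a 48 x 48 token map back to the 768 x 768 input, a 16x increase, using `num_up_layer` upsampling steps (4 in the decode head, 2 in the auxiliary heads, 1 in the previous config). The sketch below works out the intermediate sizes under the assumption that the total scale is split evenly across the layers; whether `SETRUPHead` in this PR does exactly that is not stated in the diff, and the helper names are hypothetical.

```python
def per_layer_factor(in_size, out_size, num_up_layer):
    """Integer scale per layer, assuming every layer upsamples equally."""
    assert num_up_layer >= 1
    assert out_size % in_size == 0, 'output must be a multiple of input'
    total = out_size // in_size
    factor = round(total ** (1.0 / num_up_layer))
    # Guard against totals that cannot be split into equal integer steps.
    assert factor ** num_up_layer == total, 'total scale is not a perfect power'
    return factor


def upsample_sizes(in_size, out_size, num_up_layer):
    """Feature-map side length before and after each upsampling layer."""
    factor = per_layer_factor(in_size, out_size, num_up_layer)
    sizes = [in_size]
    for _ in range(num_up_layer):
        sizes.append(sizes[-1] * factor)
    return sizes


if __name__ == '__main__':
    token_side = 768 // 16  # img_size / patch_size from the config
    for layers in (1, 2, 4):
        print(layers, upsample_sizes(token_side, 768, layers))
```

With four layers this gives a doubling at each step, which is the progressive upsampling pattern the PUP name refers to.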
Review comment on `_base_`: We may add this in a future PR.
@@ -0,0 +1,3 @@
_base_ = ['./setr_mla_480x480_80k_pascal_context_bs_8.py']

data = dict(samples_per_gpu=2)
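This three-line config only names a `_base_` file and overrides one key. A minimal sketch of how such an override could combine with the inherited settings follows, assuming nested dicts are merged key by key while any other value (numbers, tuples, lists) replaces the inherited one wholesale. `merge_cfg` is a hypothetical helper written for illustration, not the config loader used by the project.

```python
import copy


def merge_cfg(base, override):
    """Recursively merge override into a copy of base.

    Dict values are merged key by key; everything else, including lists,
    replaces the inherited value. Illustration only.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == '__main__':
    # Values copied from the bs_8 base config elsewhere in this PR.
    base = dict(
        data=dict(samples_per_gpu=1),
        optimizer=dict(lr=0.001, weight_decay=0.0))
    child = dict(data=dict(samples_per_gpu=2))
    print(merge_cfg(base, child))
```

If lists are indeed replaced rather than merged, that would explain why configs in this PR that change one field of an auxiliary head restate the whole `auxiliary_head` list.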
Review comment on `_base_`: We may add this in a future PR.
@@ -0,0 +1,62 @@
_base_ = [
    '../_base_/models/setr_mla.py', '../_base_/datasets/pascal_context.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    backbone=dict(img_size=(480, 480), drop_rate=0),
    decode_head=dict(img_size=(480, 480), num_classes=60),
    auxiliary_head=[
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=0,
            img_size=(480, 480),
            mla_channels=256,
            num_classes=60,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=1,
            img_size=(480, 480),
            mla_channels=256,
            num_classes=60,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=2,
            img_size=(480, 480),
            mla_channels=256,
            num_classes=60,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='SETRMLAAUXHead',
            in_channels=256,
            channels=512,
            in_index=3,
            img_size=(480, 480),
            mla_channels=256,
            num_classes=60,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))
    ],
    test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320)))

optimizer = dict(
    lr=0.001,
    weight_decay=0.0,
    paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))

data = dict(samples_per_gpu=1)
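The `test_cfg` above switches to sliding-window inference with a 480 x 480 crop and a stride of 320. One common way to lay out such windows is sketched below: step by the stride, then shift the last window in each direction back so it stays inside the image. This is an illustrative sketch under that assumption; `slide_windows` is a hypothetical helper, and the 520 x 800 image size in the example is made up.

```python
def slide_windows(img_hw, crop_hw, stride_hw):
    """Return (y1, x1, y2, x2) boxes covering the image with overlap."""
    h_img, w_img = img_hw
    h_crop, w_crop = crop_hw
    h_stride, w_stride = stride_hw
    # Number of window positions needed along each axis (at least one).
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    boxes = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # Shift the window back so it keeps the full crop size
            # whenever the image is large enough.
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            boxes.append((y1, x1, y2, x2))
    return boxes


if __name__ == '__main__':
    # Hypothetical 520 x 800 image with the crop and stride from the config.
    for box in slide_windows((520, 800), (480, 480), (320, 320)):
        print(box)
```

Overlapping regions are typically averaged when the per-window predictions are stitched back together; that step is omitted here.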
Review comment: We may also benchmark the default cityscapes.py config.
Review comment: We may first benchmark 2-4 configs.
Review comment: Already added.