[WIP] Add Swin Transformer #511

Merged
34 commits merged on Jul 1, 2021

Conversation

@zeliu98 (Contributor) commented Apr 23, 2021

No description provided.

@CLAassistant commented Apr 23, 2021

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ xvjiarui
✅ Junjun2016
✅ sennnnn
❌ zeliu98
You have signed the CLA already but the status is still pending? Let us recheck it.

@zeliu98 changed the title from "add Swin transformer" to "add Swin Transformer" on Apr 23, 2021
@@ -0,0 +1,496 @@
# Copyright (c) Open-MMLab. All rights reserved.
Collaborator:

We may remove this file and directory.

Comment on lines 626 to 627
init_cfg=None,
)
Collaborator:

Suggested change:
- init_cfg=None,
- )
+ init_cfg=None)

stride = to_2tuple(stride)
padding = to_2tuple(padding)
dilation = to_2tuple(dilation)
self.sampler = nn.Unfold(kernel_size, dilation, padding, stride)

Contributor:

The padding may need to be calculated by the user.
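
One way to follow this suggestion is to derive the padding from the current feature size, so that nn.Unfold covers the whole map. A minimal sketch, assuming bottom/right zero-padding; the helper name is illustrative and not part of this PR:

```python
import torch.nn.functional as F

def pad_to_stride(x, stride):
    """Pad a (B, C, H, W) tensor on the bottom/right so that H and W
    become divisible by `stride` before nn.Unfold samples patches."""
    H, W = x.shape[-2:]
    pad_h = (stride - H % stride) % stride
    pad_w = (stride - W % stride) % stride
    if pad_h or pad_w:
        # F.pad for a 4-D tensor takes (left, right, top, bottom).
        x = F.pad(x, (0, pad_w, 0, pad_h))
    return x
```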



@ATTENTION.register_module()
class ShiftWindowMSA(BaseModule):
Collaborator:

Missing docstring.
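
For example, a docstring along these lines would fit the repo's convention; the argument list below is illustrative rather than copied from the final code:

```python
from mmcv.runner import BaseModule

class ShiftWindowMSA(BaseModule):
    """Shifted Window Multihead Self-Attention Module.

    Args:
        embed_dims (int): Number of input channels.
        num_heads (int): Number of attention heads.
        window_size (int): Height and width of the attention window.
        shift_size (int, optional): Shift of the window along both
            spatial axes. 0 means regular (non-shifted) window
            attention. Default: 0.
    """
```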

class ShiftWindowMSA(BaseModule):

def __init__(self,
input_resolution,
Collaborator:

In our setting, the input size may change during training or inference, so we should not fix the input size when initializing the module.
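
A minimal standalone sketch of that interface, where the spatial size is a forward() argument rather than a constructor argument. Plain multi-head attention stands in for the shifted-window attention here; this is not the mmseg implementation:

```python
import torch
import torch.nn as nn

class ShiftWindowMSASketch(nn.Module):
    """No input_resolution at __init__; the current spatial shape is
    passed to forward(), so any run-time resolution is accepted."""

    def __init__(self, embed_dims, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dims, num_heads, batch_first=True)

    def forward(self, query, hw_shape):
        H, W = hw_shape
        B, L, C = query.shape
        assert L == H * W, 'input feature has wrong size'
        # Window partitioning and the cyclic shift would use H and W
        # here; plain attention keeps the sketch short.
        out, _ = self.attn(query, query, query)
        return out

# Usage: the same module handles different resolutions.
msa = ShiftWindowMSASketch(embed_dims=96, num_heads=3)
y1 = msa(torch.randn(1, 56 * 56, 96), hw_shape=(56, 56))
y2 = msa(torch.randn(1, 28 * 28, 96), hw_shape=(28, 28))
```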

return windows


class SwinBlock(BaseModule):
Collaborator:

Missing docstring.

class SwinBlock(BaseModule):

def __init__(self,
input_resolution,
Collaborator:

Similarly, input_size should be treated as unknown at initialization.

Comment on lines 488 to 494
def forward(self, query):
for block in self.blocks:
query = block(query)

if self.downsample:
query = self.downsample(query)
return query

Collaborator:

I suggest packing (H, W) into hw_shape and forwarding it as well.

Contributor:

Actually, H and W are already wrapped in the attributes of PatchMerging: H, W = self.output_resolution.

@codecov bot commented Jun 26, 2021

Codecov Report

Merging #511 (3ac0547) into master (98067be) will decrease coverage by 0.62%.
The diff coverage is 72.06%.

❗ Current head 3ac0547 differs from pull request most recent head 4a00406. Consider uploading reports for the commit 4a00406 to get more accurate results

@@            Coverage Diff             @@
##           master     #511      +/-   ##
==========================================
- Coverage   85.77%   85.14%   -0.63%     
==========================================
  Files         103      105       +2     
  Lines        5307     5668     +361     
  Branches      857      923      +66     
==========================================
+ Hits         4552     4826     +274     
- Misses        583      663      +80     
- Partials      172      179       +7     
Flag Coverage Δ
unittests 85.14% <72.06%> (-0.63%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmseg/models/utils/ckpt_convert.py 4.28% <4.28%> (ø)
mmseg/models/utils/embed.py 80.55% <80.55%> (ø)
mmseg/models/backbones/swin.py 86.89% <86.89%> (ø)
mmseg/models/backbones/__init__.py 100.00% <100.00%> (ø)
mmseg/models/backbones/vit.py 84.84% <100.00%> (-1.03%) ⬇️
mmseg/models/utils/__init__.py 100.00% <100.00%> (ø)
mmseg/models/necks/multilevel_neck.py 100.00% <0.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98067be...4a00406.

@clownrat6 changed the title from "add Swin Transformer" to "[WIP] add Swin Transformer" on Jun 28, 2021
@clownrat6 changed the title from "[WIP] add Swin Transformer" to "[WIP] Add Swin Transformer" on Jun 28, 2021

@@ -0,0 +1,91 @@
import torch
Collaborator:

Modified from xxx.

else:
self.downsample = None

def forward(self, x, H, W):

Collaborator:

We also need a docstring for this.

Comment on lines 465 to 471
if self.downsample:
stage_out = x
x = self.downsample(x, H, W)
DH, DW = (H + 1) // 2, (W + 1) // 2
return stage_out, H, W, x, DH, DW
else:
return x, H, W, x, H, W
Collaborator:

The output should be x, hw_shape.
We don't need the previous hw_shape.
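
As a sketch of that simplification, assuming PatchMerging halves each spatial dimension with rounding up, as the quoted code does:

```python
if self.downsample:
    x = self.downsample(x, hw_shape)
    # The merged feature map is half the size in each dimension.
    hw_shape = ((hw_shape[0] + 1) // 2, (hw_shape[1] + 1) // 2)
return x, hw_shape
```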

Comment on lines 51 to 53
stride=None,
padding=0,
dilation=1,
Collaborator:

Remove these args. Make stride=kernel_size.
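
Taken together with the later comments suggesting to keep stride only, the trimmed constructor might look like this sketch (layer layout assumed from the quoted code, not the merged implementation):

```python
def __init__(self, in_channels, out_channels, stride=2):
    super().__init__()
    # stride doubles as the kernel size, so kernel_size, padding and
    # dilation disappear from the public interface.
    self.sampler = nn.Unfold(kernel_size=stride, stride=stride)
    sample_dim = in_channels * stride ** 2
    self.norm = nn.LayerNorm(sample_dim)
    self.reduction = nn.Linear(sample_dim, out_channels, bias=False)
```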

def __init__(self,
in_channels,
out_channels,
kernel_size=2,
Collaborator:

We may remove this.

mlp_ratio=4,
depths=(2, 2, 6, 2),
num_heads=(3, 6, 12, 24),
strides=(None, None, None, None),
Collaborator:

Why is the default None?

num_heads (int): Parallel attention heads.
feedforward_channels (int): The hidden dimension for FFNs.
depth (int): The number of blocks in this stage.
kernel_size (int): The kernel_size of patch merging.
Collaborator:

We may also remove this. Use stride only.

Comment on lines 469 to 470
padding (int): The padding length of patch merging.
dilation (int): The dilation rate of kernel of patch merging.
Collaborator:

Not needed.

Comment on lines 493 to 496
kernel_size,
stride,
padding,
dilation,
Collaborator:

Keep stride only.

Comment on lines 553 to 558
paddings (tuple[int], optional): The patch merging or patch
embedding padding length of each Swin Transformer stage.
Default: (0, 0, 0, 0).
dilations (tuple[int], optional): The patch merging or patch
embedding kernel dilation rate of each Swin Transformer stage.
Default: (1, 1, 1, 1).
Collaborator:

These are no longer needed.

Comment on lines 689 to 690
if downsample:
in_channels = in_channels * 2
Collaborator:

Move this into the definition of downsample.
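
That is, something along these lines, where the channel doubling lives with the layer that causes it (a sketch against this PR's PatchMerging, not the merged code):

```python
if downsample:
    # PatchMerging produces 2x channels, so the doubling is
    # expressed where the downsample layer is defined.
    self.downsample = PatchMerging(
        in_channels=in_channels,
        out_channels=2 * in_channels)
else:
    self.downsample = None
```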

@Junjun2016 merged commit b6c7c77 into open-mmlab:master on Jul 1, 2021
bowenroom pushed a commit to bowenroom/mmsegmentation that referenced this pull request Feb 25, 2022
* add Swin Transformer

* add Swin Transformer

* fixed import

* Add some swin training settings.

* Fix some filename error.

* Fix attribute name: pretrain -> pretrained

* Upload mmcls implementation of swin transformer.

* Refactor Swin Transformer to follow mmcls style.

* Refactor init_weigths of swin_transformer.py

* Fix lint

* Match inference precision

* Add some comments

* Add swin_convert to load official style ckpt

* Remove arg: auto_pad

* 1. Complete comments for each block;

2. Correct weight convert function;

3. Fix the pad of Patch Merging;

* Clean function args.

* Fix vit unit test.

* 1. Add swin transformer unit tests;

2. Fix some pad bug;

3. Modify config to adapt new swin implementation;

* Modify config arg

* Update readme.md of swin

* Fix config arg error and Add some swin benchmark msg.

* Add MeM and ms test content for readme.md of swin transformer.

* Fix doc string of swin module

* 1. Register swin transformer to model list;

2. Modify pth url which keep meta attribute;

* Update swin.py

* Merge config settings.

* Modify config style.

* Update README.md

Add ViT link

* Modify main readme.md

Co-authored-by: Jiarui XU <xvjiarui0826@gmail.com>
Co-authored-by: sennnnn <201730271412@mail.scut.edu.cn>
Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this pull request Mar 27, 2023
…ful). (open-mmlab#511)

* Removing `autocast` for `35-25% speedup`.

* iQuality

* Adding a slow test.

* Fixing mps noise generation.

* Raising error on wrong device, instead of just casting on behalf of user.

* Quality.

* fix merge

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
sibozhang pushed a commit to sibozhang/mmsegmentation that referenced this pull request Mar 22, 2024
* resolve comments

* update changelog

* add test_batch

* add testing for `test_batch`

* fix mmcv version

* add test_batch

* add testing for `test_batch`

* enlarge test_input to pass unittest

* update names

* update changelog & faq

* update name