Adjust vision transformer backbone architectures #524
* Add DropPath, trunc_normal_ for VisionTransformer implementation;
* Add class token during intermediate period and remove it during final period;
Codecov Report
@@ Coverage Diff @@
## master #524 +/- ##
==========================================
+ Coverage 86.60% 86.68% +0.07%
==========================================
Files 99 101 +2
Lines 5160 5234 +74
Branches 834 848 +14
==========================================
+ Hits 4469 4537 +68
- Misses 533 535 +2
- Partials 158 162 +4
* Remove class token and reshape entire token feature from NLC to NCHW;
* Add related unit test;
mmseg/models/backbones/vit.py
Outdated
final feature map. Default: False.
interpolate_mode (str): Select the interpolate mode for position
embeding vector resize. Default: bilinear.
input_cls_token (bool): If concatenating class token into image tokens
Suggested change:
- input_cls_token (bool): If concatenating class token into image tokens
+ with_cls_token (bool): If concatenating class token into image tokens
mmseg/models/backbones/vit.py
Outdated
self.pos_drop = nn.Dropout(p=drop_rate)

self.blocks = nn.Sequential(*[
self.num_stages = depth
self.out_indices = tuple(range(self.num_stages))
We may make this an argument, default to (12, )
self.drop_prob = drop_prob
self.keep_prob = 1 - drop_prob

def forward(self, x):
Missing unit tests.
tests/test_models/test_utils/test_drop.py
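For readers unfamiliar with DropPath (stochastic depth): the hunk under review stores `drop_prob` and `keep_prob = 1 - drop_prob`. The sketch below shows the usual per-sample logic in plain Python and is not the code from this PR. The function name `drop_path`, the `rng` parameter, and the use of nested lists in place of tensors are all assumptions made for illustration.

```python
import random


def drop_path(batch, drop_prob=0.0, training=True, rng=None):
    """Zero out whole samples of a residual branch with probability drop_prob.

    `batch` is a list of samples, each a flat list of floats (a stand-in for
    a tensor). Kept samples are scaled by 1 / keep_prob so the expected value
    of the output matches the input. Illustrative only, not mmseg's API.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # In eval mode, or with drop_prob == 0, the branch passes through as is.
    if drop_prob == 0.0 or not training:
        return [list(sample) for sample in batch]

    keep_prob = 1.0 - drop_prob
    if rng is None:
        rng = random.Random()

    out = []
    for sample in batch:
        # One random decision per sample, shared by all of its elements.
        keep = rng.random() < keep_prob
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([value * scale for value in sample])
    return out
```

A unit test along the lines requested in the review would check that eval mode is an identity and that, in training mode, every sample is either entirely zero or scaled by `1 / keep_prob`.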
mmseg/models/backbones/vit.py
Outdated
x = x[:, 1:]

outs = []
block_len = len(self.blocks)
This variable is not needed.
mmseg/models/backbones/vit.py
Outdated
block_len = len(self.blocks)
for i, blk in enumerate(self.blocks):
x = blk(x)
if i == block_len - 1:
Suggested change:
- if i == block_len - 1:
+ if i == len(self.blocks) - 1:
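The review comments in this thread point toward two changes: passing `out_indices` as an argument and dropping the `block_len` temporary. A minimal sketch of what that loop could look like follows. The function name `forward_with_out_indices` is hypothetical, the blocks are plain callables rather than `nn.Module` instances, and 0-based indices are an assumption of this sketch (under that convention the last of 12 blocks is index 11).

```python
def forward_with_out_indices(blocks, x, out_indices=None):
    """Run `blocks` in order and return the outputs at `out_indices`.

    `blocks` is any sequence of callables. If `out_indices` is None, only the
    output of the last block is returned. Hypothetical helper for illustration.
    """
    if out_indices is None:
        out_indices = (len(blocks) - 1, )
    wanted = set(out_indices)
    if any(i < 0 or i >= len(blocks) for i in wanted):
        raise IndexError("out_indices must lie in [0, len(blocks) - 1]")

    outs = []
    for i, blk in enumerate(blocks):
        x = blk(x)
        # Store the intermediate feature without any further processing.
        if i in wanted:
            outs.append(x)
    return tuple(outs)
```

Using a set for the membership check keeps the loop body to a single condition, so no separate "is this the last block" comparison is needed.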
* Add unit test for DropPath;
* Adjust vision transformer backbone architectures;
* Add DropPath, trunc_normal_ for VisionTransformer implementation;
* Add class token during intermediate period and remove it during final period;
* Fix some parameters loss bug;
* Store intermediate token features and impose no processes on them;
* Remove class token and reshape entire token feature from NLC to NCHW;
* Fix some doc error
* Add an arg for VisionTransformer backbone to control if input class token into transformer;
* Add stochastic depth decay rule for DropPath;
* Fix output bug when input_cls_token=False;
* Add related unit test;
* Add arg: out_indices to control model output;
* Add unit test for DropPath;
* Apply suggestions from code review

Co-authored-by: Jerry Jiarui XU <xvjiarui0826@gmail.com>
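The commit message mentions a "stochastic depth decay rule for DropPath" without spelling it out. A common choice in vision transformer implementations is a linear schedule, where the drop probability grows from 0 at the first block to a maximum at the last one. The sketch below assumes that linear rule; the function name `stochastic_depth_rates` and its parameters are hypothetical and not taken from this PR.

```python
def stochastic_depth_rates(drop_path_rate, depth):
    """Return one drop probability per block, rising linearly from 0.

    Assumes a linear schedule: block 0 gets 0.0 and the last block gets
    `drop_path_rate`. Illustrative helper, not the PR's implementation.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        return [0.0]
    return [drop_path_rate * i / (depth - 1) for i in range(depth)]
```

Each block would then build its own DropPath from the corresponding entry, so shallow blocks are almost never skipped while deeper blocks are dropped more often.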
Add DropPath, trunc_normal_ for VisionTransformer implementation;
Add an argument to the VisionTransformer backbone that controls whether the class token is fed into the transformer;
The class token is removed at the output stage;
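To make the output stage concrete, here is a small plain-Python sketch of dropping the class token and reshaping token features from NLC (batch, tokens, channels) to NCHW. It uses nested lists instead of tensors so the index arithmetic is visible. The function name `tokens_to_nchw` and the `hw_shape` parameter are assumptions for illustration, and the class token is assumed to sit at position 0, as the `x = x[:, 1:]` slice in the diff implies.

```python
def tokens_to_nchw(tokens, hw_shape, with_cls_token=True):
    """Convert nested lists shaped [N][L][C] into nested lists [N][C][H][W].

    If `with_cls_token` is True, the first token of every sample is treated
    as the class token and removed before reshaping. Illustrative only.
    """
    height, width = hw_shape
    out = []
    for sample in tokens:
        if with_cls_token:
            sample = sample[1:]  # drop the class token at position 0
        if len(sample) != height * width:
            raise ValueError("token count does not match hw_shape")
        channels = len(sample[0])
        # Token index l maps to the grid position (l // width, l % width).
        out.append([[[sample[row * width + col][c] for col in range(width)]
                     for row in range(height)] for c in range(channels)])
    return out
```

With PyTorch tensors the same steps would typically be a slice, a `reshape` to `(N, H, W, C)`, and a `permute(0, 3, 1, 2)`.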