Add Video SwinTransformer #6521

Merged on Nov 17, 2022 (36 commits)

Commits
e013c71
Just start adding mere copy paste
oke-aditya Aug 30, 2022
75cef21
Replace d with t and D with T
oke-aditya Sep 18, 2022
25f4be4
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Sep 18, 2022
6b7eabb
Align swin transformer video to image a bit
oke-aditya Sep 18, 2022
37d1ac1
Rename d -> t
oke-aditya Sep 18, 2022
fe703f0
align with 2d impl
oke-aditya Sep 21, 2022
44cafea
align with 2d impl
oke-aditya Sep 21, 2022
c1d01f7
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Sep 21, 2022
c42b346
Add helpful comments and config for 3d
oke-aditya Sep 23, 2022
d2de089
add docs
oke-aditya Sep 23, 2022
8af0227
Add docs
oke-aditya Sep 23, 2022
4329ba2
Add configurations
oke-aditya Sep 23, 2022
48c8fd1
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Sep 23, 2022
5ebcf4d
Add docs
oke-aditya Sep 23, 2022
bddd6db
Fix bugs
oke-aditya Sep 24, 2022
15072ba
Fix wrong edit
oke-aditya Sep 24, 2022
5f2677e
Fix wrong edit
oke-aditya Sep 24, 2022
73536f5
Fix bugs
oke-aditya Sep 26, 2022
4dffdc6
Fix bugs
oke-aditya Sep 28, 2022
2f5fe37
Fix as per fx suggestions
oke-aditya Sep 28, 2022
ae5950d
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Sep 28, 2022
071e3b2
Update torchvision/models/video/swin_transformer.py
oke-aditya Sep 30, 2022
d29294d
Fix as per fx suggestions
oke-aditya Oct 1, 2022
2ec2fef
Fix expect files and code
oke-aditya Oct 1, 2022
508dbac
Update the expect files
YosuaMichael Oct 3, 2022
85bbf9b
Merge branch 'main' into add_videoswin
YosuaMichael Oct 3, 2022
8a2e254
Modify video swin
oke-aditya Nov 5, 2022
0cd56ab
Merge branch 'add_videoswin' of github.com:oke-aditya/vision into add…
oke-aditya Nov 5, 2022
d31e567
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Nov 11, 2022
2c0bd1f
Add min size and min temporal size, num params
oke-aditya Nov 11, 2022
edf2078
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Nov 11, 2022
f15a120
Merge branch 'main' of github.com:pytorch/vision into add_videoswin
oke-aditya Nov 14, 2022
2ce421b
Add flops and size
oke-aditya Nov 14, 2022
072545b
Fix types
oke-aditya Nov 15, 2022
fcf2f54
Merge branch 'main' into add_videoswin
YosuaMichael Nov 16, 2022
8eddca8
Fix url recipe
oke-aditya Nov 16, 2022
1 change: 1 addition & 0 deletions docs/source/models.rst
@@ -518,6 +518,7 @@ pre-trained weights:
    models/video_mvit
    models/video_resnet
    models/video_s3d
+   models/video_swin_transformer

 |

2 changes: 1 addition & 1 deletion docs/source/models/swin_transformer.rst
@@ -15,7 +15,7 @@ Model builders
 --------------

 The following model builders can be used to instantiate an SwinTransformer model (original and V2) with and without pre-trained weights.
-All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer``
+All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer``
 base class. Please refer to the `source code
 <https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py>`_ for
 more details about this class.
27 changes: 27 additions & 0 deletions docs/source/models/video_swin_transformer.rst
@@ -0,0 +1,27 @@
Video SwinTransformer
=====================

.. currentmodule:: torchvision.models.video

The Video SwinTransformer model is based on the `Video Swin Transformer <https://arxiv.org/abs/2106.13230>`__ paper.

.. betastatus:: video module


Model builders
--------------

The following model builders can be used to instantiate a Video SwinTransformer model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.swin_transformer.SwinTransformer3d`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/swin_transformer.py>`_ for
more details about this class.

.. autosummary::
    :toctree: generated/
    :template: function.rst

    swin3d_t
    swin3d_s
    swin3d_b
Binary file added test/expect/ModelTester.test_swin3d_b_expect.pkl
Binary file not shown.
Binary file added test/expect/ModelTester.test_swin3d_s_expect.pkl
Binary file not shown.
Binary file added test/expect/ModelTester.test_swin3d_t_expect.pkl
Binary file not shown.
4 changes: 3 additions & 1 deletion torchvision/models/swin_transformer.py
@@ -494,6 +494,8 @@ def __init__(
         )

     def forward(self, x: Tensor):
+        # Here is the difference, we apply norm after the attention in V2.
+        # In V1 we applied norm before the attention.
         x = x + self.stochastic_depth(self.norm1(self.attn(x)))
         x = x + self.stochastic_depth(self.norm2(self.mlp(x)))
         return x
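
The ordering difference described in the new comments can be sketched in isolation. The snippet below is only an illustration: `PreNormBlock` and `PostNormBlock` are hypothetical names, plain `nn.Linear` layers stand in for the windowed attention and the MLP, and stochastic depth is left out.

```python
import torch
from torch import nn


class PreNormBlock(nn.Module):
    """V1-style ordering: normalize the input, then apply the sub-layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Stand-ins for windowed attention and the MLP (assumption for brevity).
        self.attn = nn.Linear(dim, dim)
        self.mlp = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


class PostNormBlock(PreNormBlock):
    """V2-style ordering: apply the sub-layer, then normalize its output."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.norm1(self.attn(x))
        x = x + self.norm2(self.mlp(x))
        return x


torch.manual_seed(0)
x = torch.rand(2, 7, 7, 96)  # B H W C, channels last
pre, post = PreNormBlock(96), PostNormBlock(96)
post.load_state_dict(pre.state_dict())  # same weights, different ordering

y_pre, y_post = pre(x), post(x)
print(y_pre.shape, y_post.shape)
print(torch.allclose(y_pre, y_post))
```

Both blocks preserve the input shape; with shared weights the outputs still differ, because the normalization is applied at a different point in each residual branch.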
@@ -587,7 +589,7 @@ def __init__(

         num_features = embed_dim * 2 ** (len(depths) - 1)
         self.norm = norm_layer(num_features)
-        self.permute = Permute([0, 3, 1, 2])
+        self.permute = Permute([0, 3, 1, 2])  # B H W C -> B C H W
         self.avgpool = nn.AdaptiveAvgPool2d(1)
         self.flatten = nn.Flatten(1)
         self.head = nn.Linear(num_features, num_classes)
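
To make the shape comment concrete, here is a small sketch of how these head layers could act on a channels-last feature map. The `embed_dim` and `depths` values are the usual Swin-T settings and are assumptions, as are the 7x7 spatial size and the call order; the diff only shows the layer definitions. `Tensor.permute` is used in place of the `Permute` module.

```python
import torch
from torch import nn

embed_dim = 96          # assumed Swin-T setting, for illustration only
depths = [2, 2, 6, 2]   # assumed Swin-T setting, for illustration only
num_classes = 1000

# The channel count doubles at each of the len(depths) - 1 downsampling steps.
num_features = embed_dim * 2 ** (len(depths) - 1)

norm = nn.LayerNorm(num_features)
avgpool = nn.AdaptiveAvgPool2d(1)
flatten = nn.Flatten(1)
head = nn.Linear(num_features, num_classes)

x = torch.rand(2, 7, 7, num_features)  # B H W C (channels last)
x = norm(x)                            # LayerNorm acts on the trailing C axis
x = x.permute(0, 3, 1, 2)              # B H W C -> B C H W
x = avgpool(x)                         # B C 1 1
x = flatten(x)                         # B C
logits = head(x)                       # B num_classes

print(num_features, tuple(logits.shape))
```

The permute is needed because `nn.AdaptiveAvgPool2d` expects channels-first input, while the normalization before it works on the last axis.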
1 change: 1 addition & 0 deletions torchvision/models/video/__init__.py
@@ -1,3 +1,4 @@
 from .mvit import *
 from .resnet import *
 from .s3d import *
+from .swin_transformer import *