Add Video Swin Transformer #2369

Merged Apr 5, 2024 (94 commits)

Commits
f961e75
init video swin
innat Mar 1, 2024
578205a
add: 3d window size computation
innat Mar 1, 2024
9817025
add: mlp layer
innat Mar 1, 2024
3343db1
add: patch embedding layer
innat Mar 1, 2024
7ab5cab
add: patch merging layer
innat Mar 1, 2024
f70a61b
add: window attention layer
innat Mar 1, 2024
5472fc6
add: basic layer for video swin
innat Mar 1, 2024
76d444b
update: basic layer for video swin
innat Mar 1, 2024
715b8a3
add: swin blocks for video swin
innat Mar 1, 2024
3ca0042
create and add: video swin backbone
innat Mar 1, 2024
3d845c5
rename: video swin layers to model specific
innat Mar 1, 2024
1af8bd4
update module import
innat Mar 1, 2024
ed2864d
update module import
innat Mar 1, 2024
bf70fa9
set class method to private usage
innat Mar 1, 2024
eca5023
set init params for backbone
innat Mar 1, 2024
420e229
rm redundant imports
innat Mar 1, 2024
f73e25b
add video swin layer test cases
innat Mar 1, 2024
1ccf7ee
add: videoswin backbone aliases
innat Mar 1, 2024
c5d5fa2
add: video swin backbone presets
innat Mar 1, 2024
27b6596
add: video swin backbone presets test
innat Mar 1, 2024
814db52
update: video swin backbone presets test
innat Mar 1, 2024
cc6ac21
add: video classifier task
innat Mar 1, 2024
d2d883d
add: video swin classifier presets
innat Mar 1, 2024
125b2dc
run formatters
innat Mar 1, 2024
9827302
rename module name/id"
innat Mar 2, 2024
89a715a
add hard-coded normalization for include rescaling=true
innat Mar 2, 2024
36db030
add docstring for videoswin backbone
innat Mar 2, 2024
7aa27a4
update metadata: backbone presets no weights
innat Mar 2, 2024
62a8703
update: backbone presets no weights test
innat Mar 2, 2024
aad5661
update video swin aliases for no weights
innat Mar 2, 2024
048d85a
add: video swin backbone presets with weights
innat Mar 2, 2024
1423e83
update: video swin aliases with weights presets
innat Mar 2, 2024
2eaf8b0
update video swin layer test cases
innat Mar 2, 2024
f713304
added patch merging test
innat Mar 2, 2024
44dae81
imported video swins presets to backbone presets list"
innat Mar 2, 2024
daca84f
fix: typos"
innat Mar 2, 2024
b1a5427
run formatters"
innat Mar 2, 2024
c66673c
fix: linting issue
innat Mar 2, 2024
84d4e03
fix: linting issue
innat Mar 2, 2024
d126b7c
fix: video swin layer test cases"
innat Mar 3, 2024
61303be
add: video swin backbone test
innat Mar 3, 2024
af5878c
rm redundant code
innat Mar 3, 2024
ffe457c
disable preset test temporary
innat Mar 4, 2024
f8d3e26
set include rescale to true
innat Mar 4, 2024
1d0ad36
add video swin components to __init__
innat Mar 4, 2024
838a506
update docstrings: video siwn layers scripts
innat Mar 5, 2024
b4f1534
update copywrite status: video siwn layers test scripts
innat Mar 5, 2024
75c5b66
update copywrite status: video siwn backbone scripts
innat Mar 5, 2024
0b9808b
bug fixes: video swin backbone layers
innat Mar 5, 2024
0a4e2cb
update get config of video swin backbone
innat Mar 5, 2024
fb732d0
enable: video swin backbone test cases
innat Mar 5, 2024
4443335
update: video swin backbone test cases
innat Mar 5, 2024
f3411cb
update: video swin backbone preset test cases
innat Mar 5, 2024
00c67ba
run formatters
innat Mar 5, 2024
9d3ab2e
fix typos: video swin backbone test cases
innat Mar 5, 2024
5bdc8b4
add: non implemented property for test reason
innat Mar 5, 2024
cb5da28
fix: typos
innat Mar 5, 2024
82a8497
add: video classifier test
innat Mar 6, 2024
e2f5056
update: video classifier test
innat Mar 6, 2024
146f32f
update: video classifier test input shape
innat Mar 6, 2024
d25746b
bug fix: mlp layer build method
innat Mar 6, 2024
9779ad4
updated: swin back layer build method
innat Mar 6, 2024
7fa3f83
bug fix: use tf.TensorShape in compute_output_shape method
innat Mar 6, 2024
c8aea50
update: video_classifier_test model.predict to model.call
innat Mar 6, 2024
8287395
update test cases and format the code
innat Mar 6, 2024
e9a3997
update docstrings and preset config
innat Mar 9, 2024
aab1a6c
fix jax DynamicJaxprTrace issue for
innat Mar 10, 2024
ac78108
update config of backbone aliases
innat Mar 11, 2024
1dbded9
add can run in mixed precision test
innat Mar 18, 2024
42003a2
add can run on gray video
innat Mar 18, 2024
e731389
minor fix
innat Mar 18, 2024
77197c2
specify axis in keras.ops.take to match with tf.gather
innat Mar 20, 2024
aa20067
specify include rescaling to backbone class
innat Mar 24, 2024
11f33d7
remove shift size form get config of video basic layer
innat Mar 24, 2024
a2961b9
add support arbitrary input shape
innat Mar 24, 2024
49b074a
minor updates to swin layers
innat Mar 24, 2024
204e4b1
test method update for swin layers
innat Mar 24, 2024
251495b
update test method to swin backbone
innat Mar 24, 2024
599d481
remove unsed code
innat Mar 24, 2024
a849b38
bug fix in call method of patch embed layer
innat Mar 24, 2024
f611b0e
fix typo in patch merging layer
innat Mar 24, 2024
b7d26e4
minor fix
innat Mar 24, 2024
e3e02dc
fix keras.ops.cond issue with jax
innat Mar 25, 2024
a626b1f
no test for jit compile in torch
innat Mar 25, 2024
c484445
reduce tensor size for forward test
innat Mar 25, 2024
45945c9
minor fix
innat Mar 28, 2024
f866d12
remove kcv export decorator
innat Mar 31, 2024
bfb62a4
update keras.Layer import
innat Mar 31, 2024
57f0012
remove unused layer import
innat Mar 31, 2024
7602052
replace keras.layers instead of layers
innat Mar 31, 2024
837286d
update keras.Layer to keras.layers.Layer for keras2
innat Mar 31, 2024
6d44eca
add window_size param to aliases
innat Mar 31, 2024
f5dce04
move vide swin layer to model specific directory
innat Apr 2, 2024
0ba9fdf
minor fix
innat Apr 2, 2024
13 changes: 13 additions & 0 deletions keras_cv/models/__init__.py
@@ -179,11 +179,24 @@
    ResNetV2Backbone,
)
from keras_cv.models.backbones.vgg16.vgg16_backbone import VGG16Backbone
from keras_cv.models.backbones.video_swin.video_swin_aliases import (
    VideoSwinBBackbone,
)
from keras_cv.models.backbones.video_swin.video_swin_aliases import (
    VideoSwinSBackbone,
)
from keras_cv.models.backbones.video_swin.video_swin_aliases import (
    VideoSwinTBackbone,
)
from keras_cv.models.backbones.video_swin.video_swin_backbone import (
    VideoSwinBackbone,
)
from keras_cv.models.backbones.vit_det.vit_det_aliases import ViTDetBBackbone
from keras_cv.models.backbones.vit_det.vit_det_aliases import ViTDetHBackbone
from keras_cv.models.backbones.vit_det.vit_det_aliases import ViTDetLBackbone
from keras_cv.models.backbones.vit_det.vit_det_backbone import ViTDetBackbone
from keras_cv.models.classification.image_classifier import ImageClassifier
from keras_cv.models.classification.video_classifier import VideoClassifier
from keras_cv.models.feature_extractor.clip import CLIP
from keras_cv.models.object_detection.retinanet.retinanet import RetinaNet
from keras_cv.models.object_detection.yolo_v8.yolo_v8_backbone import (
3 changes: 3 additions & 0 deletions keras_cv/models/backbones/backbone_presets.py
@@ -28,6 +28,7 @@
from keras_cv.models.backbones.mobilenet_v3 import mobilenet_v3_backbone_presets
from keras_cv.models.backbones.resnet_v1 import resnet_v1_backbone_presets
from keras_cv.models.backbones.resnet_v2 import resnet_v2_backbone_presets
from keras_cv.models.backbones.video_swin import video_swin_backbone_presets
from keras_cv.models.backbones.vit_det import vit_det_backbone_presets
from keras_cv.models.object_detection.yolo_v8 import yolo_v8_backbone_presets

@@ -42,6 +43,7 @@
    **efficientnet_lite_backbone_presets.backbone_presets_no_weights,
    **yolo_v8_backbone_presets.backbone_presets_no_weights,
    **vit_det_backbone_presets.backbone_presets_no_weights,
    **video_swin_backbone_presets.backbone_presets_no_weights,
}

backbone_presets_with_weights = {
@@ -55,6 +57,7 @@
    **efficientnet_lite_backbone_presets.backbone_presets_with_weights,
    **yolo_v8_backbone_presets.backbone_presets_with_weights,
    **vit_det_backbone_presets.backbone_presets_with_weights,
    **video_swin_backbone_presets.backbone_presets_with_weights,
}

backbone_presets = {
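The `**` unpacking in the hunks above merges each model family's preset dictionary into one registry. A minimal sketch of that pattern follows, using made-up preset names and configs rather than real KerasCV entries. With plain unpacking, a duplicate key is silently overwritten by the later dictionary, so the hypothetical `merge_presets` helper raises instead.

```python
def merge_presets(*preset_dicts):
    """Merge preset dictionaries, refusing duplicate preset names.

    Hypothetical helper for illustration only; the diff itself relies on
    plain `{**a, **b}` unpacking.
    """
    merged = {}
    for presets in preset_dicts:
        duplicates = merged.keys() & presets.keys()
        if duplicates:
            raise ValueError(f"Duplicate preset names: {sorted(duplicates)}")
        merged.update(presets)
    return merged


# Made-up stand-ins for two families' `backbone_presets_no_weights` dicts.
vit_det_presets = {"toy_vitdet_base": {"config": {"embed_dim": 768}}}
video_swin_presets = {"toy_videoswin_tiny": {"config": {"embed_dim": 96}}}

# Plain unpacking, as in the diff: later dictionaries win on key collisions.
plain = {**vit_det_presets, **video_swin_presets}

# Checked merge: identical result here because the names do not collide.
checked = merge_presets(vit_det_presets, video_swin_presets)
```

Whether a collision check is worth adding depends on how preset names are policed elsewhere; the diff does not show one.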
13 changes: 13 additions & 0 deletions keras_cv/models/backbones/video_swin/__init__.py
@@ -0,0 +1,13 @@
# Copyright 2024 The KerasCV Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
158 changes: 158 additions & 0 deletions keras_cv/models/backbones/video_swin/video_swin_aliases.py
@@ -0,0 +1,158 @@
# Copyright 2024 The KerasCV Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import copy

from keras_cv.models.backbones.video_swin.video_swin_backbone import (
    VideoSwinBackbone,
)
from keras_cv.models.backbones.video_swin.video_swin_backbone_presets import (
    backbone_presets,
)
from keras_cv.utils.python_utils import classproperty

ALIAS_DOCSTRING = """VideoSwin{size}Backbone model.

Reference:
- [Video Swin Transformer](https://arxiv.org/abs/2106.13230)
- [Video Swin Transformer GitHub](https://github.com/SwinTransformer/Video-Swin-Transformer)

For transfer learning use cases, make sure to read the
[guide to transfer learning & fine-tuning](https://keras.io/guides/transfer_learning/).

Examples:
```python
input_data = np.ones(shape=(1, 32, 224, 224, 3))

# Randomly initialized backbone
model = VideoSwin{size}Backbone()
output = model(input_data)
```
""" # noqa: E501


class VideoSwinTBackbone(VideoSwinBackbone):
    def __new__(
        cls,
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=[8, 7, 7],
        include_rescaling=True,
        **kwargs,
    ):
        kwargs.update(
            {
                "embed_dim": embed_dim,
                "depths": depths,
                "num_heads": num_heads,
                "window_size": window_size,
                "include_rescaling": include_rescaling,
            }
        )
        return VideoSwinBackbone.from_preset("videoswin_tiny", **kwargs)

    @classproperty
    def presets(cls):
        """Dictionary of preset names and configurations."""
        return {
            "videoswin_tiny_kinetics400": copy.deepcopy(
                backbone_presets["videoswin_tiny_kinetics400"]
            ),
        }

    @classproperty
    def presets_with_weights(cls):
        """Dictionary of preset names and configurations that include
        weights."""
        return cls.presets


class VideoSwinSBackbone(VideoSwinBackbone):
    def __new__(
        cls,
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=[8, 7, 7],
        include_rescaling=True,
        **kwargs,
    ):
        kwargs.update(
            {
                "embed_dim": embed_dim,
                "depths": depths,
                "num_heads": num_heads,
                "window_size": window_size,
                "include_rescaling": include_rescaling,
            }
        )
        return VideoSwinBackbone.from_preset("videoswin_small", **kwargs)

    @classproperty
    def presets(cls):
        """Dictionary of preset names and configurations."""
        return {
            "videoswin_small_kinetics400": copy.deepcopy(
                backbone_presets["videoswin_small_kinetics400"]
            ),
        }

    @classproperty
    def presets_with_weights(cls):
        """Dictionary of preset names and configurations that include
        weights."""
        return cls.presets


class VideoSwinBBackbone(VideoSwinBackbone):
    def __new__(
        cls,
        embed_dim=128,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        window_size=[8, 7, 7],
        include_rescaling=True,
        **kwargs,
    ):
        kwargs.update(
            {
                "embed_dim": embed_dim,
                "depths": depths,
                "num_heads": num_heads,
                "window_size": window_size,
                "include_rescaling": include_rescaling,
            }
        )
        return VideoSwinBackbone.from_preset("videoswin_base", **kwargs)

    @classproperty
    def presets(cls):
        """Dictionary of preset names and configurations."""
        return {
            "videoswin_base_kinetics400": copy.deepcopy(
                backbone_presets["videoswin_base_kinetics400"]
            ),
Review comment from the PR author (contributor):

The base backbone model has more than one checkpoint:

  1. with kinetics-400-base (current)
  2. with kinetics-400-base-imagenet22k
  3. with kinetics-600-base-imagenet22k
  4. with something-something-v2

How should the presets method expose all of these? For example:

def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        "videoswin_base_kinetics400": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400"]
        ),
        "videoswin_base_kinetics400_imagenet22k": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400_imagenet22k"]
        ),
        ...
    }
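
One possible answer, sketched under assumptions: if every base checkpoint is registered in `backbone_presets` under a shared name prefix, `presets` could select them by prefix instead of listing each key by hand. The preset names, configs, and the `select_presets` helper below are hypothetical stand-ins, not entries confirmed to exist in KerasCV.

```python
import copy

# Stand-in for the real `backbone_presets` dict; keys and configs are
# hypothetical and only mirror the checkpoint list in the comment above.
backbone_presets = {
    "videoswin_tiny_kinetics400": {"config": {"embed_dim": 96}},
    "videoswin_base_kinetics400": {"config": {"embed_dim": 128}},
    "videoswin_base_kinetics400_imagenet22k": {"config": {"embed_dim": 128}},
    "videoswin_base_kinetics600_imagenet22k": {"config": {"embed_dim": 128}},
    "videoswin_base_something_something_v2": {"config": {"embed_dim": 128}},
}


def select_presets(all_presets, prefix):
    """Return deep copies of every preset whose name starts with `prefix`."""
    return {
        name: copy.deepcopy(config)
        for name, config in all_presets.items()
        if name.startswith(prefix)
    }


# What a `presets` classproperty on the base alias could return.
base_presets = select_presets(backbone_presets, "videoswin_base")
```

Selecting by prefix keeps the alias class unchanged when a new base checkpoint is registered, at the cost of relying on a naming convention.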

        }

    @classproperty
    def presets_with_weights(cls):
        """Dictionary of preset names and configurations that include
        weights."""
        return cls.presets


setattr(VideoSwinTBackbone, "__doc__", ALIAS_DOCSTRING.format(size="T"))
setattr(VideoSwinSBackbone, "__doc__", ALIAS_DOCSTRING.format(size="S"))
setattr(VideoSwinBBackbone, "__doc__", ALIAS_DOCSTRING.format(size="B"))
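
The alias classes above override `__new__` so that calling an alias returns a configured instance of the base class rather than of the alias itself. A minimal, self-contained sketch of that pattern with toy classes follows. `ToyBackbone`, `ToyTinyBackbone`, the `toy_tiny` preset, and the way `from_preset` applies keyword overrides are all assumptions for illustration, not KerasCV's implementation.

```python
DOC_TEMPLATE = """Toy{size}Backbone model (illustrative only)."""


class ToyBackbone:
    """Toy stand-in for a backbone class with named presets."""

    # Hypothetical preset registry: preset name -> constructor arguments.
    _presets = {"toy_tiny": {"embed_dim": 96, "depths": [2, 2, 6, 2]}}

    def __init__(self, embed_dim, depths):
        self.embed_dim = embed_dim
        self.depths = depths

    @classmethod
    def from_preset(cls, name, **overrides):
        # Assumption: keyword overrides replace values from the preset config.
        config = {**cls._presets[name], **overrides}
        return cls(**config)


class ToyTinyBackbone(ToyBackbone):
    def __new__(cls, embed_dim=96, depths=(2, 2, 6, 2), **kwargs):
        # A tuple default avoids sharing one mutable list across calls.
        kwargs.update({"embed_dim": embed_dim, "depths": list(depths)})
        # The returned object is a ToyBackbone, not a ToyTinyBackbone, so
        # Python does not call ToyTinyBackbone.__init__ on it afterwards.
        return ToyBackbone.from_preset("toy_tiny", **kwargs)


# Attach the formatted docstring after class creation, as the diff does.
ToyTinyBackbone.__doc__ = DOC_TEMPLATE.format(size="Tiny")

model = ToyTinyBackbone(embed_dim=128)
```

One consequence of this design: `isinstance(model, ToyTinyBackbone)` is `False`, because the alias acts only as a factory for the base class.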