Add Video Swin Transformer Model #2262

Closed
innat opened this issue Dec 23, 2023 · 9 comments · Fixed by #2369

Comments

@innat
Contributor

innat commented Dec 23, 2023

Short Description

Video Swin Transformer is a pure-transformer video modeling architecture that attained top accuracy on the major video recognition benchmarks.

Papers

https://arxiv.org/abs/2106.13230
Published in 2021; cited by 1154 to date.

Existing Implementations

Other Information

@divyashreepathihalli
Collaborator

@innat Thanks for filing the issue! Are you interested in contributing?

@innat
Contributor Author

innat commented Dec 30, 2023

@divyashreepathihalli

@innat Thanks for filing the issue! Are you interested in contributing?

Unfortunately, I don't have the bandwidth to keep working on this feature (I've noticed there are many pending PRs). Therefore, unless this feature is a high priority on KerasCV's current roadmap, I am willing to offer guidance to any interested contributor. Thank you for your understanding.

@simeetnayan81

simeetnayan81 commented Jan 3, 2024

Hey @innat @divyashreepathihalli. This project seems interesting and I wish to contribute. I will need some guidance too, since I am new to the Keras codebase.

@innat
Contributor Author

innat commented Jan 5, 2024

@simeetnayan81
As you're new to keras-cv, first take a look at how the backbones and the image classification task are implemented there. Following that pattern, you can start by adding Video Swin as a backbone and then create a video classifier as the high-level task. For a model implementation in Keras v3, please check the first post.

@divyashreepathihalli
Collaborator

Thank you @simeetnayan81 for your interest and thank you @innat for your help! The team currently does not have bandwidth for this. We appreciate the help!!

@ID6109
Contributor

ID6109 commented Jan 6, 2024

Hey @innat @divyashreepathihalli! I'd love to add this model to the codebase. I have prior experience with handling the models implemented in KerasCV as well. Thanks!

@innat
Contributor Author

innat commented Jan 6, 2024

@ID6109 @simeetnayan81
Thank you both. Feel free to start working. You can collaborate with each other. Check out the resources I shared in the first post.

Note: unlike image models, which currently only have ImageNet weights, video models often come with pretrained weights for multiple datasets, e.g. Kinetics and Something-Something. Also, their rescaling can be different. At first, you don't need to worry about weights; just start by adding the backbone and the high-level classifier.
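
To illustrate the point about multiple pretrained weights, here is a minimal sketch in plain Python of how each checkpoint could carry its own dataset, head size, and rescaling. The `VideoPreset` class, the preset names, and the rescale/offset values are all hypothetical placeholders, not KerasCV API or actual checkpoint settings; only the class counts (400 for Kinetics-400, 174 for Something-Something V2) are the standard ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoPreset:
    """Metadata a pretrained video checkpoint might need to carry (hypothetical)."""

    dataset: str        # dataset the weights were trained on
    num_classes: int    # size of the classification head
    rescale: float      # factor applied to raw pixel values (placeholder)
    offset: float = 0.0  # value added after rescaling (placeholder)


# Hypothetical preset names; rescale/offset values are illustrative only.
PRESETS = {
    "videoswin_tiny_kinetics400": VideoPreset("kinetics400", 400, 1 / 255.0),
    "videoswin_tiny_something_something_v2": VideoPreset(
        "something_something_v2", 174, 1 / 127.5, -1.0
    ),
}


def rescale_pixels(pixels, preset_name):
    """Apply the preset-specific rescaling to a flat list of pixel values."""
    preset = PRESETS[preset_name]
    return [p * preset.rescale + preset.offset for p in pixels]
```

The backbone itself would be shared; only the metadata above (and the classifier head size) would change between presets, which is why the weights can be left for later.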

@divyashreepathihalli
Collaborator

Created a branch: https://github.com/keras-team/keras-cv/tree/video-swin-transformer
Please collaborate on this branch, and then we can open a PR to master from here.

innat mentioned this issue Jan 27, 2024
innat mentioned this issue Mar 5, 2024

@innat
Contributor Author

innat commented Mar 15, 2024

@divyashreepathihalli
Is it acceptable to take guidance from a practitioner?
