Add SwinV2 #6246
Conversation
I also see an issue on the Microsoft Swin-Transformer repo, microsoft/Swin-Transformer#194, which argues that it's not necessary to divide the mask into 9 parts because 4 parts are already enough.
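For context, here is a minimal sketch of the two mask labelings that issue compares. It assumes H and W are divisible by window_size; the variable names are illustrative rather than taken from either repo:

```python
import torch

H, W, window_size, shift_size = 8, 8, 4, 2

# original Swin V1: 3 slices per axis -> 3 x 3 = 9 labeled regions
img_mask = torch.zeros((1, H, W, 1))
slices = (slice(0, -window_size), slice(-window_size, -shift_size), slice(-shift_size, None))
cnt = 0
for h in slices:
    for w in slices:
        img_mask[:, h, w, :] = cnt
        cnt += 1

# the proposal in microsoft/Swin-Transformer#194: after the cyclic shift the
# only content discontinuity sits at -shift_size on each axis, so 2 slices
# per axis (4 regions) already distinguish wrapped from non-wrapped pixels
# within every window
simpler_mask = torch.zeros((1, H, W, 1))
halves = (slice(0, -shift_size), slice(-shift_size, None))
cnt = 0
for h in halves:
    for w in halves:
        simpler_mask[:, h, w, :] = cnt
        cnt += 1
```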
It seems these mod operations are considered nodes of the fx graph, so their output is a proxy rather than a concrete value. The previous SwinTransformer V1 doesn't fail these tests only because those mod operations happen not to be exercised with that random seed.
@ain-soph Thanks a lot for your contribution! Concerning the linter, you can check the details in the CI tab. Concerning your point on PatchMerging, I believe you are right. FX doesn't let you use the input tensor to control the flow of the program. This needs to move outside of the main function and be declared a non fx-traceable operator; see the sketch below. I would patch this ASAP outside of this PR. cc @YosuaMichael
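A hedged sketch of that workaround: move the size-dependent padding into a free function and register it with `torch.fx.wrap`, so symbolic tracing records a single opaque call instead of tracing the mod operations. The name `_patch_merging_pad` is illustrative, not necessarily the helper TorchVision ended up with:

```python
import torch
import torch.fx
import torch.nn.functional as F

def _patch_merging_pad(x: torch.Tensor) -> torch.Tensor:
    # x has shape (..., H, W, C); pad H and W up to even sizes
    H, W = x.shape[-3], x.shape[-2]
    # under symbolic tracing, H % 2 and W % 2 would otherwise become graph
    # nodes whose outputs are proxies, breaking any control flow on them
    return F.pad(x, (0, 0, 0, W % 2, 0, H % 2))

torch.fx.wrap("_patch_merging_pad")  # trace the whole function as one leaf call
```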
@ain-soph The PR looks in the right direction. I've added a few comments for your attention, please have a look.
Note that I'll be on PTO the next 2 weeks so @YosuaMichael offered to provide assistance. He is also a Meta maintainer so you can schedule a call if needed to speed this up.
Please note that my comments don't address the validity of the changes you made on the ML side but mostly the idioms and coding practices we use at TorchVision. A deeper dive would be necessary to confirm validity. The first step, I think, is to take the original pre-trained weights from the paper and confirm that we can load them in your implementation (by making some conversion) and then reproduce the reported accuracy using our reference scripts. Once we verify this, the next step is to train the tiny variant to showcase that we can reproduce the accuracy. If you don't have access to a GPU cluster, we can help.
Again many thanks for your contribution and help on this.
cc @xiaohu2015 and @jdsgomes who worked on the original Swin implementation to see if they have any early feedback.
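For illustration, that first verification step could look roughly like the sketch below. The checkpoint filename, the key renaming, and the `swin_v2_t` builder name are all assumptions (the builder is what this PR is adding); the real checkpoint layout has to be inspected before writing the actual conversion:

```python
import torch
from torchvision.models import swin_v2_t  # assumes the builder this PR adds

official = torch.load("swinv2_tiny_patch4_window8_256.pth", map_location="cpu")
state_dict = official["model"] if "model" in official else official

# rename official keys to the torchvision layout; this mapping is purely
# hypothetical and must be derived by diffing the two state dicts
converted = {}
for key, value in state_dict.items():
    new_key = key.replace("layers.", "features.")  # illustrative rename only
    converted[new_key] = value

model = swin_v2_t()
model.load_state_dict(converted)  # strict by default; errors reveal leftover mismatches
```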
Hi @ain-soph, thanks a lot for the PR! Meanwhile, let me address some of the issues you raised:
```python
self.register_buffer("relative_position_index", relative_position_index)

def get_relative_position_bias(self) -> torch.Tensor:
    relative_position_bias_table: torch.Tensor = self.cpb_mlp(self.relative_coords_table).view(-1, self.num_heads)
```
Shall we use `flatten(end_dim=-2)` here instead of `view(-1, self.num_heads)`?
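The two are equivalent here, since `flatten(end_dim=-2)` collapses every dimension except the last; `flatten` also avoids hard-coding the shape and falls back to a copy when the tensor can't be viewed. A quick check:

```python
import torch

num_heads = 3
t = torch.randn(5, 7, num_heads)
# flatten(end_dim=-2) merges all leading dims, leaving the last one intact,
# which matches view(-1, num_heads) when the last dim is already num_heads
assert torch.equal(t.flatten(end_dim=-2), t.view(-1, num_heads))  # both (35, 3)
```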
I've launched another training run based on the most up-to-date code for the tiny architecture.
@ain-soph Could you please share the exact command you used to train it, to ensure we are not missing anything on our side?
Ooops, there are also memory issues on GPU, both on Linux and Windows. I know that a different test is actually failing, but this can happen: sometimes adding a big new model leads to issues in other models because the memory isn't cleared properly. Usually the way around this is to reduce the memory footprint of the test by passing smaller input sizes or disabling the particular test on the GPU.
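A generic sketch of those two mitigations (not the actual TorchVision test suite), assuming the `swin_t` builder tolerates smaller spatial sizes thanks to its internal padding:

```python
import pytest
import torch
import torchvision

def test_swin_small_input():
    # shrink the input well below the usual 224x224 to cut peak memory
    model = torchvision.models.swin_t().eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 64, 64))
    assert out.shape == (1, 1000)

@pytest.mark.skipif(torch.cuda.is_available(), reason="known OOM on the CI GPU")
def test_swin_full_input_cpu_only():
    # the full-size forward only runs on CPU-only workers
    model = torchvision.models.swin_t().eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 224, 224))
    assert out.shape == (1, 1000)
```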
Thank you for sharing the commands. Although you mentioned it previously, I missed that in v2 they used a different resolution. I launched new jobs to train all the variants with the correct resolution, and the results are looking better now.
@ain-soph thank you for your patience and great work! I managed to train all three variants and the results look good. My plan is to update this PR in the next couple of hours with the model weights.
Training commands:
Test commands and accuracies:
I guess the only final thing on our plate is to fix the memory issue. I'll work on it in the following week. Btw, should we offer ported model weights from the official repo as an alternative?
@ain-soph @jdsgomes Awesome work! Having SwinV2 in TorchVision is really awesome. Looking forward to using them.
I think, given we were able to reproduce the accuracy of the paper, there is no point offering both. What would be interesting in the future is to offer higher-accuracy weights by using newer training recipes.
Based on Yosua's previous review of the ML side and my replication of the results, approving. Thanks @ain-soph for your patience and great work.
As soon as the tests are green, we are good to merge.
I reviewed @jdsgomes's last 5 commits and everything LGTM. I would be OK to merge on green CI (if the memory issue persists on the CI, we might need to turn off some of the tests).
Summary:
* init submit
* fix typo
* support ufmt and mypy
* fix 2 unittest errors
* fix ufmt issue
* Apply suggestions from code review
* unify codes
* fix meshgrid indexing
* fix a bug
* fix type check
* add type_annotation
* add slow model
* fix device issue
* fix ufmt issue
* add expect pickle file
* fix jit script issue
* fix type check
* keep consistent argument order
* add support for pretrained_window_size
* avoid code duplication
* a better code reuse
* update window_size argument
* make permute and flatten operations modular
* add PatchMergingV2
* modify expect.pkl
* use None as default argument value
* fix type check
* fix indent
* fix window_size (temporarily)
* remove "v2_" related prefix and add v2 builder
* remove v2 builder
* keep default value consistent with official repo
* deprecate dropout
* deprecate pretrained_window_size
* fix dynamic padding edge case
* remove unused imports
* remove doc modification
* Revert "deprecate dropout" (reverts commit 8a13f93)
* Revert "fix dynamic padding edge case" (reverts commit 1c7579c)
* remove unused kwargs
* add downsample docs
* revert block default value
* revert argument order change
* explicitly specify start_dim
* add small and base variants
* add expect files and slow_models
* Add model weights and documentation for swin v2
* fix lint
* fix end of files line

Reviewed By: datumbox
Differential Revision: D38824237
fbshipit-source-id: d94082c210c26665e70bdf8967ef72cbe3ed4a8a
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Joao Gomes <jdsgomes@fb.com>
Fixes #6242