SimSiam #407

zlapp · 2020-11-25T22:20:43Z

What does this PR do?

Implement https://arxiv.org/pdf/2011.10566v1.pdf
Largely based on https://github.com/lucidrains/byol-pytorch extension of BYOL to support SimSiam.
I used pl-bolts BYOL implementation as a reference.
Colab gist for testing on cifar-10 https://gist.github.com/zlapp/c35b8c97d4f6537f21aa07bbc37959c9
Discussed on slack channel https://pytorch-lightning.slack.com/archives/C010PRC9M2R/p1606329394008100

Also adds KNN online evaluation callback

Before submitting

Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

pep8speaks · 2020-11-25T22:20:47Z

Hello @zlapp! Thanks for updating this PR.

In the file pl_bolts/models/self_supervised/simsiam/simsiam_module.py:

Line 144:72: W504 line break after binary operator

Comment last updated at 2021-01-17 20:56:04 UTC

codecov · 2020-11-26T00:41:44Z

Codecov Report

Merging #407 (1717c0a) into master (413b9df) will decrease coverage by 0.54%.
The diff coverage is 76.55%.

@@            Coverage Diff             @@
##           master     #407      +/-   ##
==========================================
- Coverage   79.49%   78.95%   -0.55%     
==========================================
  Files         102      105       +3     
  Lines        5912     6121     +209     
==========================================
+ Hits         4700     4833     +133     
- Misses       1212     1288      +76

Flag	Coverage Δ
cpu	`25.66% <22.00%> (-0.12%)`	⬇️
pytest	`25.66% <22.00%> (-0.12%)`	⬇️
unittests	`78.48% <76.55%> (-0.53%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...s/models/self_supervised/simsiam/simsiam_module.py	`72.67% <72.67%> (ø)`
pl_bolts/callbacks/knn_online.py	`90.00% <90.00%> (ø)`
pl_bolts/models/self_supervised/simsiam/models.py	`96.15% <96.15%> (ø)`
pl_bolts/models/self_supervised/__init__.py	`100.00% <100.00%> (ø)`
pl_bolts/datasets/cifar10_dataset.py	`71.73% <0.00%> (-26.09%)`	⬇️
pl_bolts/datasets/base_dataset.py	`81.81% <0.00%> (-13.64%)`	⬇️
...l_bolts/models/rl/vanilla_policy_gradient_model.py	`96.36% <0.00%> (+2.72%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 413b9df...1717c0a. Read the comment docs.

zlapp · 2020-11-28T16:22:19Z

Initial training results on cifar10 comparing SimSiam (orange) to BYOL (blue).

I am guessing there is a bug in the initial implementation.

zlapp · 2020-11-30T07:32:30Z

Better online accuracies after fix of detach SimSiam (red) to BYOL (blue):

Noticed SimSiam uses a factor of -1 times cosine similarity vs BYOL which uses a factor of -2

SimSiam loss:

BYOL loss:

zlapp · 2020-11-30T18:38:40Z

Changing the loss factor for SimSiam to -2 (and not dividing by 2) to be like BYOL improved results (nearly identical).
Seems like the loss factor needs further investigation.
SimSiam (light blue) to BYOL (blue):

zlapp · 2020-12-09T07:07:52Z

Reached 88% accuracy on CIFAR10 after 800 epochs:

Command:
python simsiam_module.py --dataset cifar10 --optimizer sgd --batch_size 512 --learning_rate 0.03 --max_epochs 800 --weight_decay 0.0005 --arch resnet18 --hidden_mlp 512 --online_ft

haideraltahan · 2020-12-18T23:37:13Z

pl_bolts/models/self_supervised/simsiam/simsiam_module.py

+        # Image 1 to image 2 loss
+        _, z1, h1 = self.online_network(img_1)
+        _, z2, h2 = self.target_network(img_2)
+        loss_a = -1.0 * self.cosine_similarity(h1, z2)
+
+        # Image 2 to image 1 loss
+        _, z1, h1 = self.online_network(img_2)
+        _, z2, h2 = self.target_network(img_1)
+        loss_b = -1.0 * self.cosine_similarity(h1, z2)


Based on SimSiam's pseudocode and other implementations (1, 2), shouldn't there be just one network without a deep copy target network?

Believe it is equivalent. For example if you refer to BYOL implementation in pl bolts there is a deep copy here there isn't https://github.com/lucidrains/byol-pytorch

Nonetheless, great job on the implementation. It might be worth investigating in the future if there is any difference in performance between the two methods. As with this approach, maybe there is more memory usage?

With a deep copy instead of using the same network twice you use ~3GB gpu extra memory with resnet18. The two versions are about equally fast on cifar10, but I think that's because the gpu is memorybound on the task.
Given that the final performane ends up the same, I guess removing the copied network would be a good idea as it's less wasteful and scales better.

Thanks @MikkelAntonsen.
Could you please share the version you ran with plot of results on CIFAR10?

Thanks for sharing @MikkelAntonsen.
I just pushed a commit with the changes you suggested in the gist.
Great job making the improvements in efficiency while maintaining accuracy.

I noticed that the SSLOnlineEvaluator does an additional forward pass per train_batch to get embeddings. It would be possible to return the encoder output from training_step() and which would be accessible in the outputs argument in SSLOnlineEvaluator, AFAIK. This is unfortunate because it couples the implementation of SSLOnlineEvaluator with the networks that uses it. But if we are to reproduce results on imagenet, is reducing the number of forward passes by 1/3 negligible?

Hi @MikkelAntonsen I believe this is related more to SSL capabilities in general of pl_bolts so might be better to open a separate issue since this isn't only effecting SimSiam (tagging @ananyahjha93).

@MikkelAntonsen Maybe the byol implementation can also benefit from using only one network (without deepcopy) and using detach() to control the gradient flow?

I just read the BYOL paper abstract and it seems like the online network and target network use a different set of weights, so I'm not sure how you could share a network. If you look at figure 2 in the paper, it does seem like they use the stop gradient trick, but only for the target network. Do you see any ways to incorporate simsiam ideas into BYOL without the implemention just ending up as simsiam?

Borda · 2021-01-02T21:17:39Z

@zlapp how is it going here, is it still WIP?

zlapp · 2021-01-03T10:24:52Z

@zlapp how is it going here, is it still WIP?

Hi @Borda, based on the results here #407 (comment) I believe the PR is ready to be merged.

akihironitta

@zlapp Would you mind having a look at https://github.com/zlapp/pytorch-lightning-bolts/pull/1?

wjn0 · 2021-01-07T20:43:12Z

@zlapp as a potential user thanks so much for your (+ the PR reviewers!) work, this looks great. I do have one question: the paper notes in appendix B that one difference between SimSiam and BYOL is the bottleneck structure in the predictor. Namely, the hidden dimension of the predictor MLP should be 1/4 of the output dimension. So, for example, with the default prediction space dimension of 256, I think the hidden dim should be 64. This apparently helps with training stability. Do you agree with my reading of the paper? If so, I'm wondering if this should be the default in the bolt as well?

akihironitta · 2021-01-10T13:16:47Z

@zlapp Would you mind having a look at zlapp#1?

Could you also have a look at https://github.com/zlapp/pytorch-lightning-bolts/pull/2?

zlapp

👍

summelon · 2021-04-21T04:46:22Z

The same question as @wjn0.
Any update?

akihironitta added the model label Nov 27, 2020

ananyahjha93 self-requested a review December 14, 2020 17:21

ananyahjha93 approved these changes Dec 14, 2020

View reviewed changes

haideraltahan reviewed Dec 18, 2020

View reviewed changes

akihironitta changed the title ~~[wip] SimSiam~~ SimSiam Jan 4, 2021

ananyahjha93 requested review from akihironitta and Borda as code owners January 7, 2021 14:14

akihironitta reviewed Jan 7, 2021

View reviewed changes

akihironitta mentioned this pull request Jan 10, 2021

Testing SimSiam akihironitta/lightning-bolts#5

Closed

8 tasks

Borda force-pushed the master branch from 8ea02ab to c4dbbfd Compare January 15, 2021 22:44

Borda added the Priority High priority task label Jan 17, 2021

Borda requested a review from akihironitta January 17, 2021 20:41

zlapp and others added 7 commits January 17, 2021 21:43

simsiam init imp

45a708b

add doc

42b322f

fix indent

790fe60

black reformatted

58ffec2

No grad fixes, detach in sim calc

6abbfed

adjusted loss factor -2

7d6737b

init dm similar to simclr implementation, revert loss to paper imp

0e18985

zlapp and others added 20 commits January 17, 2021 21:44

scikit-learn req

3b8dada

flake8

5ab5f4b

scikit-learn bump version 0.23

81fe48e

isort

2ceb2eb

isort

4b56fcd

fix detatch

6c63f0b

rm deep copy

696f271

Add types to knn_online.py

bc9c240

Add types to models.py

217b36f

Fix types in models.py

3764220

Fix tests

3214c85

Fix types in knn_online.py

13f5af3

Add SimSiam

0083276

Apply isort

829274e

Import sklearn as optional package

7819820

Fix flake8

75df5b9

Add args via Trainer and make the tests work on cpu

adf184d

Fix flake8

715d934

chlog

e648eec

yapf

1717c0a

zlapp commented Jan 17, 2021

View reviewed changes

Borda approved these changes Jan 17, 2021

View reviewed changes

Borda merged commit 043b557 into Lightning-Universe:master Jan 17, 2021

Borda added this to the v0.3 milestone Jan 18, 2021

This was referenced Mar 13, 2021

[tune](deps): Bump pytorch-lightning-bolts from 0.2.5 to 0.3.0 in /python/requirements suquark/ray#8

Closed

[tune](deps): Bump pytorch-lightning-bolts from 0.2.5 to 0.3.0 in /python/requirements sven1977/ray#8

Closed

akihironitta mentioned this pull request Mar 20, 2021

Add SimSiam to the docs #604

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SimSiam #407

SimSiam #407

zlapp commented Nov 25, 2020 •

edited by Borda

Loading

pep8speaks commented Nov 25, 2020 •

edited

Loading

codecov bot commented Nov 26, 2020 •

edited

Loading

zlapp commented Nov 28, 2020

zlapp commented Nov 30, 2020

zlapp commented Nov 30, 2020 •

edited

Loading

zlapp commented Dec 9, 2020 •

edited

Loading

haideraltahan Dec 18, 2020 •

edited

Loading

zlapp Dec 22, 2020

haideraltahan Dec 22, 2020

MikkelAntonsen Dec 25, 2020

zlapp Dec 26, 2020

zlapp Dec 27, 2020

MikkelAntonsen Dec 30, 2020

zlapp Jan 1, 2021

MaveriQ Jan 23, 2022

MikkelAntonsen Jan 24, 2022

Borda commented Jan 2, 2021

zlapp commented Jan 3, 2021

akihironitta left a comment

wjn0 commented Jan 7, 2021

akihironitta commented Jan 10, 2021

zlapp left a comment •

edited

Loading

summelon commented Apr 21, 2021

SimSiam #407

SimSiam #407

Conversation

zlapp commented Nov 25, 2020 • edited by Borda Loading

What does this PR do?

Before submitting

PR review

Did you have fun?

pep8speaks commented Nov 25, 2020 • edited Loading

Comment last updated at 2021-01-17 20:56:04 UTC

codecov bot commented Nov 26, 2020 • edited Loading

Codecov Report

zlapp commented Nov 28, 2020

zlapp commented Nov 30, 2020

zlapp commented Nov 30, 2020 • edited Loading

zlapp commented Dec 9, 2020 • edited Loading

haideraltahan Dec 18, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Borda commented Jan 2, 2021

zlapp commented Jan 3, 2021

akihironitta left a comment

Choose a reason for hiding this comment

wjn0 commented Jan 7, 2021

akihironitta commented Jan 10, 2021

zlapp left a comment • edited Loading

Choose a reason for hiding this comment

summelon commented Apr 21, 2021

zlapp commented Nov 25, 2020 •

edited by Borda

Loading

pep8speaks commented Nov 25, 2020 •

edited

Loading

codecov bot commented Nov 26, 2020 •

edited

Loading

zlapp commented Nov 30, 2020 •

edited

Loading

zlapp commented Dec 9, 2020 •

edited

Loading

haideraltahan Dec 18, 2020 •

edited

Loading

zlapp left a comment •

edited

Loading