Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Biases are excluded from BYOL moving average update #440

Closed
wjn0 opened this issue Dec 9, 2020 · 5 comments · Fixed by #574
Closed

Biases are excluded from BYOL moving average update #440

wjn0 opened this issue Dec 9, 2020 · 5 comments · Fixed by #574
Assignees
Labels
bug Something isn't working model

Comments

@wjn0
Copy link
Contributor

wjn0 commented Dec 9, 2020

Based on the code of the BYOLMAWeightUpdate, bias terms are excluded from the moving average update. I don't think this matches the original paper. I did not see any documentation on this point, but it seems intentional. Is there a source or reference for this difference (assuming I haven't misunderstood the original paper)?

The relevant code is here, excerpted:

# apply MA weight update
for (name, online_p), (_, target_p) in zip(online_net.named_parameters(), target_net.named_parameters()):
    if 'weight' in name:
        target_p.data = self.current_tau * target_p.data + (1 - self.current_tau) * online_p.data

To see some of the parameters which are excluded from the MA update with this condition in place (when using a ResNet-50 as the backbone, for example), you can run the following snippet:

from torchvision.models import resnet50
print([param_name for param_name in list(dict(resnet50().named_parameters()).keys()) if 'weight' not in param_name])
@wjn0 wjn0 added the question Further information is requested label Dec 9, 2020
@github-actions
Copy link

github-actions bot commented Dec 9, 2020

Hi! thanks for your contribution!, great first issue!

@akihironitta
Copy link
Contributor

@annikabrundyn Would you mind having a look?

@sidhantls
Copy link
Contributor

sidhantls commented Dec 22, 2020

Yeah, the paper explicitly mentions that bias parameters are omitted from optimization with LARS and weight decay but there's no mention about omission for the moving average update

@stale stale bot added the won't fix This will not be worked on label Feb 20, 2021
@wjn0
Copy link
Contributor Author

wjn0 commented Feb 25, 2021

I think this issue should be looked at rather than closed (and ideally fixed if it indeed wasn't an intentional modification) - not that it seems to affect performance much. The PR would be super simple, and I'm happy to do if the original contributor of this implementation is too busy?

@stale stale bot removed the won't fix This will not be worked on label Feb 25, 2021
@annikabrundyn
Copy link
Contributor

hey @wjn0 I've just checked the paper again and I think you're right - thanks for spotting this! if you'd like you can submit a PR and ping me for review? also happy to make the change myself this weekend

wjn0 added a commit to wjn0/pytorch-lightning-bolts that referenced this issue Mar 4, 2021
No longer exclude biases from the moving average update in BYOL.

Fixes Lightning-Universe#440
@wjn0 wjn0 mentioned this issue Mar 4, 2021
8 tasks
@akihironitta akihironitta added fix fixing issues... model and removed question Further information is requested labels Mar 4, 2021
@Lightning-Universe Lightning-Universe deleted a comment from stale bot Mar 4, 2021
@Borda Borda closed this as completed in #574 Mar 4, 2021
Borda pushed a commit that referenced this issue Mar 4, 2021
No longer exclude biases from the moving average update in BYOL.

Fixes #440
@Borda Borda added bug Something isn't working and removed fix fixing issues... labels Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working model
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants