
Avoid configuring SyncBatchNorm when not fitting #9243

Closed

four4fish opened this issue Sep 1, 2021 · 1 comment · Fixed by #11919


four4fish commented Sep 1, 2021

Proposed refactoring or deprecation

Conditionally configure SyncBatchNorm in the distributed plugins

Motivation

This issue is closely related to #6977
Carrying forward discussion from #9096
Related issue in PyTorch: pytorch/pytorch#48988

SyncBatchNorm does not sync statistics when the module is in eval mode. We can therefore decide whether to configure it based on whether we are fitting vs. validating/testing/predicting.
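
A minimal, standalone sketch of the eval-mode behavior (the toy model below is made up for illustration and is not Lightning-specific):

    import torch.nn as nn

    # A toy model containing a BatchNorm layer.
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

    # This is what configure_sync_batchnorm does under the hood: every
    # BatchNorm*D layer is replaced with torch.nn.SyncBatchNorm.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

    # In train() mode, SyncBatchNorm all-reduces batch statistics across processes.
    # In eval() mode it normalizes with the local running stats and performs no
    # cross-process sync, so the conversion buys nothing for eval-only runs.
    model.eval()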

Pitch

Conditionally determine whether to configure SyncBatchNorm in the module here: https://github.com/PyTorchLightning/pytorch-lightning/blob/a451997c4da89be3b1e4f7f79b52015bd32f2ea4/pytorch_lightning/plugins/training_type/ddp.py#L384-L387

Essentially, rewrite this:

        if self.sync_batchnorm:
            self.model = self.configure_sync_batchnorm(self.model)

        # skip wrapping the model if we are not fitting as no gradients need to be exchanged
        trainer_fn = self.lightning_module.trainer.state.fn
        if trainer_fn == TrainerFn.FITTING:
            self.configure_ddp()

as this:

        # skip SyncBatchNorm conversion and DDP wrapping if we are not fitting, as no gradients need to be exchanged
        trainer_fn = self.lightning_module.trainer.state.fn
        if trainer_fn == TrainerFn.FITTING:
            if self.sync_batchnorm:
                self.model = self.configure_sync_batchnorm(self.model)
            self.configure_ddp()

Additional context
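
From the user's side, a rough sketch of the scenario this targets (MyModel and val_loader are placeholders for a user's LightningModule and dataloader; the Trainer arguments reflect the current API and may differ in later releases):

    import pytorch_lightning as pl

    model = MyModel()  # placeholder LightningModule
    trainer = pl.Trainer(gpus=2, accelerator="ddp", sync_batchnorm=True)

    # Today, validate()/test()/predict() still convert the model to SyncBatchNorm
    # even though no statistics are synced in eval mode; with this proposal the
    # conversion would only happen for trainer.fit().
    trainer.validate(model, val_loader)  # placeholder dataloader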


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning.

  • Bolts: Pretrained SOTA deep learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.

@four4fish four4fish added feature Is an improvement or enhancement help wanted Open to be worked on refactor labels Sep 1, 2021
@ananthsub ananthsub added distributed Generic distributed-related topic good first issue Good for newcomers labels Sep 1, 2021
@ananthsub ananthsub changed the title Avoid converting to batchnorm when not fitting Avoid configuring SyncBatchNorm when not fitting Sep 2, 2021

tchaton commented Sep 3, 2021

Hey @four4fish,

Do you see any situations where users might want to update their BatchNorm stats on the validation dataset in a distributed way? If not, I think this is a good proposal.

Best,
T.C

@four4fish four4fish removed the help wanted Open to be worked on label Sep 8, 2021
@four4fish four4fish self-assigned this Sep 8, 2021
@tchaton tchaton added the let's do it! approved to implement label Sep 10, 2021
@edward-io edward-io self-assigned this Feb 14, 2022
@ananthsub ananthsub added this to the 1.6 milestone Feb 14, 2022
@carmocca carmocca moved this to In Progress in Frameworks Planning Feb 16, 2022
Repository owner moved this from In Progress to Done in Frameworks Planning Mar 12, 2022