accelerator refactor - fix memory issue with ddp_spawn #5855

awaelchli · 2021-02-06T23:14:42Z

When using ddp_spawn, moving the model to the GPU before the processes are spawned creates extra memory usage on the root GPU.
The plugins already move the model to the device, so this shouldn't be necessary in setup.
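
For illustration only, here is a minimal sketch of the ordering described above: the parent-side setup leaves the model on the CPU, and each worker moves it to its own device. The names `pick_device`, `setup_in_parent`, and `worker` are hypothetical and are not part of the Lightning accelerator or plugin API. The CPU fallback is an assumption so the sketch also runs without a GPU.

```python
import torch
from torch import nn


def pick_device(rank: int) -> torch.device:
    """Return the device for a worker of this rank (hypothetical helper)."""
    if torch.cuda.is_available():
        return torch.device("cuda", rank % torch.cuda.device_count())
    # Assumption: fall back to CPU so the sketch runs on machines without a GPU.
    return torch.device("cpu")


def setup_in_parent(model: nn.Module) -> nn.Module:
    """Parent-side setup: deliberately leaves the model where it is."""
    # No model.to(root_device) here. The PR description reports that doing
    # this before the processes are spawned costs extra memory on the root GPU.
    return model


def worker(rank: int, model: nn.Module) -> torch.device:
    """Per-process entry point: move the model once the worker's device is known."""
    device = pick_device(rank)
    model.to(device)
    return next(model.parameters()).device
```

A function with this `(rank, *args)` signature could be handed to `torch.multiprocessing.spawn(worker, args=(model,), nprocs=2)`. In Lightning itself the per-process move is handled by the plugins, as noted above.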

awaelchli · 2021-02-06T23:16:21Z

pytorch_lightning/accelerators/gpu.py

@@ -16,7 +16,6 @@ def setup(self, trainer, model):
 raise MisconfigurationException(f"Device should be GPU, got {self.root_device} instead")
 self.set_nvidia_flags()
 torch.cuda.set_device(self.root_device)
- model.to(self.root_device)
 return super().setup(trainer, model)



Just wanted to double-check here with @tchaton and @justusschock:
is it safe to remove this? Any implications for rpc/sequential?
For the other plugins, I have not seen anything break.

I think this is fine

justusschock · 2021-02-08T07:54:11Z

I'll close this in favor of #5866

fix memory issue with ddp_spawn

e7595b5

awaelchli commented Feb 6, 2021

justusschock approved these changes Feb 7, 2021

awaelchli mentioned this pull request Feb 8, 2021

accelerator refactor - fix for sharded parity test #5866

Merged

justusschock closed this Feb 8, 2021

tchaton deleted the refactor/memory branch February 8, 2021 17:48
