Keeping DDP override in sync with upstream torch #4630
Comments
@pritamdamania87 suggested this workaround: instead of overriding DDP, we can wrap the LightningModule in another nn.Module. This wrapper module will define its own forward, which calls the appropriate step method of the LightningModule. Then, when we wrap this module in DDP, we can rely on the wrapper's forward being the entry point that DDP calls.
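A minimal sketch of that wrapper pattern, assuming a hypothetical class name `_LightningModuleWrapper` and a simple training/validation dispatch (the real implementation may also cover test and predict steps):

```python
import torch.nn as nn


class _LightningModuleWrapper(nn.Module):
    """Hypothetical wrapper: stock DDP wraps this module, and this module's
    forward() dispatches to the LightningModule's *_step hooks."""

    def __init__(self, pl_module: nn.Module):
        super().__init__()
        self.module = pl_module

    def forward(self, *args, **kwargs):
        # Because DDP calls *this* forward, all of DDP's reducer setup runs
        # normally; no DDP internals need to be overridden on the Lightning side.
        if self.module.training:
            return self.module.training_step(*args, **kwargs)
        return self.module.validation_step(*args, **kwargs)


# Usage (sketch):
# ddp = torch.nn.parallel.DistributedDataParallel(_LightningModuleWrapper(pl_module))
```

Since plain DistributedDataParallel wraps the wrapper, every call goes through DDP.forward and its gradient-sync bookkeeping, rather than through a Lightning copy of DDP internals.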
As @ananthsub mentioned, I'd suggest always calling DDP.forward and not relying on the internals of DDP. The current implementation in LightningDistributedDataParallel could break if we make changes in DDP and the reducer. As an example, mmcv was doing something similar and broke when we refactored some code related to DDP and reducer: open-mmlab/mmcv#636.
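To illustrate the difference between the two call paths, here is a self-contained single-process sketch (gloo backend, world size 1; the model and data are placeholders, not anything from Lightning):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Single-process "cluster" just to make the example executable.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

ddp_model = DistributedDataParallel(nn.Linear(4, 2))
x = torch.randn(8, 4)

# Safe: goes through DDP.forward, so the reducer is prepared and
# require_backward_grad_sync is honored before backward().
ddp_model(x).sum().backward()

# Fragile: calling the inner module directly bypasses DDP.forward, so DDP's
# gradient-sync preparation is skipped. Relying on such internals is exactly
# what can break across PyTorch releases.
# ddp_model.module(x).sum().backward()

dist.destroy_process_group()
```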
@ananthsub I think this could work, nice idea! And I guess we should unwrap the model when passing it to the callbacks etc. so the user never sees the wrapper.
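Continuing the sketch above, unwrapping could look roughly like this (`unwrap_lightning_module` and `_LightningModuleWrapper` are illustrative names, not the actual Lightning API):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def unwrap_lightning_module(wrapped: nn.Module) -> nn.Module:
    # Hypothetical helper: peel off DDP and the wrapper layer so callbacks,
    # logging, and checkpointing always see the bare LightningModule.
    if isinstance(wrapped, DistributedDataParallel):
        wrapped = wrapped.module
    if isinstance(wrapped, _LightningModuleWrapper):  # wrapper from the sketch above
        wrapped = wrapped.module
    return wrapped
```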
solved by the linked PR |
From @ananthsub:
How should Lightning keep its DDP override in sync with the upstream torch DistributedDataParallel? These implementations have now diverged. I think this leads to performance degradation with Lightning + gradient accumulation, since the require_backward_grad_sync attribute isn't checked before the backward pass.
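For context, upstream DDP exposes this behavior through the no_sync() context manager, which sets require_backward_grad_sync to False so the backward pass skips the all-reduce on accumulation steps. A minimal single-process sketch of gradient accumulation with it (the accumulation factor, model, and data are arbitrary):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DistributedDataParallel(nn.Linear(4, 2))
batches = [torch.randn(8, 4) for _ in range(4)]
accumulate = 2

for step, x in enumerate(batches):
    if (step + 1) % accumulate != 0:
        # no_sync() flips require_backward_grad_sync to False, so this
        # backward pass only accumulates gradients locally.
        with model.no_sync():
            model(x).sum().backward()
    else:
        # Gradients are synchronized across ranks only on this step.
        model(x).sum().backward()
        model.zero_grad(set_to_none=True)  # optimizer.step() would go here

dist.destroy_process_group()
```

This is the check the question refers to: if the override never consults require_backward_grad_sync, every accumulation step pays for an all-reduce it could have skipped.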