update ShardedEmbeddingBagCollection to use registered EBCs with shardedTensors as registered modules #758
Conversation
This pull request was exported from Phabricator. Differential Revision: D40458625
update ShardedEmbeddingBagCollection to use registered EBCs with shardedTensors as registered modules (#88026)

Summary:
X-link: pytorch/pytorch#88026
Pull Request resolved: #758

Update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit. This works with DMP.

named_parameters() behavior changes; use include_fused as a temporary flag to gate this behavior.

Note that because ShardedTensor does not support gradients directly, this won't work for dense compute kernels outside of data-parallel sharding. That path is not used today; a TODO will be added, but it is low priority.

Differential Revision: D40458625

fbshipit-source-id: 9135216ac67c828d8532d5c251cd6b8d170c058b
update ShardedEmbeddingBagCollection to use registered EBCs with shardedTensors as registered modules (#758) (#88026)

Summary:
X-link: meta-pytorch/torchrec#758

This PR fixes a bug in FSDP/DDP where ShardedTensors are not supported even when passed in as params to ignore. This is important for composability, because TorchRec's named_parameters() returns the FQNs of ShardedTensors (as defined in the goals).

It also defines the device of a ShardedTensor as None when local_tensor() does not exist on the rank.

Update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit.

Differential Revision: D40458625

Pull Request resolved: #88026
Approved by: https://github.com/wanchaol, https://github.com/rohan-varma
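The gist of the FSDP/DDP fix, as a hedged sketch rather than the actual patch: collect the FQNs of ShardedTensor-backed parameters that TorchRec's named_parameters() surfaces and hand them to DDP as parameters to ignore. The helper name and the wiring below are illustrative assumptions based on this description; _set_params_and_buffers_to_ignore_for_model is a private PyTorch API.

```python
# Hedged sketch (not the actual patch): find ShardedTensor params by FQN so that
# DDP/FSDP can be told to ignore them, which this PR makes safe to do.
from typing import List

import torch.nn as nn
from torch.distributed._shard.sharded_tensor import ShardedTensor


def sharded_tensor_param_fqns(module: nn.Module) -> List[str]:
    # With composable TorchRec modules, named_parameters() yields ShardedTensors
    # under their fully qualified names.
    return [
        fqn
        for fqn, param in module.named_parameters()
        if isinstance(param, ShardedTensor)
    ]


# Illustrative usage (requires an initialized process group and a sharded model):
#
#   ignored = sharded_tensor_param_fqns(model)
#   nn.parallel.DistributedDataParallel._set_params_and_buffers_to_ignore_for_model(
#       model, ignored
#   )
#   ddp_model = nn.parallel.DistributedDataParallel(model)
```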
Summary:
Update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit. This works with DMP.

named_parameters() behavior changes; use include_fused as a temporary flag to gate this behavior.

Note that because ShardedTensor does not support gradients directly, this won't work for dense compute kernels outside of data-parallel sharding. That path is not used today; a TODO will be added, but it is low priority.
Differential Revision: D40458625
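For context, a minimal caller-side sketch of the gated behavior described in the summary. include_fused is the temporary flag mentioned above; the exact signature of ShardedEmbeddingBagCollection.named_parameters() and the names used here are assumptions drawn from this description, not TorchRec's verbatim API.

```python
# Hedged sketch of inspecting a sharded EBC's parameters after this change.
# The include_fused keyword and call pattern are assumptions from the summary.
from torch.distributed._shard.sharded_tensor import ShardedTensor


def describe_params(sharded_ebc, include_fused: bool = True) -> None:
    # When include_fused is enabled, the fused (ShardedTensor-backed) table
    # weights are reported under their FQNs, matching the composability contract.
    for fqn, param in sharded_ebc.named_parameters(include_fused=include_fused):
        kind = "ShardedTensor" if isinstance(param, ShardedTensor) else type(param).__name__
        print(f"{fqn}: {kind}")


# Example (assuming `model` wraps a sharded EmbeddingBagCollection, e.g. via DMP):
#
#   describe_params(model, include_fused=True)
```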