Have a better fallback than asserting when there are not enough tensors to shard
🚀 Feature
Have a better fallback than asserting when there are not enough tensors to shard
Motivation
#406 rightfully asserts for now when there are not enough tensors to shard, but we could have a more graceful fallback in that case: not all ranks would contribute to the optimizer update, but that is typically fine and not really a performance limitation.
Pitch
When there are not enough tensors, just make some ranks plain data-parallel contributors and skip the sharding for them (see the sketch below).
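A minimal sketch of what that fallback could look like, assuming a flat list of tensors and a greedy partitioning by element count; the function name `partition_parameters` and its signature are hypothetical, not fairscale's actual API:

```python
# Sketch only, not fairscale's implementation. Assumes `params` is the list of
# tensors to shard and `world_size` is the number of ranks.
from typing import List

import torch


def partition_parameters(params: List[torch.Tensor], world_size: int) -> List[List[torch.Tensor]]:
    """Greedily assign tensors to ranks by element count.

    If there are fewer tensors than ranks, some partitions simply stay empty
    instead of asserting: those ranks own no optimizer shard and act as plain
    data-parallel contributors (they still reduce gradients, they just skip
    the local sharded optimizer step and only receive the broadcast updates).
    """
    partitions: List[List[torch.Tensor]] = [[] for _ in range(world_size)]
    sizes = [0] * world_size
    for p in sorted(params, key=lambda t: t.numel(), reverse=True):
        rank = sizes.index(min(sizes))  # least-loaded rank so far
        partitions[rank].append(p)
        sizes[rank] += p.numel()
    return partitions
```

Ranks whose partition comes back empty would skip the optimizer step entirely and only take part in the gradient reduction and parameter broadcast, so correctness is preserved and no assert is needed.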
Alternatives
The current state, which imposes a hard limit on the number of ranks relative to the number of tensors in the model.
Additional context
cc @kamo-naoyuki