Have a better fallback than asserting when there are not enough tensors to shard
🚀 Feature
Have a better fallback than asserting when there are not enough tensors to shard
Motivation
#406 rightfully asserts for now when there are not enough tensors to shard, but we could have a more graceful fallback in that case: not all ranks would contribute to the optimizer update, but that is typically fine and not really a performance limitation.
Pitch
When there are not enough tensors, just make some ranks plain data-parallel contributors and skip the sharding for them (see the sketch below).
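A minimal sketch of what that fallback could look like, assuming a flat list of tensors and a greedy partitioning by element count; the function name `partition_parameters` and its signature are hypothetical, not fairscale's actual API:

```python
# Sketch only, not fairscale's implementation. Assumes `params` is the list of
# tensors to shard and `world_size` is the number of ranks.
from typing import List

import torch


def partition_parameters(params: List[torch.Tensor], world_size: int) -> List[List[torch.Tensor]]:
    """Greedily assign tensors to ranks by element count.

    If there are fewer tensors than ranks, some partitions simply stay empty
    instead of asserting: those ranks own no optimizer shard and act as plain
    data-parallel contributors (they still reduce gradients, they just skip
    the local sharded optimizer step and only receive the broadcast updates).
    """
    partitions: List[List[torch.Tensor]] = [[] for _ in range(world_size)]
    sizes = [0] * world_size
    for p in sorted(params, key=lambda t: t.numel(), reverse=True):
        rank = sizes.index(min(sizes))  # least-loaded rank so far
        partitions[rank].append(p)
        sizes[rank] += p.numel()
    return partitions
```

Ranks whose partition comes back empty would skip the optimizer step entirely and only take part in the gradient reduction and parameter broadcast, so correctness is preserved and no assert is needed.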
Alternatives
The current state, which imposes a hard limit on the number of ranks relative to the number of tensors in the model.
Additional context
cc @kamo-naoyuki