[fix][OSS] adding an assert for empty shards + corresponding unit test #406

blefaudeux · 2021-02-19T23:01:58Z

Before submitting

Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
Did you read the contributor guideline?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Fixes #405.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

fairscale/optim/oss.py

tests/optim/test_oss.py

blefaudeux · 2021-02-20T00:08:14Z

fairscale/utils/golden_testing_data.py

@@ -8,12 +8,12 @@

 adascale_test_data = [
    # "input" value is a list of input tensors for micro-batch/rank 0 and micro-batch/rank 1.
-    {"input": [[1.0, 0], [0, 1.0]], "expected_gain": 2.0},
+    {"input": [[1.0, 0], [0, 1.0]], "expected_gain": 4.0 / 3},


@min-xu-ai @mikerabbat checking with you that this is ok. Since adascale is not changed by this PR, I assumed that the current state was correct

Why is 4.0/3 the new value? Maybe if you init the bias to 0, the value here won't change? The original value of 2 is because we have two grads from two ranks that are completely independent. It must be that the grads from the bias now point in the same direction, hence 4 grads but 3 directions. So it is fine.

I tried setting the bias to zero, but the returned expected gain is still 1.3333

That's fine. I think the new value makes sense. Keeping it at 4.0/3 would be good.
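The "4 grads but 3 directions" argument can be checked by hand. The sketch below is not fairscale's AdaScale code: it is a simplified, unsmoothed version of the gain estimate, and it assumes the test loss is the sum of the model outputs, so that each rank's weight gradient is the outer product of a ones vector with its input and each rank's bias gradient is a ones vector. The helper names `sqr_norm` and `adascale_gain` are hypothetical.

```python
from typing import List, Sequence


def sqr_norm(vec: Sequence[float]) -> float:
    """Squared L2 norm of a flattened gradient."""
    return sum(x * x for x in vec)


def adascale_gain(local_grads: Sequence[Sequence[float]]) -> float:
    """Simplified, unsmoothed gain estimate from one flattened gradient per rank.

    Assumption: this follows the usual unbiased variance / squared-mean
    estimates; it is an illustration, not the library's implementation.
    """
    scale = len(local_grads)
    # Gradient after averaging across ranks.
    mean_grad: List[float] = [sum(col) / scale for col in zip(*local_grads)]
    local_sqr = sum(sqr_norm(g) for g in local_grads) / scale  # mean of per-rank squared norms
    total_sqr = sqr_norm(mean_grad)  # squared norm of the averaged gradient
    grad_var = (local_sqr - total_sqr) * scale / (scale - 1)  # variance estimate
    grad_sqr = total_sqr - grad_var / scale  # squared-mean estimate
    return (grad_var + grad_sqr) / (grad_var / scale + grad_sqr)


# Linear(2, 2, bias=False), inputs [1, 0] on rank 0 and [0, 1] on rank 1.
# Flattened weight gradients under the assumed sum-of-outputs loss.
weight_only = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

# Linear(2, 2): same weight gradients, plus an identical bias gradient [1, 1] on both ranks.
with_bias = [g + [1.0, 1.0] for g in weight_only]

gain_weight_only = adascale_gain(weight_only)
gain_with_bias = adascale_gain(with_bias)
print(gain_weight_only, gain_with_bias)
```

Under these assumptions the weight-only case gives a gain of 2 and the case with bias gives 4/3. The bias gradient under a sum loss does not depend on the bias value, which would be consistent with the gain staying at 1.3333 after initializing the bias to zero.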

min-xu-ai

Interesting change! I think changing the expected gains is fine. Perhaps adding a few comments would be good.

min-xu-ai · 2021-02-20T00:10:00Z

tests/optim/test_oss_adascale.py

@@ -37,7 +37,7 @@ def _test_basic_func(rank, world_size, tempfile_name, test_case, oss, model=None
    _dist_init(rank, world_size, tempfile_name, backend="nccl")

    if model is None:
-        model = Linear(2, 2, bias=False)
+        model = Linear(2, 2)


You need the bias because otherwise there aren't enough params?

yes, exactly
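To make the "not enough params" point concrete, here is a minimal sketch of why a single parameter tensor cannot be sharded across two ranks. It assumes a greedy, whole-tensor partitioning (largest tensor first, each assigned to the currently lightest rank); the functions `partition_by_size` and `check_no_empty_shard` are hypothetical illustrations, not the code in `fairscale/optim/oss.py`.

```python
from typing import List, Sequence


def partition_by_size(param_sizes: Sequence[int], world_size: int) -> List[List[int]]:
    """Assign whole tensors (by index) to ranks, largest first, to the lightest rank."""
    shards: List[List[int]] = [[] for _ in range(world_size)]
    loads = [0] * world_size
    order = sorted(range(len(param_sizes)), key=lambda i: param_sizes[i], reverse=True)
    for index in order:
        rank = loads.index(min(loads))  # first rank with the smallest load
        shards[rank].append(index)
        loads[rank] += param_sizes[index]
    return shards


def check_no_empty_shard(shards: Sequence[Sequence[int]]) -> None:
    """Fail loudly if any rank ends up owning no parameters (hypothetical check)."""
    for rank, shard in enumerate(shards):
        assert len(shard) > 0, f"rank {rank} owns no parameters; too few tensors to shard"


# Linear(2, 2, bias=False): a single 2x2 weight tensor (4 elements), two ranks.
no_bias = partition_by_size([4], world_size=2)

# Linear(2, 2): weight (4 elements) and bias (2 elements), two ranks.
with_bias = partition_by_size([4, 2], world_size=2)

check_no_empty_shard(with_bias)  # passes: each rank owns one tensor
try:
    check_no_empty_shard(no_bias)
    empty_shard_detected = False
except AssertionError:
    empty_shard_detected = True
```

With one tensor and two ranks, the second rank is left with an empty shard and the check fails; adding the bias gives each rank one tensor.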

min-xu-ai · 2021-02-20T00:12:19Z

tests/optim/test_oss.py

+def run_test_catch_empty_shardd(rank, world_size, tempfile_name):
+    dist_init(rank, world_size, tempfile_name, backend="gloo")
+    m = torch.nn.Linear(1, 1)
+    with pytest.raises(AssertionError):


min-xu-ai · 2021-02-20T00:15:01Z

fairscale/utils/golden_testing_data.py

@@ -8,12 +8,12 @@

 adascale_test_data = [
    # "input" value is a list of input tensors for micro-batch/rank 0 and micro-batch/rank 1.
-    {"input": [[1.0, 0], [0, 1.0]], "expected_gain": 2.0},
+    {"input": [[1.0, 0], [0, 1.0]], "expected_gain": 4.0 / 3},


Why is 4.0/3 the new value? Maybe if you init the bias to 0, the value here won't change? The original value of 2 is because we have two grads from two ranks that are completely independent. It must be that the grads from the bias now point in the same direction, hence 4 grads but 3 directions. So it is fine.

min-xu-ai · 2021-02-20T00:17:12Z

Unfortunately the unit tests are still failing. Perhaps you need to update a few more places, since the golden values are used in several places.

adding an assert + corresponding unit test

fb7f7e5

facebook-github-bot added the CLA Signed label Feb 19, 2021

blefaudeux requested review from msbaines, min-xu-ai, anj-s and Vittorio-Caggiano and removed request for min-xu-ai February 19, 2021 23:09

anj-s reviewed Feb 19, 2021

fairscale/optim/oss.py (outdated)

anj-s reviewed Feb 19, 2021

tests/optim/test_oss.py (outdated)

updated changelog

9d3ce35

blefaudeux force-pushed the oss_assert_empty_shards branch from d24c484 to 9d3ce35 Compare February 20, 2021 00:07

blefaudeux requested a review from mikerabbat February 20, 2021 00:07

blefaudeux commented Feb 20, 2021

min-xu-ai approved these changes Feb 20, 2021

blefaudeux marked this pull request as draft February 20, 2021 16:19

blefaudeux mentioned this pull request Feb 20, 2021

ShardedDataParallel doesn't work with multiple nodes #397

Closed

adjusting the other adascale tests

2f91a2c

blefaudeux marked this pull request as ready for review February 22, 2021 18:41

blefaudeux mentioned this pull request Feb 22, 2021

[OSS] More flexible parameter handling in the low tensor case #409

Closed

another round of fixes

af704a7

blefaudeux merged commit 279b802 into master Feb 22, 2021

blefaudeux deleted the oss_assert_empty_shards branch February 22, 2021 20:06
