Support multi-output models #170

Merged
tjruwase merged 10 commits into master from olruwase/scale_none_loss
Mar 27, 2020
Conversation

@tjruwase
Contributor

Correctly handle multi-output models by performing loss scaling in backward() instead of forward()
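The idea behind the change can be illustrated with a minimal sketch (the `ToyEngine` class and its names are hypothetical, not DeepSpeed's actual API): if forward() returns the model's outputs untouched, a model is free to return a tuple such as (loss, logits), and scaling is applied only to the tensor the caller explicitly passes to backward().

```python
class ToyEngine:
    """Hypothetical sketch of backward-time loss scaling (not DeepSpeed's API)."""

    def __init__(self, model, loss_scale=1024.0):
        self.model = model
        self.loss_scale = loss_scale

    def forward(self, *inputs):
        # No scaling here: outputs pass through untouched, so the model may
        # return a single loss, a (loss, logits) tuple, or even None entries.
        return self.model(*inputs)

    def backward(self, loss):
        # Scaling applies only to the tensor the caller identifies as the loss.
        scaled = loss * self.loss_scale
        # A real engine would now call scaled.backward(); this sketch returns
        # the value instead so it stays framework-free.
        return scaled


# A multi-output "model" returning (loss, logits). Scaling inside forward()
# would have to guess which output is the loss; scaling in backward() does not.
engine = ToyEngine(lambda x: (x * 2.0, [0.1, 0.9]))
loss, logits = engine.forward(3.0)
print(engine.backward(loss))  # 6.0 * 1024.0 = 6144.0
```

Scaling at backward time also naturally covers the case where a model returns no loss at all (e.g. during inference), since backward() is simply never called.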

@g-karthik left a comment

Haven't looked at the tests closely -- the core changes look good to me!

@ShadenSmith
Contributor

@tjruwase I think you will need to tell DeepSpeed to update the DeepSpeedExamples submodule in order to incorporate your SQuAD fix from last night.

You can do that from your branch:

git submodule update --remote DeepSpeedExamples

Then commit that update.

@ShadenSmith
Contributor

@tjruwase would you do me a favor and also enable the Megatron tests in your branch for now? Megatron is disabled by default until I get the nightly tests going; those lines are just commented out in tests/model/run_sanity_check.py. I'm wondering if we need to account for the new loss scaling in those tests too.

@tjruwase
Contributor Author

@ShadenSmith Thanks for the guidance. I have updated the DeepSpeedExamples submodule and enabled Megatron in the model tests. Now run_sanity_check passes. For some reason, GitHub has not received notification of the model tests passing.

@tjruwase tjruwase merged commit 53c73fe into master Mar 27, 2020
kouml pushed a commit to kouml/DeepSpeed that referenced this pull request Apr 3, 2020
* Push to remote

* Correctly handle multi-output models by doing loss scaling in backward();
  unit tests for multi-output models

* Fix formatting issues

* Formatting issues fix

* Fix formatting

* Update DeepSpeedExamples submodule;
  enable Megatron model tests
jeffra pushed a commit to jeffra/DeepSpeed that referenced this pull request Apr 19, 2021
@jeffra jeffra deleted the olruwase/scale_none_loss branch September 24, 2021 04:41
Development

Successfully merging this pull request may close these issues.

Handle None in loss scaling ("DeepSpeed assumes model returns just one variable: loss")

5 participants