
Conversation

@rahul003
Contributor

Description of changes:

When SMP is used together with Horovod, there will be multiple Horovod "groups". In such cases, rank and size need to be queried from SMP rather than from Horovod.

This is not a problem for PyTorch, since there is a single torch.distributed group, nor for MXNet, since SMP does not support MXNet.

@rahul003 rahul003 requested review from NihalHarish and leleamol and removed request for NihalHarish December 14, 2020 22:49
@codecov-io

codecov-io commented Dec 14, 2020

Codecov Report

Merging #411 (42d300b) into master (9d2d0c3) will decrease coverage by 1.57%.
The diff coverage is 62.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
- Coverage   77.70%   76.13%   -1.58%     
==========================================
  Files         113      113              
  Lines       10139    10165      +26     
==========================================
- Hits         7879     7739     -140     
- Misses       2260     2426     +166     
Impacted Files Coverage Δ
smdebug/tensorflow/base_hook.py 70.28% <18.18%> (-5.75%) ⬇️
smdebug/core/utils.py 79.72% <74.41%> (+1.98%) ⬆️
smdebug/tensorflow/callable_cache.py 52.17% <0.00%> (-26.09%) ⬇️
smdebug/tensorflow/utils.py 64.59% <0.00%> (-23.45%) ⬇️
smdebug/tensorflow/singleton_utils.py 83.33% <0.00%> (-16.67%) ⬇️
smdebug/profiler/tf_profiler_parser.py 54.54% <0.00%> (-11.58%) ⬇️
smdebug/tensorflow/collection.py 84.53% <0.00%> (-11.35%) ⬇️
smdebug/tensorflow/keras.py 79.21% <0.00%> (-11.00%) ⬇️
smdebug/rules/action/stop_training_action.py 56.45% <0.00%> (-9.68%) ⬇️
smdebug/core/logger.py 66.12% <0.00%> (-8.07%) ⬇️
... and 31 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d2d0c3...42d300b. Read the comment docs.

@rahul003
Contributor Author

ok, will do

@rahul003 rahul003 changed the title [TF] Use SMP rank and size when applicable Use SMP rank and size when applicable Jan 11, 2021
@rahul003 rahul003 merged commit 07a3fd9 into master Jan 11, 2021
@rahul003 rahul003 deleted the smp branch January 11, 2021 19:12
NihalHarish added a commit that referenced this pull request Jan 15, 2021
@rahul003 rahul003 restored the smp branch January 16, 2021 00:45
ndodda-amazon added a commit that referenced this pull request Jan 16, 2021
This reverts commit 07a3fd9.

Co-authored-by: Nihal Harish <nihal42harish@gmail.com>
sophiayue1116 pushed a commit to sophiayue1116/sagemaker-debugger that referenced this pull request Jan 20, 2021
* Add smp rank

* Switch to core initialize

* Use smp size

* Cache whether SMP can be imported

* Lint

* Try import with noqa

* Add smp rank call in core

* Import only once

* Use nested except blocks
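The commit messages above name three patterns: caching whether SMP can be imported, importing only once, and nested except blocks. A hypothetical sketch of how those might combine (the function names and the single-process fallback are assumptions, not the PR's actual code):

```python
_SMP_IMPORTABLE = None  # None = not yet checked; True/False once cached


def smp_importable():
    """Cache whether smdistributed.modelparallel can be imported, so the
    (possibly slow) import attempt happens only once per process."""
    global _SMP_IMPORTABLE
    if _SMP_IMPORTABLE is None:
        try:
            import smdistributed.modelparallel.tensorflow  # noqa: F401
            _SMP_IMPORTABLE = True
        except ImportError:
            _SMP_IMPORTABLE = False
    return _SMP_IMPORTABLE


def get_size():
    """Nested except blocks: try SMP first, and only inside its failure
    handler try Horovod, falling back to a single-process default."""
    try:
        if not smp_importable():
            raise ImportError("smp not available")
        import smdistributed.modelparallel.tensorflow as smp
        return smp.size()
    except ImportError:
        try:
            import horovod.tensorflow as hvd
            return hvd.size()
        except (ImportError, ValueError):
            return 1  # illustrative single-process default
```

Caching the import check matters because the hook queries rank and size repeatedly, and re-attempting a failed import on every call is wasteful.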
sophiayue1116 pushed a commit to sophiayue1116/sagemaker-debugger that referenced this pull request Jan 20, 2021
NihalHarish pushed a commit that referenced this pull request Jan 25, 2021
NihalHarish added a commit that referenced this pull request Jan 25, 2021
This reverts commit 07a3fd9.

Co-authored-by: Nihal Harish <nihal42harish@gmail.com>

4 participants