Skip workflow run if data is empty or the specified epoch_length is 0 #3690

Merged
7 commits merged into Project-MONAI:dev on Jan 21, 2022

Conversation

@Nic-Ma (Contributor) commented Jan 20, 2022

Fixes #3686.

Description

This PR enhances the workflow to skip the run if the data is empty or the specified epoch_length is 0.
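
Illustratively, the change amounts to an early return at the top of the workflow's run logic. The sketch below is a minimal stand-in, not the PR's actual implementation; the names `Workflow`, `data_loader`, and `epoch_length` only loosely mirror MONAI's engine API:

```python
import warnings


class Workflow:
    """Toy stand-in for a MONAI-style engine workflow (illustrative only)."""

    def __init__(self, data_loader, epoch_length=None):
        self.data_loader = data_loader
        # default epoch_length to the number of iterations in the loader
        self.epoch_length = len(data_loader) if epoch_length is None else epoch_length

    def run(self):
        # the guard this PR describes: skip the whole run when there is no work
        if self.epoch_length == 0:
            warnings.warn(
                "no data or epoch_length is 0, skipping the workflow run; "
                "in distributed training, skipping on only some ranks can hang "
                "collective ops such as all_gather."
            )
            return
        for _ in range(self.epoch_length):
            ...  # normal per-iteration logic would run here


# an empty loader now skips cleanly instead of erroring:
Workflow(data_loader=[]).run()
```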

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Nic-Ma (Contributor, Author) commented Jan 20, 2022

/black

@Nic-Ma (Contributor, Author) commented Jan 20, 2022

/build

@Nic-Ma requested review from ericspod, rijobro, and wyli on January 20, 2022 at 16:27
@SachidanandAlle (Contributor) left a comment

LGTM. Should be good if we have a test case for multi-GPU as well.

@Nic-Ma (Contributor, Author) commented Jan 21, 2022

Hi @ericspod @SachidanandAlle,

Thanks for the review.
I spent much time today adding support for the multi-GPU training case. I tried many methods, but none can guarantee that our distributed communication logic works correctly with fewer ranks than configured; this is PyTorch/NCCL behavior, and we can't dynamically add or remove ranks. For example, some handlers or metrics need all ranks to run the same logic to all-gather the result, otherwise it will hang.
I have added unit tests for multi-GPU training and expanded the warning message for the hanging case.

Thanks.
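
For context on the hang described above, here is a minimal, self-contained sketch (not part of this PR) of why collectives deadlock when ranks diverge. It uses the gloo backend so it runs on CPU, and it blocks by design: rank 0 exits before the collective while rank 1 waits inside `all_gather`:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # gloo backend so this runs on CPU; the same behavior applies to NCCL
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    if rank == 0:
        # pretend rank 0 got an empty dataset and skips its run:
        # it never reaches the collective below, so every other rank
        # blocks forever inside all_gather -- the hang the warning
        # message refers to.
        dist.destroy_process_group()
        return

    local = torch.tensor([float(rank)])
    gathered = [torch.zeros(1) for _ in range(world_size)]
    dist.all_gather(gathered, local)  # deadlocks: rank 0 never joins


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```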

@Nic-Ma (Contributor, Author) commented Jan 21, 2022

/black

@Nic-Ma (Contributor, Author) commented Jan 21, 2022

/build

@wyli merged commit e96dcca into Project-MONAI:dev on Jan 21, 2022
wyli pushed a commit that referenced this pull request on Jan 21, 2022: … is 0 (#3690)

* [DLMED] check 0 length

Signed-off-by: Nic Ma <nma@nvidia.com>

* [DLMED] add dist tests

Signed-off-by: Nic Ma <nma@nvidia.com>
Development

Successfully merging this pull request may close these issues:

  • Skip workflow run if no data provided
4 participants