1533 Fix distributed data parallel issue in ClassificationSaver #1535

Merged: 10 commits, Feb 3, 2021

Nic-Ma (Contributor) commented Feb 1, 2021

Fixes #1533.

Description

This PR fixes the file-saving issue of ClassificationSaver in distributed data parallel mode, where the data from all ranks could not be saved into the CSV file.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh --codeformat --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Nic-Ma (Contributor Author) commented Feb 1, 2021

/black

Nic-Ma force-pushed the 1533-fix-classificationsaver branch from 5597ad9 to cc53588 on February 1, 2021 16:05
Nic-Ma (Contributor Author) commented Feb 2, 2021

/black

monai-bot and others added 4 commits February 2, 2021 11:08
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Nic-Ma (Contributor Author) commented Feb 2, 2021

/black

ericspod (Member) left a comment

I only mentioned some small things. According to the code coverage output, the tests aren't being run under multi-GPU with PyTorch 1.7, so string_list_all_gather isn't being tested, but it looks OK.
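
For context on the helper mentioned above: a string all-gather has to collect a list of strings (for example, file names) from every rank so that a single process can write them all to the CSV file. The sketch below illustrates one way such a gather can work; it is not MONAI's actual code. Each rank joins its strings with a delimiter, encodes them to bytes, and pads the buffer to a common length before the buffers are exchanged. To stay runnable in a single process, the collective is replaced by a hypothetical `fake_all_gather` function that simply receives every rank's buffer; in a real job a collective such as `torch.distributed.all_gather` on byte tensors would play that role. All function names, the tab delimiter, and the zero-byte padding are assumptions made for illustration.

```python
from typing import List

DELIMITER = "\t"  # assumed separator; must not occur inside any string
PAD = b"\x00"     # assumed padding byte; must not occur in the encoded data


def encode_strings(strings: List[str]) -> bytes:
    """Join one rank's strings into a single byte buffer."""
    return DELIMITER.join(strings).encode("utf-8")


def decode_strings(buf: bytes) -> List[str]:
    """Strip the padding and split the buffer back into strings."""
    text = buf.rstrip(PAD).decode("utf-8")
    return text.split(DELIMITER) if text else []


def fake_all_gather(per_rank_buffers: List[bytes]) -> List[bytes]:
    """Hypothetical stand-in for a collective: pad every buffer to the
    longest length and return all of them, as each rank would see them.
    In a real job the lengths would have to be exchanged first."""
    max_len = max(len(b) for b in per_rank_buffers)
    return [b + PAD * (max_len - len(b)) for b in per_rank_buffers]


def gather_string_lists(per_rank_strings: List[List[str]]) -> List[str]:
    """Merge the string lists of all ranks, in rank order."""
    buffers = [encode_strings(s) for s in per_rank_strings]
    merged: List[str] = []
    for buf in fake_all_gather(buffers):
        merged.extend(decode_strings(buf))
    return merged


# Three simulated ranks, one of which has nothing to report.
all_names = gather_string_lists([["a.nii", "b.nii"], ["c.nii"], []])
```

One limitation of this encoding is that a rank holding only an empty string is indistinguishable from a rank holding no strings at all; that is acceptable for file names but would matter for arbitrary text.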

Signed-off-by: Nic Ma <nma@nvidia.com>
Nic-Ma (Contributor Author) commented Feb 3, 2021

/black

Nic-Ma force-pushed the 1533-fix-classificationsaver branch from 5f829f9 to 3b005bd on February 3, 2021 02:55
Nic-Ma (Contributor Author) commented Feb 3, 2021

/black

Signed-off-by: Nic Ma <nma@nvidia.com>
Nic-Ma force-pushed the 1533-fix-classificationsaver branch from 34e56d9 to fc08308 on February 3, 2021 03:22
Nic-Ma (Contributor Author) commented Feb 3, 2021

/black

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Nic-Ma merged commit 26581a0 into Project-MONAI:master on Feb 3, 2021
wyli (Contributor) commented Feb 3, 2021

@ericspod the code coverage is inaccurate in our current setting, as the multiprocess executions are not tracked properly. I'll create an issue.

Nic-Ma deleted the 1533-fix-classificationsaver branch on July 2, 2021 23:37
Linked issue closed by this pull request: ClassificationSaver can't save all the data from ranks into CSV file
4 participants