Adding support for Multi-GPU data analyzer #6182 #6202

Merged
25 commits merged into Project-MONAI:dev on Apr 11, 2023

Conversation

heyufan1995
Member

@heyufan1995 heyufan1995 commented Mar 20, 2023

Fixes #6182.
Fixes #6114.
Added multi-GPU support for the data analyzer.
Tested on NGC with four 16 GB V100 GPUs on the TotalSegmentator dataset.
Speed-up:
4 GPUs: data analysis 6.90 min (9.2 min including YAML export)
1 GPU: data analysis 22.98 min (25.33 min including YAML export)
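
For readers who want the timings above as ratios, here is a small calculation using only the four figures quoted in the description. The helper names `speedup` and `parallel_efficiency` are illustrative and not part of MONAI.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Ratio of single-GPU wall time to multi-GPU wall time."""
    return t_single / t_multi


def parallel_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Speed-up divided by the number of GPUs (1.0 would be ideal scaling)."""
    return speedup(t_single, t_multi) / n_gpus


# Timings in minutes, copied from the PR description.
analysis_1gpu, analysis_4gpu = 22.98, 6.90
total_1gpu, total_4gpu = 25.33, 9.2

print(f"analysis only: {speedup(analysis_1gpu, analysis_4gpu):.2f}x, "
      f"efficiency {parallel_efficiency(analysis_1gpu, analysis_4gpu, 4):.0%}")
print(f"with YAML export: {speedup(total_1gpu, total_4gpu):.2f}x, "
      f"efficiency {parallel_efficiency(total_1gpu, total_4gpu, 4):.0%}")
# analysis only: 3.33x, efficiency 83%
# with YAML export: 2.75x, efficiency 69%
```

The export step accounts for about 2.3 min in both runs (9.2 - 6.90 and 25.33 - 22.98), which would be consistent with the YAML export not being parallelized, although the description does not say so explicitly.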

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@myron
Collaborator

myron commented Mar 20, 2023

Hi @heyufan1995, it seems you made your changes on top of an old commit (not the latest), because your changes remove the most recent logic of saving 2 files. Please double-check, thanks.

@mingxin-zheng
Contributor

The sign-off is missing, so the PR is not passing the DCO check. For reference: https://github.com/Project-MONAI/MONAI/blob/dev/CONTRIBUTING.md#signing-your-work
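
As a rough illustration of what a sign-off check looks for, the sketch below tests whether a commit message carries a `Signed-off-by:` trailer for a given email (the trailer that `git commit -s` appends). The function name `has_signoff` and the matching rules are assumptions made for this example; the actual DCO check used by the repository may apply different rules.

```python
import re

# One "Signed-off-by: Name <email>" trailer per line (assumed format).
SIGNOFF = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>]+)>$", re.MULTILINE
)


def has_signoff(message: str, author_email: str) -> bool:
    """Return True if the message has a sign-off trailer for author_email."""
    wanted = author_email.lower()
    return any(
        m.group("email").lower() == wanted for m in SIGNOFF.finditer(message)
    )


signed = (
    "Add multi-GPU data analyzer\n\n"
    "Signed-off-by: heyufan1995 <heyufan1995@gmail.com>\n"
)
unsigned = "Add multi-GPU data analyzer\n"

print(has_signoff(signed, "heyufan1995@gmail.com"))
print(has_signoff(unsigned, "heyufan1995@gmail.com"))
```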

Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
@wyli
Contributor

wyli commented Mar 31, 2023

/black

@wyli
Contributor

wyli commented Mar 31, 2023

/build

@wyli
Contributor

wyli commented Apr 2, 2023

/black
There are some test errors in the cpuonly tests and for the command python -m tests.test_auto3dseg:
https://github.com/Project-MONAI/MONAI/actions/runs/4577850785/jobs/8083767762?pr=6202

@mingxin-zheng
Contributor

I just checked: the "cpuonly" tests have not been using "cpu" as the device since a PR changed the default device. I have a PR to fix that.

@mingxin-zheng
Contributor

Looking at the error, the CPU test seems to fail because label is unreferenced, so it is not related to PR #6278 (though it would be good to include that fix too, so that such bugs are exposed earlier).
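
The failing code path is not shown in this thread. As a purely hypothetical illustration of this class of bug, the sketch below shows one common way a name such as `label` ends up used before it is bound, along with a guarded variant. Both function names and the `case` dictionary layout are invented for the example and are not taken from `data_analyzer.py`.

```python
def analyze_case_buggy(case: dict) -> dict:
    """Hypothetical: 'label' is only bound when the case has a label."""
    image = case["image"]
    if "label" in case:
        label = case["label"]
    # For an image-only case, 'label' was never assigned, so this line
    # raises UnboundLocalError.
    return {"n_image": len(image), "n_label": len(label)}


def analyze_case_fixed(case: dict) -> dict:
    """Hypothetical fix: bind 'label' on every path and branch on None."""
    image = case["image"]
    label = case.get("label")
    stats = {"n_image": len(image)}
    if label is not None:
        stats["n_label"] = len(label)
    return stats


image_only = {"image": [0.1, 0.2, 0.3]}
print(analyze_case_fixed(image_only))
try:
    analyze_case_buggy(image_only)
except UnboundLocalError as err:
    print(f"buggy variant failed: {type(err).__name__}")
```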

Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
heyufan1995 and others added 5 commits April 3, 2023 11:20
Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
Signed-off-by: Wenqi Li <831580+wyli@users.noreply.github.com>
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
@wyli
Contributor

wyli commented Apr 5, 2023

/black

monai-bot and others added 2 commits April 5, 2023 19:27
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
@wyli
Contributor

wyli commented Apr 5, 2023

/build

Contributor

@wyli wyli left a comment

Merging this if the existing tests are OK. #6307 requires a multi-GPU test environment, which is under development.

@wyli
Contributor

wyli commented Apr 5, 2023

Some issues with torch==1.13.1 and torchvision==0.14.1:

[2023-04-05T21:08:47.173Z] ======================================================================
[2023-04-05T21:08:47.173Z] ERROR: test_get_history (tests.test_auto3dseg_hpo.TestHPO)
[2023-04-05T21:08:47.173Z] ----------------------------------------------------------------------
[2023-04-05T21:08:47.173Z] Traceback (most recent call last):
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/tests/test_auto3dseg_hpo.py", line 133, in setUp
[2023-04-05T21:08:47.173Z]     bundle_generator.generate(work_dir, num_fold=1)
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/monai/apps/auto3dseg/bundle_gen.py", line 537, in generate
[2023-04-05T21:08:47.173Z]     gen_algo.export_to_disk(output_folder, name, fold=f_id)
[2023-04-05T21:08:47.173Z]   File "/tmp/tmpconkvfbv/workdir/algorithm_templates/segresnet/scripts/algo.py", line 250, in export_to_disk
[2023-04-05T21:08:47.173Z]     super().export_to_disk(output_path=output_path, algo_name=algo_name, **kwargs)
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/monai/apps/auto3dseg/bundle_gen.py", line 151, in export_to_disk
[2023-04-05T21:08:47.173Z]     self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
[2023-04-05T21:08:47.173Z]   File "/tmp/tmpconkvfbv/workdir/algorithm_templates/segresnet/scripts/algo.py", line 127, in fill_template_config
[2023-04-05T21:08:47.173Z]     max_epochs = int(np.clip(np.ceil(80000.0 / n_cases), a_min=300, a_max=1250))
[2023-04-05T21:08:47.173Z] ZeroDivisionError: float division by zero
[2023-04-05T21:08:47.173Z] 
[2023-04-05T21:08:47.173Z] ======================================================================
[2023-04-05T21:08:47.173Z] ERROR: test_run_algo (tests.test_auto3dseg_hpo.TestHPO)
[2023-04-05T21:08:47.173Z] ----------------------------------------------------------------------
[2023-04-05T21:08:47.173Z] Traceback (most recent call last):
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/tests/test_auto3dseg_hpo.py", line 133, in setUp
[2023-04-05T21:08:47.173Z]     bundle_generator.generate(work_dir, num_fold=1)
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/monai/apps/auto3dseg/bundle_gen.py", line 537, in generate
[2023-04-05T21:08:47.173Z]     gen_algo.export_to_disk(output_folder, name, fold=f_id)
[2023-04-05T21:08:47.173Z]   File "/tmp/tmpconkvfbv/workdir/algorithm_templates/segresnet/scripts/algo.py", line 250, in export_to_disk
[2023-04-05T21:08:47.173Z]     super().export_to_disk(output_path=output_path, algo_name=algo_name, **kwargs)
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/monai/apps/auto3dseg/bundle_gen.py", line 151, in export_to_disk
[2023-04-05T21:08:47.173Z]     self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
[2023-04-05T21:08:47.173Z]   File "/tmp/tmpconkvfbv/workdir/algorithm_templates/segresnet/scripts/algo.py", line 127, in fill_template_config
[2023-04-05T21:08:47.173Z]     max_epochs = int(np.clip(np.ceil(80000.0 / n_cases), a_min=300, a_max=1250))
[2023-04-05T21:08:47.173Z] ZeroDivisionError: float division by zero
[2023-04-05T21:08:47.173Z] 
[2023-04-05T21:08:47.173Z] ======================================================================
[2023-04-05T21:08:47.173Z] ERROR: test_run_optuna (tests.test_auto3dseg_hpo.TestHPO)
[2023-04-05T21:08:47.173Z] ----------------------------------------------------------------------
[2023-04-05T21:08:47.173Z] Traceback (most recent call last):
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/tests/test_auto3dseg_hpo.py", line 133, in setUp
[2023-04-05T21:08:47.173Z]     bundle_generator.generate(work_dir, num_fold=1)
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/monai/apps/auto3dseg/bundle_gen.py", line 537, in generate
[2023-04-05T21:08:47.173Z]     gen_algo.export_to_disk(output_folder, name, fold=f_id)
[2023-04-05T21:08:47.173Z]   File "/tmp/tmpconkvfbv/workdir/algorithm_templates/segresnet/scripts/algo.py", line 250, in export_to_disk
[2023-04-05T21:08:47.173Z]     super().export_to_disk(output_path=output_path, algo_name=algo_name, **kwargs)
[2023-04-05T21:08:47.173Z]   File "/home/jenkins/agent/workspace/MONAI-premerge/monai/monai/apps/auto3dseg/bundle_gen.py", line 151, in export_to_disk
[2023-04-05T21:08:47.173Z]     self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
[2023-04-05T21:08:47.173Z]   File "/tmp/tmpconkvfbv/workdir/algorithm_templates/segresnet/scripts/algo.py", line 127, in fill_template_config
[2023-04-05T21:08:47.173Z]     max_epochs = int(np.clip(np.ceil(80000.0 / n_cases), a_min=300, a_max=1250))
[2023-04-05T21:08:47.173Z] ZeroDivisionError: float division by zero
[2023-04-05T21:08:47.173Z] 
[2023-04-05T21:08:47.173Z] ----------------------------------------------------------------------
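
All three errors come from the same expression, which divides by `n_cases` before clipping the result to the range 300 to 1250. The sketch below reproduces that arithmetic in pure Python (using `min`/`max` in place of `np.clip`) and adds a guarded variant. The thread does not say why `n_cases` was zero, and `max_epochs_guarded` is a hypothetical helper, not the fix that was actually applied.

```python
import math


def max_epochs_for(n_cases: int) -> int:
    """Pure-Python equivalent of the expression shown in the traceback."""
    return int(min(max(math.ceil(80000.0 / n_cases), 300), 1250))


def max_epochs_guarded(n_cases: int) -> int:
    """Hypothetical guard: fail with a clearer message when no cases exist."""
    if n_cases <= 0:
        raise ValueError(
            f"expected a positive number of cases, got {n_cases}; "
            "check that the data statistics were generated correctly"
        )
    return max_epochs_for(n_cases)


print(max_epochs_for(100))   # 80000 / 100 = 800, already inside [300, 1250]
print(max_epochs_for(1000))  # 80 is clipped up to 300
print(max_epochs_for(10))    # 8000 is clipped down to 1250
try:
    max_epochs_for(0)
except ZeroDivisionError as err:
    print(f"n_cases=0 fails as in the log: {err}")
```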

monai/apps/auto3dseg/data_analyzer.py (review thread, resolved)
monai/apps/auto3dseg/data_analyzer.py (review thread, outdated, resolved)
monai/apps/auto3dseg/data_analyzer.py (review thread, outdated, resolved)
heyufan1995 and others added 2 commits April 11, 2023 16:09
Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
@wyli
Contributor

wyli commented Apr 11, 2023

/black

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
@wyli
Contributor

wyli commented Apr 11, 2023

/integration-test
/build

Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
@wyli
Contributor

wyli commented Apr 11, 2023

/build

@wyli wyli enabled auto-merge (squash) April 11, 2023 22:36
@wyli wyli merged commit 6a7f35b into Project-MONAI:dev Apr 11, 2023
Development

Successfully merging this pull request may close these issues.

Enable multi-gpu data analyser
[Auto3DSeg] Report/log failed cases in the process of data analysis
6 participants