Fix cuda defined in train_params bug #6370

Merged on Apr 15, 2023 (5 commits)


heyufan1995
Member

Fixes # .
If a user defines CUDA_VISIBLE_DEVICES in train_params, BundleAlgo puts it into the launch command (cmd), which causes an error.
This change pops the key out before the command is built and emits a warning.
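
The pop-and-warn idea can be sketched as below. This is a minimal illustration, not the code in this PR: `build_train_cmd` and its return shape are hypothetical, and the `--key=value` override format is only inferred from the launch log later in this thread.

```python
import warnings


def build_train_cmd(script, train_params):
    """Return (cmd, devices) for launching a training script.

    Hypothetical helper for illustration; it is not MONAI's BundleAlgo code.
    """
    params = dict(train_params)  # copy so the caller's dict is left untouched
    devices = None

    # CUDA_VISIBLE_DEVICES is an environment setting, not a training override,
    # so remove it before the overrides are turned into command-line flags.
    if "CUDA_VISIBLE_DEVICES" in params:
        devices = str(params.pop("CUDA_VISIBLE_DEVICES"))
        warnings.warn(
            "CUDA_VISIBLE_DEVICES was found in train_params and has been "
            "removed from the command-line overrides.",
            UserWarning,
        )

    cmd = f"python {script} run"
    for key, value in params.items():
        cmd += f" --{key}={value}"
    return cmd, devices


if __name__ == "__main__":
    cmd, devices = build_train_cmd(
        "train.py",
        {"training#num_epochs": 2, "CUDA_VISIBLE_DEVICES": "0,1"},
    )
    print(cmd)
    print(devices)
```

Returning the popped value separately is an assumption made here so the caller can decide whether to apply it through the process environment; the PR description only states that the key is popped and a warning is raised.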

Description

A few sentences describing the changes proposed in this pull request.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
mingxin-zheng requested a review from wyli on April 15, 2023, 03:58
@wyli
Contributor

wyli commented Apr 15, 2023

/integration-test

@wyli
Contributor

wyli commented Apr 15, 2023

/build

wyli and others added 2 commits April 15, 2023 04:06
Signed-off-by: Wenqi Li <wenqil@nvidia.com>
@wyli
Contributor

wyli commented Apr 15, 2023

this still doesn't work, with the error:

2023-04-15 08:10:35,481 - INFO - Launching: OMP_NUM_THREADS=1 python /tmp/tmp5ibdlrdn/workdir/dints_0/scripts/train.py run --config_file='/tmp/tmp5ibdlrdn/workdir/dints_0/configs/network_search.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/transforms_validate.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/network.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/transforms_train.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/hyper_parameters.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/hyper_parameters_search.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/transforms_infer.yaml' --training#num_images_per_batch=2 --training#num_epochs=2 --training#num_epochs_per_validation=1
algo_templates.tar.gz: 72.0kB [00:00, 153kB/s]
Traceback (most recent call last):
  File "/tmp/tmp5ibdlrdn/workdir/dints_0/scripts/train.py", line 36, in <module>
    from apex.contrib.clip_grad import clip_grad_norm_
  File "/opt/conda/lib/python3.8/site-packages/apex/__init__.py", line 10, in <module>
    from . import amp
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,\
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/amp.py", line 5, in <module>
    from .frontend import *
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/frontend.py", line 2, in <module>
    from ._initialize import _initialize
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/_initialize.py", line 2, in <module>
    from torch._six import string_classes
ModuleNotFoundError: No module named 'torch._six'

https://github.com/Project-MONAI/MONAI/actions/runs/4706795068/jobs/8348263719

@wyli
Contributor

wyli commented Apr 15, 2023

/build

wyli merged commit b356fec into Project-MONAI:dev on Apr 15, 2023