doesn't support CUDA >=11.0 #4

Open
amiltonwong opened this issue Jul 22, 2021 · 4 comments
Comments

@amiltonwong

Hi @paul007pl,

According to setup.sh, CUDA 10.1 is used. However, my GPU (RTX 30xx series) only supports CUDA >= 11.0. The error output after running the command python train.py -c ./cfgs/pcn.yaml is shown below; it appears to be related to the CUDA version.

root@milton-LabPC:/data/code13/MVP_Benchmark/completion# python train.py -c ./cfgs/pcn.yaml
INFO:root:Munch({'batch_size': 32, 'workers': 0, 'nepoch': 100, 'model_name': 'pcn', 'load_model': None, 'start_epoch': 0, 'num_points': 2048, 'work_dir': 'log/', 'flag': 'debug', 'loss': 'cd', 'manual_seed': None, 'use_mean_feature': False, 'step_interval_to_print': 500, 'epoch_interval_to_save': 1, 'epoch_interval_to_val': 1, 'varying_constant': '0.01, 0.1, 0.5, 1', 'varying_constant_epochs': '5, 15, 30', 'lr': 0.0001, 'lr_decay': True, 'lr_decay_interval': 40, 'lr_decay_rate': 0.7, 'lr_step_decay_epochs': None, 'lr_step_decay_rates': None, 'lr_clip': 1e-06, 'optimizer': 'Adam', 'weight_decay': 0, 'betas': '0.9, 0.999', 'save_vis': True, 'eval_emd': False})
(62400, 2048, 3)
(2400, 2048, 3) (62400,)
(41600, 2048, 3)
(1600, 2048, 3) (41600,)
INFO:root:Length of train dataset:62400
INFO:root:Length of test dataset:41600
INFO:root:Random Seed: 6693
Jitting Chamfer 3D
Traceback (most recent call last):
  File "train.py", line 213, in <module>
    train()
  File "train.py", line 48, in train
    model_module = importlib.import_module('.%s' % args.model_name, 'models')
  File "/root/anaconda3/envs/pytorch1.5_4d_pls/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/media/root/mdata/data/code13/MVP_Benchmark/completion/models/pcn.py", line 10, in <module>
    from model_utils import gen_grid_up, calc_emd, calc_cd
  File "/media/root/mdata/data/code13/MVP_Benchmark/completion/model_utils.py", line 20, in <module>
    from metrics import cd, fscore, emd
  File "../utils/metrics/__init__.py", line 1, in <module>
    from .CD import (cd, fscore)
  File "../utils/metrics/CD/__init__.py", line 1, in <module>
    from .chamfer3D.dist_chamfer_3D import chamfer_3DDist as cd
  File "../utils/metrics/CD/chamfer3D/dist_chamfer_3D.py", line 15, in <module>
    "/".join(os.path.abspath(__file__).split('/')[:-1] + ["chamfer3D.cu"]),
  File "/root/anaconda3/envs/pytorch1.5_4d_pls/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 898, in load
    is_python_module)
  File "/root/anaconda3/envs/pytorch1.5_4d_pls/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1086, in _jit_compile
    with_cuda=with_cuda)
  File "/root/anaconda3/envs/pytorch1.5_4d_pls/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1179, in _write_ninja_file_and_build_library
    with_cuda=with_cuda)
  File "/root/anaconda3/envs/pytorch1.5_4d_pls/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_to_build_library
    cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
  File "/root/anaconda3/envs/pytorch1.5_4d_pls/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1316, in _get_cuda_arch_flags
    raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (8.6) or GPU not supported

Any suggestions to fix this issue?

Thanks~

@MaxChanger

MaxChanger commented Jul 24, 2021

I think the fastest way is to update PyTorch. With CUDA >= 11.0, PyTorch >= 1.7.x is the most suitable version (according to the official website, lower versions are not offered for installation with it). Whether accuracy is affected needs to be tested.
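Before changing versions, it can help to print what the environment actually contains. The following is a minimal sketch, assuming PyTorch is importable; it relies on torch.__version__, torch.version.cuda, and torch.cuda.get_device_capability, and the function name describe_environment is made up for illustration.

```python
def describe_environment():
    """Collect the PyTorch version, the CUDA version it was built with,
    and the compute capability of GPU 0 (if one is visible).

    Returns None when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None

    info = {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None on CPU-only builds.
        "cuda_build": torch.version.cuda,
    }
    if torch.cuda.is_available():
        # For an RTX 30xx card this is expected to be (8, 6).
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    print(describe_environment())
```

Note that torch.version.cuda reports the CUDA version of the PyTorch build, not the version of the nvcc that compiles the JIT extensions; the two can differ.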

@amiltonwong
Author

Hi, @MaxChanger

I then switched to PyTorch 1.7.0. However, I got the following error:

(pytorch1.7.0) root@milton-LabPC:/media/root/mdata/data/code13/MVP_Benchmark/completion# python train.py -c ./cfgs/pcn.yaml
INFO:root:Munch({'batch_size': 32, 'workers': 0, 'nepoch': 100, 'model_name': 'pcn', 'load_model': None, 'start_epoch': 0, 'num_points': 2048, 'work_dir': 'log/', 'flag': 'debug', 'loss': 'cd', 'manual_seed': None, 'use_mean_feature': False, 'step_interval_to_print': 500, 'epoch_interval_to_save': 1, 'epoch_interval_to_val': 1, 'varying_constant': '0.01, 0.1, 0.5, 1', 'varying_constant_epochs': '5, 15, 30', 'lr': 0.0001, 'lr_decay': True, 'lr_decay_interval': 40, 'lr_decay_rate': 0.7, 'lr_step_decay_epochs': None, 'lr_step_decay_rates': None, 'lr_clip': 1e-06, 'optimizer': 'Adam', 'weight_decay': 0, 'betas': '0.9, 0.999', 'save_vis': True, 'eval_emd': False})
(62400, 2048, 3)
(2400, 2048, 3) (62400,)
(41600, 2048, 3)
(1600, 2048, 3) (41600,)
INFO:root:Length of train dataset:62400
INFO:root:Length of test dataset:41600
INFO:root:Random Seed: 3648
Jitting Chamfer 3D
Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1516, in _run_ninja_build
    subprocess.run(
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 213, in <module>
    train()
  File "train.py", line 48, in train
    model_module = importlib.import_module('.%s' % args.model_name, 'models')
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/media/root/mdata/data/code13/MVP_Benchmark/completion/models/pcn.py", line 10, in <module>
    from model_utils import gen_grid_up, calc_emd, calc_cd
  File "/media/root/mdata/data/code13/MVP_Benchmark/completion/model_utils.py", line 20, in <module>
    from metrics import cd, fscore, emd
  File "../utils/metrics/__init__.py", line 1, in <module>
    from .CD import (cd, fscore)
  File "../utils/metrics/CD/__init__.py", line 1, in <module>
    from .chamfer3D.dist_chamfer_3D import chamfer_3DDist as cd
  File "../utils/metrics/CD/chamfer3D/dist_chamfer_3D.py", line 12, in <module>
    chamfer_3D = load(name="chamfer_3D",
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 969, in load
    return _jit_compile(
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1176, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1280, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1538, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'chamfer_3D': [1/2] /usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=chamfer_3D -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /root/anaconda3/envs/pytorch1.7.0/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /media/root/mdata/data/code13/MVP_Benchmark/utils/metrics/CD/chamfer3D/chamfer3D.cu -o chamfer3D.cuda.o 
FAILED: chamfer3D.cuda.o 
/usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=chamfer_3D -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /root/anaconda3/envs/pytorch1.7.0/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /media/root/mdata/data/code13/MVP_Benchmark/utils/metrics/CD/chamfer3D/chamfer3D.cu -o chamfer3D.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
ninja: build stopped: subcommand failed.

(pytorch1.7.0) root@milton-LabPC:/media/root/mdata/data/code13/MVP_Benchmark/completion

Any hints to fix this issue? Thanks~

@MaxChanger

Hello, I'm not sure whether you are using a 3080 or a 3090.
I searched and found some feasible solutions: DeepSpeed/issues/607, pytorch/issues/45021, pytorch/issues/45028.
In short, the compute capability of your GPU may be higher than what the current PyTorch version supports.
I hope these help.

@EMCP

EMCP commented Sep 1, 2021

You need to be on the latest PyTorch build (1.9+). Anything older doesn't run on my RTX 3090.
