I'm trying to set up the environment in Google Colab for training, but I hit a CUDA extension version mismatch while building apex. As far as I can tell, my Python/PyTorch/CUDA versions match the requirements. Does anyone happen to know why? Much appreciated!
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.
my env:
python: 3.10
pytorch: 2.1.0
cuda: 12.1
full log:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ --global-option="--cuda_ext" --global-option="--cpp_ext"
Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
DEPRECATION: --build-option and --global-option are deprecated. pip 23.3 will enforce this behaviour change. A possible replacement is to use --config-settings. Discussion can be found at pypa/pip#11859
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option.
Processing /content/ColossalAI/OpenDiT/apex
Running command Preparing metadata (pyproject.toml)
torch.version = 2.1.0+cu121
running dist_info
creating /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info
writing /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-6nsd1o2v/apex-0.1.dist-info'
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: packaging>20.6 in /usr/local/lib/python3.10/dist-packages (from apex==0.1) (23.2)
Building wheels for collected packages: apex
WARNING: Ignoring --global-option when building apex using PEP 517
Running command Building wheel for apex (pyproject.toml)
torch.version = 2.1.0+cu121
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
from /usr/local/cuda/bin
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 416, in build_wheel
return self._build_with_temp_dir(['bdist_wheel'], '.whl',
File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
self.run_setup()
File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 178, in <module>
File "<string>", line 40, in check_cuda_torch_binary_vs_bare_metal
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.
In some cases, a minor-version mismatch will not cause later errors: NVIDIA/apex#323 (comment). You can try commenting out this check (at your own risk).
error: subprocess-exited-with-error
× Building wheel for apex (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp6d28a180
cwd: /content/ColossalAI/OpenDiT/apex
Building wheel for apex (pyproject.toml) ... error
ERROR: Failed building wheel for apex
Failed to build apex
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects
It seems that your CUDA version does not match what apex expects. Are you using a virtual Python environment? If not, check the native CUDA version to see whether it meets apex's requirements. You could also try installing apex by running pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ directly, without checking out commit 741bdf50825a97664db08574981962d66436d16a. The apex repo also has more installation instructions.
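To see which native CUDA toolkit a build is likely to pick up, something like the following may help. It is only a sketch: `locate_nvcc` is a hypothetical helper, and the lookup order (the `CUDA_HOME` and `CUDA_PATH` environment variables first, then `PATH`) is an assumption about how the toolkit gets resolved, not a statement of what apex or PyTorch actually do.

```python
import os
import shutil


def locate_nvcc(env=None):
    """Return the path of the nvcc a build would probably use, or None.

    Hypothetical helper: the lookup order below is an assumption.
    """
    env = os.environ if env is None else env
    # An explicitly configured toolkit root takes priority in this sketch.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            candidate = os.path.join(root, "bin", "nvcc")
            if os.path.isfile(candidate):
                return candidate
    # Otherwise fall back to whatever nvcc is reachable through PATH.
    return shutil.which("nvcc", path=env.get("PATH"))


# Typical use in a notebook cell:
# print(locate_nvcc())
```

In the log above the compiler is reported as coming from /usr/local/cuda/bin, so that is the toolkit whose version matters for the apex build.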
It seems that the CUDA version PyTorch was built with does not match your system CUDA version: the log shows nvcc release 12.2, while the PyTorch binaries were compiled with CUDA 12.1. The easy fix is to install a PyTorch build that aligns with your system CUDA version.
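A quick way to confirm the mismatch before building is to compare the two version strings yourself. The sketch below is an illustration only: the function names are hypothetical, the match / minor / major classification is my own, and it is not the check that apex runs in its setup script.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version):
    """Turn a string such as '12.1' (the form of torch.version.cuda) into a tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def describe_mismatch(torch_cuda, nvcc_output):
    """Classify how the system toolkit compares with PyTorch's build-time CUDA."""
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit is None:
        return "unknown"
    built = parse_torch_cuda(torch_cuda)
    if toolkit == built:
        return "match"
    if toolkit[0] == built[0]:
        return "minor mismatch"
    return "major mismatch"


# Typical use in a notebook cell (requires torch and nvcc to be installed):
# import subprocess, torch
# out = subprocess.check_output(["nvcc", "--version"], text=True)
# print(describe_mismatch(torch.version.cuda, out))
```

With the values from the log (PyTorch built with 12.1, nvcc release 12.2) this reports a minor mismatch, which is the case the apex error message says may be harmless in some situations.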