Google Colab setup env hit Cuda/extension version mismatch issue #53

marvin-0042 · 2024-02-27T17:41:13Z

Thank you so much for the great work!!!

I'm trying to set up the environment in Google Colab for training, but I hit a CUDA extension version mismatch. My Python/PyTorch/CUDA versions match the requirements. Does anyone happen to know why? Much appreciated! The install command was:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ --global-option="--cuda_ext" --global-option="--cpp_ext"

It fails with this error:

RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.

My environment:
python: 3.10
pytorch: 2.1.0
cuda: 12.1

full log:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ --global-option="--cuda_ext" --global-option="--cpp_ext"
Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
DEPRECATION: --build-option and --global-option are deprecated. pip 23.3 will enforce this behaviour change. A possible replacement is to use --config-settings. Discussion can be found at pypa/pip#11859
WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option.
Processing /content/ColossalAI/OpenDiT/apex
Running command Preparing metadata (pyproject.toml)

torch.version = 2.1.0+cu121

running dist_info
creating /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info
writing /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-modern-metadata-6nsd1o2v/apex.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-6nsd1o2v/apex-0.1.dist-info'
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: packaging>20.6 in /usr/local/lib/python3.10/dist-packages (from apex==0.1) (23.2)
Building wheels for collected packages: apex
WARNING: Ignoring --global-option when building apex using PEP 517
Running command Building wheel for apex (pyproject.toml)

torch.version = 2.1.0+cu121

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
from /usr/local/cuda/bin

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
  File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 416, in build_wheel
    return self._build_with_temp_dir(['bdist_wheel'], '.whl',
  File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
    self.run_setup()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 338, in run_setup
    exec(code, locals())
  File "<string>", line 178, in <module>
  File "<string>", line 40, in check_cuda_torch_binary_vs_bare_metal
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.
In some cases, a minor-version mismatch will not cause later errors: NVIDIA/apex#323 (comment). You can try commenting out this check (at your own risk).
error: subprocess-exited-with-error

× Building wheel for apex (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp6d28a180
cwd: /content/ColossalAI/OpenDiT/apex
Building wheel for apex (pyproject.toml) ... error
ERROR: Failed building wheel for apex
Failed to build apex
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects

KKZ20 · 2024-02-28T01:47:36Z

Hi, thanks for supporting our work!

It seems that your CUDA version does not match what apex expects. Are you using a virtual Python environment? If not, check the system CUDA version to see whether it meets apex's requirements. You could also try installing apex directly, without checking out commit 741bdf50825a97664db08574981962d66436d16a, by running:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

The apex repo also has more detailed installation instructions.

Feel free to ask if you have further questions!
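
The two things asked about above (whether a virtual environment is in use, and which system CUDA toolkit the build would pick up) can be inspected with a short script. This is a minimal sketch using only the standard library; the function names `in_virtualenv` and `cuda_toolchain_report` are made up for illustration, and the assumption that `CUDA_HOME` and the `nvcc` found on `PATH` are what the extension build uses should be checked against your own setup.

```python
import os
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # In a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def cuda_toolchain_report() -> dict:
    """Collect hints about which Python and CUDA toolkit a build would see."""
    return {
        "python": sys.executable,
        "virtualenv": in_virtualenv(),
        # Full path of nvcc if it is on PATH, otherwise None.
        "nvcc_on_path": shutil.which("nvcc"),
        # Value of CUDA_HOME if set, otherwise None (assumed relevant to the build).
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
    }


if __name__ == "__main__":
    for key, value in cuda_toolchain_report().items():
        print(f"{key}: {value}")
```

In the log above, nvcc was picked up from /usr/local/cuda/bin, so on a similar machine `nvcc_on_path` would likely point there.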

oahzxl · 2024-02-28T03:52:07Z

It seems that the CUDA version PyTorch was built with (12.1) does not match your system CUDA version (the log shows nvcc release 12.2). The easiest fix is to install a PyTorch build that matches your system CUDA version.
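
The mismatch described above can be confirmed before starting a long build by comparing the release reported by `nvcc --version` with `torch.version.cuda`. The following is a rough sketch, not apex's actual check: the helper names `nvcc_release`, `torch_cuda_release`, and `describe` are hypothetical, and the parsing assumes the `release X.Y` wording seen in the log in this thread.

```python
import re
import shutil
import subprocess


def nvcc_release(nvcc_output: str):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def torch_cuda_release(torch_cuda: str):
    """Turn a torch.version.cuda string such as '12.1' into (12, 1)."""
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)


def describe(nvcc_output: str, torch_cuda: str) -> str:
    """Summarize how the system toolkit compares with PyTorch's CUDA build."""
    system = nvcc_release(nvcc_output)
    if system is None:
        return "could not parse nvcc output"
    built = torch_cuda_release(torch_cuda)
    if system == built:
        return "match"
    kind = "minor" if system[0] == built[0] else "major"
    return (
        f"{kind} mismatch: nvcc {system[0]}.{system[1]} "
        f"vs PyTorch {built[0]}.{built[1]}"
    )


if __name__ == "__main__":
    # Live check; skipped quietly when torch or nvcc is unavailable.
    try:
        import torch
    except ImportError:
        torch = None
    nvcc = shutil.which("nvcc")
    if torch is not None and torch.version.cuda and nvcc:
        out = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True
        ).stdout
        print(describe(out, torch.version.cuda))
    else:
        print("torch with CUDA or nvcc not found; nothing to compare")
```

Fed the nvcc line from the log above and the version string 12.1, `describe` reports a minor mismatch, which is the case the error message says may be tolerable in some situations.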

KKZ20 added the "question" label (Further information is requested) · Feb 28, 2024

oahzxl closed this as completed Mar 21, 2024
