Failed the last time, succeeded the next time? #1856

Open
zhangs-a-n opened this issue Nov 1, 2024 · 2 comments

zhangs-a-n commented Nov 1, 2024

I ran the following command:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

The first run failed with this error:

 RuntimeError: Error compiling objects for extension
  error: subprocess-exited-with-error

  × Building wheel for apex (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/zsf/anaconda3/envs/pyt231py312_2_linux/bin/python3.12 /home/zsf/anaconda3/envs/pyt231py312_2_linux/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp2ijgo8p4
  cwd: /home/zsf/anaconda3/envs/pyt231py312_2_linux/apex
  Building wheel for apex (pyproject.toml) ... error
  ERROR: Failed building wheel for apex
Failed to build apex
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (apex)

Even so, this was progress. The assorted pip install commands I had tried earlier failed in other ways: some raised TypeError: str, and others raised No module named torch (even though PyTorch was clearly installed).

On the second attempt, I found the output too verbose, so I dropped the -v flag. After waiting several minutes (10 mins?), the install reported success, which is ridiculous.

Processing /home/zsf/anaconda3/envs/pyt231py312_2_linux/apex
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: packaging>20.6 in /home/zsf/anaconda3/envs/pyt231py312_2_linux/lib/python3.12/site-packages (from apex==0.1) (24.1)
Building wheels for collected packages: apex
  Building wheel for apex (pyproject.toml) ... done
  Created wheel for apex: filename=apex-0.1-cp312-cp312-linux_x86_64.whl size=4844829 sha256=5256a4aa59e969e609ca1ba25f616b68607eac921bde36fbff1c063a4515a570
  Stored in directory: /tmp/pip-ephem-wheel-cache-milgfajo/wheels/45/ef/09/6cfbe9deb98dfb0c3024c7fb91f389935bccbff826387be8f2
Successfully built apex
Installing collected packages: apex
Successfully installed apex-0.1

zhangs-a-n (Author) commented

I installed apex inside a conda virtual environment.
The command I used was:

pip install --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

The virtual environment uses PyTorch 2.3.1 with CUDA 12.1.
The operating system is Ubuntu 22.04 LTS.

When installing apex with --config-settings "--build-option=--cpp_ext" and --config-settings "--build-option=--cuda_ext", you need gcc and a CUDA Toolkit whose version matches the CUDA version used by the virtual environment. The CUDA Toolkit is installed on the system, not inside the virtual environment.

The CUDA Toolkit can be downloaded from https://developer.nvidia.com/cuda-toolkit-archive. Be sure to install the version that matches the CUDA version in the virtual environment.

This is the error reported when the installed CUDA Toolkit version does not match the CUDA version in the virtual environment:

RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 11.3.
In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
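
To check for this kind of mismatch before starting a long build, one option is to compare the CUDA version PyTorch was built with against the version the system nvcc reports. The script below is a minimal sketch, not apex's own check: the helper names (parse_nvcc_release, parse_torch_cuda, describe_mismatch, and so on) are made up for illustration, and it assumes nvcc is on PATH and prints its usual "release X.Y" line.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version):
    """Return (major, minor) from a string such as "12.1", or None for CPU-only builds."""
    if not version:
        return None
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe_mismatch(torch_cuda, nvcc_cuda):
    """Classify how the two (major, minor) tuples relate to each other."""
    if torch_cuda is None or nvcc_cuda is None:
        return "unknown"
    if torch_cuda == nvcc_cuda:
        return "match"
    if torch_cuda[0] == nvcc_cuda[0]:
        return "minor mismatch"
    return "major mismatch"


def read_nvcc_version():
    """Run the system `nvcc --version`; return None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def read_torch_cuda():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return parse_torch_cuda(torch.version.cuda)


if __name__ == "__main__":
    torch_cuda = read_torch_cuda()
    nvcc_cuda = read_nvcc_version()
    print("PyTorch built with CUDA:", torch_cuda)
    print("System nvcc reports CUDA:", nvcc_cuda)
    print("Result:", describe_mismatch(torch_cuda, nvcc_cuda))
```

Run it with the conda environment activated so that the same Python interpreter and PATH are used as for the pip install command. A result of "unknown" means either PyTorch could not be imported, it is a CPU-only build, or nvcc was not found.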

AlongWY commented Nov 20, 2024
