Workflows error when trying to build torch-sys without venv, "torch module not found" #843

Simon-Bertrand · 2024-02-04T01:29:47Z

Hello,

Thank you for this library and my respects for your school and your projects.

I have no problem building the libraries on Windows or Linux in a local environment. However, building torch-sys 0.14.0 fails in the GitHub Workflows environment.

The strange error I get is the one on line 231 (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L231C40-L231C71) "no cxx11 abi returned by python". Setting GLIBCXX_USE_CXX11 to one or zero did not change anything.

It happens because, in the branch guarded by the env_var_rerun("LIBTORCH_USE_PYTORCH").is_ok() test, a local variable "cxx11_abi" is created and remains in the None state even after the output of the "PYTHON_PRINT_PYTORCH_DETAILS" command has been parsed.

Because the environment variable "LIBTORCH_USE_PYTORCH" is correctly set to 1, it seems that the command whose result is stored in the output variable on line 207 (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L207) fails with an error code: " Error: no cxx11 abi returned by python Output { status: ExitStatus(unix_wait_status(256)), stdout: "", stderr: "Traceback (most recent call last):\n File \"<string>\", line 2, in <module>\nModuleNotFoundError: No module named 'torch'\n" }". As a result, the variable "cxx11_abi" (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L212C21-L212C30) stays in the None state.

The ModuleNotFoundError: No module named 'torch' part of the stderr is what intrigues me. So I asked myself: is the right Python interpreter used for the "PYTHON_PRINT_PYTORCH_DETAILS" command? That question brought me to line 182 (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L182), where python is chosen if the environment variable "VIRTUAL_ENV" exists. Otherwise, python3 is chosen. In the GitHub Workflows environment I do not use a virtual environment, so the interpreter should be "python3". (The workflow runs on Linux.)
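A minimal sketch of the selection rule described above, written in Python for illustration only. The function name `pick_python_interpreter` is hypothetical, and the actual logic lives in the Rust build.rs, which may differ in detail.

```python
import os


def pick_python_interpreter(env=None):
    """Return the interpreter name according to the rule described above.

    This is an illustrative sketch, not the code from build.rs.
    """
    env = os.environ if env is None else env
    # "python" when VIRTUAL_ENV exists, "python3" otherwise.
    return "python" if "VIRTUAL_ENV" in env else "python3"


if __name__ == "__main__":
    print(pick_python_interpreter())
```

Note that under this rule only the presence of VIRTUAL_ENV matters, not its value, so setting it to an arbitrary value such as 1 switches the name to "python" without making torch any more visible to that interpreter.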

So I tried to install PyTorch for both Python executables; that is, as a debugging measure I ran both pip3 install torch==2.1.0 and pip install torch==2.1.0 in the first steps of my workflow. If I run

python -c "
import torch
from torch.utils import cpp_extension
print('LIBTORCH_VERSION:', torch.__version__.split('+')[0])
print('LIBTORCH_CXX11:', torch._C._GLIBCXX_USE_CXX11_ABI)
for include_path in cpp_extension.include_paths():
    print('LIBTORCH_INCLUDE:', include_path)
for library_path in cpp_extension.library_paths():
    print('LIBTORCH_LIB:', library_path)
"

or the same command using python3, I successfully get a `LIBTORCH_CXX11: False` line, which should be parsed correctly by build.rs. Normally, this match (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L219) should set "cxx11_abi" to Some("0"). But that is not what happens: I still get the error "no cxx11 abi returned by python Output", so "cxx11_abi" is still None. Nothing changes if I set the environment variable "VIRTUAL_ENV" to an arbitrary value, 1 for example.
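To make the expected parsing concrete, here is a hedged Python sketch that reads the `KEY: value` lines printed by the command above. The function name `parse_pytorch_details` and the mapping of True to "1" are assumptions for illustration; only the False to "0" case is stated in this report, and the real parsing is done in Rust in build.rs.

```python
def parse_pytorch_details(stdout):
    """Parse 'KEY: value' lines in the format printed by the command above."""
    details = {"version": None, "cxx11_abi": None, "include": [], "lib": []}
    for line in stdout.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "LIBTORCH_VERSION":
            details["version"] = value
        elif key == "LIBTORCH_CXX11":
            # Assumption: True maps to "1" and anything else to "0".
            details["cxx11_abi"] = "1" if value == "True" else "0"
        elif key == "LIBTORCH_INCLUDE":
            details["include"].append(value)
        elif key == "LIBTORCH_LIB":
            details["lib"].append(value)
    return details


if __name__ == "__main__":
    sample = "LIBTORCH_VERSION: 2.1.0\nLIBTORCH_CXX11: False\n"
    print(parse_pytorch_details(sample))
    # An empty stdout leaves cxx11_abi as None.
    print(parse_pytorch_details(""))
```

With an empty stdout, as in the error message quoted earlier, no `LIBTORCH_CXX11` line is ever seen and the value stays None, which matches the reported failure regardless of the ABI setting.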

I concluded that, for a reason I do not know, either the Rust side fails to find the Python interpreter that has the "torch" library installed when it executes the "PYTHON_PRINT_PYTORCH_DETAILS" command, or I am failing to install PyTorch for the interpreter that the Rust code actually uses to run that command.

Here are the workflow logs: https://github.com/Simon-Bertrand/rust-python-bindings/actions/runs/7770351359/job/21190435480. At (https://github.com/Simon-Bertrand/rust-python-bindings/actions/runs/7770351359/job/21190435480#step:5:118) you can see the "python" command successfully return LIBTORCH_CXX11: False, and the same for "python3" at (https://github.com/Simon-Bertrand/rust-python-bindings/actions/runs/7770351359/job/21190435480#step:5:150), after installing torch==2.1.0 with both executables.

Do you know whether I am completely wrong/lost, or whether this is actually related to a GLIBCXX_USE_CXX11 error?

Thank you in advance,
BS

LaurentMazare · 2024-02-04T07:21:12Z

Hello,
I think you're right and this is not related to the cxx11 bit but rather that the python interpreter started in the torch-sys build process is unable to find the torch module. I don't have much of a clue why. To debug this, I would start by modifying the build.rs script (in a fork/...) so as to print both sys.executable and sys.path before importing torch. This should give a better idea of which python executable actually gets executed and where it tries to retrieve its libraries from.
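A minimal sketch of the kind of diagnostic suggested here, assuming it is run with the same interpreter and environment as the build. The function name `describe_interpreter` and the `TORCH_FOUND` label are hypothetical; `importlib.util.find_spec` is used so that the check does not fail with a traceback when torch is missing.

```python
import importlib.util
import sys


def describe_interpreter():
    """Return diagnostic lines about the running interpreter."""
    lines = [
        f"SYS.EXE: {sys.executable}",
        f"SYS.PATH: {sys.path}",
    ]
    # find_spec returns None when the top-level module cannot be located.
    found = importlib.util.find_spec("torch") is not None
    lines.append(f"TORCH_FOUND: {found}")
    return lines


if __name__ == "__main__":
    for line in describe_interpreter():
        print(line)
```

Printing these lines to stderr from the build step makes it possible to compare a working and a failing invocation side by side.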

Simon-Bertrand · 2024-02-04T11:50:17Z

Hi Laurent,

Thank you for the fast answer.

I think the error is related to setuptools-rust and my pyproject.toml config, because I get no errors when I build my template lib with cargo build directly instead of pip install . I did what you advised and found the following:

When it succeeds with cargo build:

SYS.EXE : /opt/hostedtoolcache/Python/3.10.13/x64/bin/python
SYS.PATH :  ['', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages']

When it fails with pip install . :

SYS.EXE: /opt/hostedtoolcache/Python/3.10.13/x64/bin/python
SYS.PATH: ['', '/tmp/pip-build-env-r4m__94o/site', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload', '/tmp/pip-build-env-r4m__94o/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-r4m__94o/normal/lib/python3.10/site-packages']

The only difference I see is in sys.path, where an ugly /tmp/pip-build-env-*/site is added. I then discovered peps:build-environment, which means that torch is not available at build time unless it is declared as a build dependency. Something obvious that I should not have forgotten.
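The two sys.path lists above can be compared mechanically. The following sketch uses the values reported in this comment; the helper name `path_difference` is hypothetical. It shows that, besides the added /tmp/pip-build-env-* entries, the interpreter's own site-packages directory is absent from the failing path, which is presumably where pip install torch had placed the module.

```python
CARGO_PATH = [
    "",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages",
]

PIP_PATH = [
    "",
    "/tmp/pip-build-env-r4m__94o/site",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload",
    "/tmp/pip-build-env-r4m__94o/overlay/lib/python3.10/site-packages",
    "/tmp/pip-build-env-r4m__94o/normal/lib/python3.10/site-packages",
]


def path_difference(working, failing):
    """Return entries unique to each list, preserving their order."""
    only_working = [p for p in working if p not in failing]
    only_failing = [p for p in failing if p not in working]
    return only_working, only_failing


if __name__ == "__main__":
    only_working, only_failing = path_difference(CARGO_PATH, PIP_PATH)
    print("Only in the cargo build path:", only_working)
    print("Only in the pip install path:", only_failing)
```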

I added torch==2.2.0 to the [build-system] section of my pyproject.toml, like this:

[build-system]
requires = ["setuptools", "setuptools-rust", "torch==2.2.0"]

And now all is well. So this does not seem related to Workflows, but rather to the minimal build environment.

Are you open to a PR that adds a bit more verbosity to the torch-sys build, so that if a "No module named" error is found in the stderr field of the command output, a message is displayed asking users whether they have correctly set up their build dependencies to include torch?

Thank you for your time
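The proposed check could look roughly like the sketch below, written in Python for brevity even though the actual change would be made in Rust inside build.rs. The function name `missing_torch_hint` and the wording of the message are hypothetical.

```python
def missing_torch_hint(stderr):
    """Return a hint when stderr shows that torch could not be imported."""
    if "No module named 'torch'" in stderr:
        return (
            "The Python interpreter used by the build could not import torch. "
            "If you build through pip, check that torch is listed in the "
            "[build-system] requires of your pyproject.toml."
        )
    return None


if __name__ == "__main__":
    stderr = (
        "Traceback (most recent call last):\n"
        ' File "<string>", line 2, in <module>\n'
        "ModuleNotFoundError: No module named 'torch'\n"
    )
    print(missing_torch_hint(stderr))
```

Matching on the module name rather than on any ModuleNotFoundError keeps the hint specific to the situation described in this issue.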

Simon-Bertrand added a commit to Simon-Bertrand/tch-rs-workflows-fix that referenced this issue Feb 4, 2024

fix(torch-sys:build.rs): cleanup useless spaces. fixes LaurentMazare#843

1ba6eeb
