Workflows error when trying to build torch-sys without venv, "torch module not found" #843

Open
Simon-Bertrand opened this issue Feb 4, 2024 · 2 comments

Simon-Bertrand commented Feb 4, 2024

Hello,

Thank you for this library and my respects for your school and your projects.

I have no problem building the library on Windows or Linux in a local environment, but I do have a problem in the GitHub Workflows environment when building torch-sys 0.14.0.

The strange error I get is the one on line 231 (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L231C40-L231C71) "no cxx11 abi returned by python". Setting GLIBCXX_USE_CXX11 to one or zero did not change anything.

It's caused by the fact that during the env_var_rerun("LIBTORCH_USE_PYTORCH").is_ok() test, a local variable "cxx11_abi" is created and remains in the None state despite parsing the return of the "PYTHON_PRINT_PYTORCH_DETAILS" command on the Python side.

Because the env. variable "LIBTORCH_USE_PYTORCH" is correctly set to 1, it seems that the command whose result is stored in the output variable on line 207 (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L207) fails with a non-zero exit status: "Error: no cxx11 abi returned by python Output { status: ExitStatus(unix_wait_status(256)), stdout: "", stderr: "Traceback (most recent call last):\n File \"<string>\", line 2, in <module>\nModuleNotFoundError: No module named 'torch'\n" }". Since stdout is empty, there is nothing to parse, and the variable "cxx11_abi" (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L212C21-L212C30) stays in the None state.

The part ModuleNotFoundError: No module named 'torch' in the error stderr intrigues me. So I asked myself: is the right Python interpreter used for the "PYTHON_PRINT_PYTORCH_DETAILS" command? The question brought me to line 182 (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L182), where python is chosen if the env. var. "VIRTUAL_ENV" exists. Otherwise, python3 is chosen. In the GitHub Workflows env, I do not use a virtual environment, so the Python interpreter should be "python3". (The workflow uses Linux.)
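For readers following along, here is a minimal sketch of the selection rule as described above. It is not the actual code from build.rs; the function names `python_interpreter` and `python_interpreter_from_env` are hypothetical, and only the Linux case discussed in this issue is covered.

```rust
use std::env;

/// Hypothetical helper mirroring the rule described in this issue:
/// `python` when VIRTUAL_ENV is set (to any value), `python3` otherwise.
fn python_interpreter(virtual_env: Option<&str>) -> &'static str {
    match virtual_env {
        Some(_) => "python",
        None => "python3",
    }
}

/// Reads VIRTUAL_ENV from the process environment and applies the rule.
fn python_interpreter_from_env() -> &'static str {
    python_interpreter(env::var("VIRTUAL_ENV").ok().as_deref())
}
```

Note that under this rule the value of VIRTUAL_ENV does not matter, only whether it is set, so setting it to an arbitrary value such as 1 just switches the name from python3 to python.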

So I tried to install PyTorch for both Python executables: as the first steps of my workflow, for debugging, I ran pip3 install torch==2.1.0 and pip install torch==2.1.0. If I run

python -c "
import torch
from torch.utils import cpp_extension
print('LIBTORCH_VERSION:', torch.__version__.split('+')[0])
print('LIBTORCH_CXX11:', torch._C._GLIBCXX_USE_CXX11_ABI)
for include_path in cpp_extension.include_paths():
  print('LIBTORCH_INCLUDE:', include_path)
for library_path in cpp_extension.library_paths():
  print('LIBTORCH_LIB:', library_path)
"

or the same command using python3, I successfully get a LIBTORCH_CXX11: False line, which should be correctly parsed by build.rs. Normally, this match (https://github.com/LaurentMazare/tch-rs/blob/main/torch-sys/build.rs#L219) should set "cxx11_abi" to Some("0"). But that is not the case: I get the error "no cxx11 abi returned by python Output", so "cxx11_abi" is still None. Nothing changes if I set the env. var. "VIRTUAL_ENV" to an arbitrary value, 1 for example.
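To make the parsing step concrete, here is a rough sketch of how the LIBTORCH_CXX11 line could be turned into the "cxx11_abi" value. It is written from the behaviour described in this issue, not copied from build.rs, and the function name `parse_cxx11_abi` is hypothetical.

```rust
/// Hypothetical helper: scans the interpreter's stdout for a
/// `LIBTORCH_CXX11: True|False` line and maps it to "1" or "0".
/// Returns None when no such line is present.
fn parse_cxx11_abi(stdout: &str) -> Option<String> {
    let mut cxx11_abi = None;
    for line in stdout.lines() {
        match line.strip_prefix("LIBTORCH_CXX11: ") {
            Some("True") => cxx11_abi = Some("1".to_owned()),
            Some("False") => cxx11_abi = Some("0".to_owned()),
            _ => {}
        }
    }
    cxx11_abi
}
```

With this shape, an empty stdout (as in the failing Output shown earlier) can only ever yield None, whatever the ABI settings are.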

I concluded that, for a reason I don't know, either the Rust side fails to find the right Python interpreter (the one with the "torch" lib installed) when it runs the "PYTHON_PRINT_PYTORCH_DETAILS" command, or I fail to install PyTorch for the interpreter that the Rust code actually uses to execute the Python command.

Here are the workflow logs: https://github.com/Simon-Bertrand/rust-python-bindings/actions/runs/7770351359/job/21190435480. At (https://github.com/Simon-Bertrand/rust-python-bindings/actions/runs/7770351359/job/21190435480#step:5:118) you can see the "python" command successfully return LIBTORCH_CXX11: False, and the same for "python3" at (https://github.com/Simon-Bertrand/rust-python-bindings/actions/runs/7770351359/job/21190435480#step:5:150), after installing torch==2.1.0 with both executables.

Do you know if I am completely wrong/lost, or is it actually related to a GLIBCXX_USE_CXX11 error?

Thank you in advance,
BS

@LaurentMazare
Owner

Hello,
I think you're right: this is not related to the cxx11 bit, but rather the Python interpreter started in the torch-sys build process is unable to find the torch module. I don't have much of a clue why. To debug this, I would start by modifying the build.rs script (in a fork/...) so as to print both sys.executable and sys.path before importing torch. This should give a better idea of which Python executable actually gets executed and where it tries to retrieve its libraries from.
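A possible shape for that debugging step, as a sketch only: the constant `PYTHON_PRINT_INTERPRETER_DETAILS` and the function `print_interpreter_details` are hypothetical names, not part of build.rs, and the interpreter name is passed in by the caller.

```rust
use std::io;
use std::process::{Command, Output};

/// Hypothetical debugging script: reports which interpreter is running
/// and where it looks for modules, without importing torch.
const PYTHON_PRINT_INTERPRETER_DETAILS: &str = r"
import sys
print('SYS.EXE:', sys.executable)
print('SYS.PATH:', sys.path)
";

/// Runs the script with the given interpreter and returns the captured
/// output; spawning fails with an io::Error if the program is not found.
fn print_interpreter_details(python_interpreter: &str) -> io::Result<Output> {
    Command::new(python_interpreter)
        .arg("-c")
        .arg(PYTHON_PRINT_INTERPRETER_DETAILS)
        .output()
}
```

In a build script, the captured stdout would still need to be surfaced, for example by including it in the error message, since cargo hides build script output on success.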

@Simon-Bertrand
Author

Simon-Bertrand commented Feb 4, 2024

Hi Laurent,

Thank you for the fast answer.

I think the error is related to setuptools-rust and my pyproject.toml config, as I get no errors when I use cargo build directly instead of pip install . to build my template lib. I did what you advised and found the following:

When it succeeds with cargo build:

SYS.EXE : /opt/hostedtoolcache/Python/3.10.13/x64/bin/python
SYS.PATH :  ['', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages']

When it fails with pip install .:

SYS.EXE: /opt/hostedtoolcache/Python/3.10.13/x64/bin/python
SYS.PATH: ['', '/tmp/pip-build-env-r4m__94o/site', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10', '/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload', '/tmp/pip-build-env-r4m__94o/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-r4m__94o/normal/lib/python3.10/site-packages']

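The difference is in sys.path: an ugly /tmp/pip-build-env-*/site is added, and the interpreter's own site-packages directory is replaced by the /tmp/pip-build-env-*/overlay and /tmp/pip-build-env-*/normal site-packages directories. I then discovered peps:build-environment: pip builds the package in an isolated environment, which means that torch is not available at build time unless it is declared as a build dependency. Something obvious that I should not have forgotten.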
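To double-check the comparison, here is a small sketch that diffs the two sys.path lists reported above. The helper `only_in` and the constant names are hypothetical; the path values are copied from the two outputs in this comment.

```rust
/// sys.path reported when building with `cargo build`.
const CARGO_BUILD_SYS_PATH: [&str; 5] = [
    "",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages",
];

/// sys.path reported when building with `pip install .`.
const PIP_INSTALL_SYS_PATH: [&str; 7] = [
    "",
    "/tmp/pip-build-env-r4m__94o/site",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python310.zip",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10",
    "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/lib-dynload",
    "/tmp/pip-build-env-r4m__94o/overlay/lib/python3.10/site-packages",
    "/tmp/pip-build-env-r4m__94o/normal/lib/python3.10/site-packages",
];

/// Hypothetical helper: entries of `left` that are absent from `right`,
/// in their original order.
fn only_in<'a>(left: &[&'a str], right: &[&str]) -> Vec<&'a str> {
    left.iter().copied().filter(|p| !right.contains(p)).collect()
}
```

Calling `only_in(&CARGO_BUILD_SYS_PATH, &PIP_INSTALL_SYS_PATH)` leaves only the interpreter's site-packages directory, which is the entry that is missing under pip install .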

I added torch==2.2.0 to the [build-system] section of my pyproject.toml, like this:

[build-system]
requires = ["setuptools", "setuptools-rust", "torch==2.2.0"]

And now all is well. So it is not related to Workflows but rather to the minimal build environment.

Would you be open to a PR that adds a bit more verbosity to the torch-sys build, so that if a "No module" error is found in the stderr field of the command output, a message is displayed asking users whether they correctly set up their build dependencies to include torch?
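Roughly, the check I have in mind could look like the sketch below. This is only an illustration of the idea, not existing torch-sys code; the function name `missing_torch_hint` and the wording of the message are hypothetical.

```rust
/// Hypothetical helper: returns a hint when the interpreter's stderr shows
/// that the torch module could not be imported, and None otherwise.
fn missing_torch_hint(stderr: &str) -> Option<&'static str> {
    if stderr.contains("ModuleNotFoundError") && stderr.contains("'torch'") {
        Some(
            "python could not import torch; if you build through pip, check that \
             torch is listed in the [build-system] requires of your pyproject.toml",
        )
    } else {
        None
    }
}
```

The hint could then be appended to the existing "no cxx11 abi returned by python" error, so that the real cause is visible without reading the raw Output dump.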

Thank you for your time

Simon-Bertrand added a commit to Simon-Bertrand/tch-rs-workflows-fix that referenced this issue Feb 4, 2024