Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REPL cannot load libpython when built and run under Python environment #432

Open
vsbogd opened this issue Sep 13, 2023 · 20 comments
Open

REPL cannot load libpython when built and run under Python environment #432

vsbogd opened this issue Sep 13, 2023 · 20 comments
Assignees
Labels
bug Something isn't working repl

Comments

@vsbogd
Copy link
Collaborator

vsbogd commented Sep 13, 2023

I use Conda environment to manage Python versions and dependencies. When I build MeTTa REPL under Conda and try to run it after the build I see the following error:

$ ./target/debug/metta 
./target/debug/metta: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

PyO3 compile against and uses absolutely correct version of the Python library. I don't see any inconsistency here.

According to PyO3/pyo3#1576 when using PyO3 app under Python env one need to add the path to the Python library into LD_LIBRARY_PATH. This kind of works but breaks other executables because lib directory in Conda env contains other libraries which overrides system library loading paths.

Steps to reproduce (Conda is expected to be installed):

conda create -n test python=3.7
conda activate test
cd repl
cargo run --features python
@luketpeterson
Copy link
Contributor

When the repl is built, it looks like we can point pyo3 at the python version we want to use. https://pyo3.rs/v0.19.2/building_and_distribution.html#configuring-the-python-version

So I'll see what's involved in querying Conda, and potentially overriding pyo3's defaults if Conda is installed. The other option (which we might want to do regardless) is to document the PYO3_PYTHON environment variable in our docs.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 13, 2023

I didn't mention it in an issue description, but in fact PyO3 compile against and uses absolutely correct version of the library. So I don't see any inconsistency. The issue is that this shared Python library from Conda is not found by OS when program is executed under the same environment. It sounds more like a Conda issue btw.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 13, 2023

This is how Conda manages it for its packages:

On Linux, conda-build modifies any shared libraries or generated executables to use a relative dynamic link by calling the patchelf tool. On macOS, the install_name_tool tool is used.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 13, 2023

I would say the proper way to fix is to have two different dynamic links for two different cases:

  • when executable is being built under Python virtual env (library is not visible to the system loader) we should use absolute link to the Python library inside executable
  • when executable is being built against system Python library (library is visible to the system loader) we use just the name of the library without path as a dynamic loading link.

@luketpeterson
Copy link
Contributor

luketpeterson commented Sep 13, 2023

This is how Conda manages it for its packages:

On Linux, conda-build modifies any shared libraries or generated executables to use a relative dynamic link by calling the patchelf tool. On macOS, the install_name_tool tool is used.

I see. It seems like installing the REPL as part of a package under Conda's control would be the "right" fix from Conda's point of view.

I don't know anything about how Conda and pip interact, but would installing the REPL as part of the current Python package accomplish this?

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 13, 2023

would installing the REPL as part of the current Python package accomplish this?

I guess it is not a common use-case to share native executable as a part of a Python package. Thus I don't think it will work actually. At the link above I believe they are talking about a situation when a Conda package is being built. And to me it is an open question how to distribute the app which depends on Python. Probably like any other binary app.

For distribution purposes we should use the version without specific path to the Python library. Two cases above which are proposed in #432 (comment) are for the local builds only.

Btw it makes sense to switch between Python and non-Python in runtime instead of build time. Thus user can install MeTTa and use it with Python only when it is required.

@luketpeterson
Copy link
Contributor

luketpeterson commented Sep 13, 2023

For distribution purposes we should use the version without specific path to the Python library. Two cases above which are proposed in #432 (comment) are for the local builds only.

If you run patchelf yourself on the built metta repl binary, can you confirm this allows it to find the right Python shared library without setting the LD_LIBRARY_PATH variable?

Btw it makes sense to switch between Python and non-Python in runtime instead of build time. Thus user can install MeTTa and use it with Python only when it is required

This is actually how I had it originally, but it was a lot of complexity and I didn't see the benefit so I removed it.

There are two issues to doing this the "clean" way.
1.) the python shared lib dependency is a "Strong" link, so the OS won't even run our main unless the link can be resolved. To fix this we'd either need a way to make the dependency weak (I don't see a feature for that so I might need to submit a PR to pyo3. But I'd ask around first update, it might be possible by hijacking the extension-module feature) The ugly hack-around is to have our own loader binary that springboards into another executable.
2.) We'd need to load the MeTTa libraries appropriate for the language we'll be interoperating with. This will get easier when one core stdlib can interoperate with any language, but right now we need to know in advance.

@luketpeterson
Copy link
Contributor

A related but different issue is here: PyO3/pyo3#1554 It's useful to gain additional background to the python init under Conda.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 13, 2023

If you run patchelf yourself on the built metta repl binary, can you confirm this allows it to find the right Python shared library without setting the LD_LIBRARY_PATH variable?

Yes, sure, patchelf --replace-needed libpython3.7m.so.1.0 $HOME/miniconda3/envs/<env-name>/lib/libpython3.7m.so.1.0 metta helps.

2.) We'd need to load the MeTTa libraries appropriate for the language we'll be interoperating with.

I am not sure what is an issue with that. We load MeTTa libraries on the start anyway. Looks like we can choose one set or another depending on whether --python is passed via the command line.

1.) the python shared lib dependency is a "Strong" link, so the OS won't even run our main unless the link can be resolved.

Yes, it is more serious. If PyO3 cannot load libpython....so on the fly (and from the error message in the issue description it looks like it doesn't do this) then it probably required to add this feature into PyO3.

it might be possible by hijacking the extension-module feature) The ugly hack-around is to have our own loader binary that springboards into another executable.

Interesting.

A related but different issue is here: PyO3/pyo3#1554 It's useful to gain additional background to the python init under Conda.

Looks like it was came up with conclusion it is a Windows Conda's version issue. Anyway not only Conda is under the trouble, we should also test Python virtualenv (another popular Python env isolation framework).

@luketpeterson
Copy link
Contributor

Just to update the ticket with my current thinking / findings:

1.) pyo3's build.rs script successfully locates the Conda python and libpython. (I think @vsbogd already mentioned this, but I've confirmed it using the PYO3_PRINT_CONFIG environment variable) This is presumably why the build is successful in spite of the emitted binary not being able to find all its dependencies.

2.) It appears that the python interpreter binary statically links libpython, when running under Conda. So by extension, I would guess this is what the Conda designers intended for other apps that need access to libpython.

3.) So I tried switching to static linking of libpython, but I hit an issue with LTO (link-time-optimization) because of compiler mismatch. I spent way more time than I'd like to admit trying to get rustc not to attempt to LTO libpython, and I eventually found this comment in the pyo3 source.
https://github.com/PyO3/pyo3/blob/8f4a26a66ecee4cfa473ada8cb27a57c4533d04f/pyo3-build-config/src/impl_.rs#L202

4.) So static linking of libpython is going to be fraught with problems. (They practically beg you not to do it here: https://pyo3.rs/main/building_and_distribution#statically-embedding-the-python-interpreter) And Conda doesn't like its libraries to be in a system-wide search path for dynamic linking. (Which seems silly to me because that seems like one of the basic properties of having an environment activated... But here we are.)

5.) As @vsbogd suggested, we could detect conda / pyenv in a build script somehow, and then update the binary to link libpython by absolute path. That would fix the issue, but would break if the user switched conda environments. (Which if I understand, is a common thing for Conda users to do)

6.) Apparently the idea of springboarding isn't so crazy in this case. I found this line in the pyo3 docs (https://pyo3.rs/main/building_and_distribution#dynamically-embedding-the-python-interpreter)

For distributing your program to non-technical users, you will have to consider including the Python shared library in your distribution as well as setting up wrapper scripts to set the right environment variables (such as LD_LIBRARY_PATH on UNIX, or PATH on Windows).

So in conclusion:
A.) We know pyo3's config logic succeeds in locating conda's python in all cases.
B.) But we don't know how to get an interpreter binary that reliably links Python in every case.

Therefore, it seems to me that the right solution is to write a wrapper / springboard that uses the python location logic from pyo3 to find the correct libpython for the current environment, and the execs the real binary.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 15, 2023

5.) As @vsbogd suggested, we could detect conda / pyenv in a build script somehow, and then update the binary to link libpython by absolute path. That would fix the issue, but would break if the user switched conda environments. (Which if I understand, is a common thing for Conda users to do)

Well I would not bother about it. In fact Conda users should be aware that each environment is separated and if you built some app under environment A it is not expected to work under environment B.

Therefore, it seems to me that the right solution is to write a wrapper / springboard that uses the python location logic from pyo3 to find the correct libpython for the current environment, and the execs the real binary.

I would not complicate things here to that degree. First just be aware that updating LD_LIBRARY_PATH is not a viable solution for a Conda env, because it mixes Conda's libraries with system libraries as a result system apps stop working.

Second, when we are distributing binaries it is natural to expect the library is installed in the system and can be loaded by OS loader. If we want to distribute binaries for the Conda users we need to prepare and publish Conda package (they have separate package management system) which is separate issue. Anyway I would not distribute Rust app through PyPi/Conda packages. I would use OS specific or universal Unix packages (like Snap or similar).

Third if user is brave enough to build an application I would simplify the task to him by ensuring the app can be used in environment it was built for out of the box but not further. For example if it is built under specific Conda or virtualenv it should be usable from this env but not necessary with a system Python. If it is built for a system Python it should be usable with system Python even after library update. But if user switches the environments it is his responsibility to understand what he does.

I am writing this I am trying to encourage you to do a simpler solution and save the time because experimenting with a big number of complex setups is time consuming. I would vote for making a trampoline layer for the loading libpython only in case it is a quickest possible solution.

@luketpeterson
Copy link
Contributor

First just be aware that updating LD_LIBRARY_PATH is not a viable solution for a Conda env, because it mixes Conda's libraries with system libraries as a result system apps stop working.

Do you mean we should not set LD_LIBRARY_PATH globally? ie. in the user's shared configs. If so, then I completely agree.

But I think setting LD_LIBRARY_PATH locally, ie. for the environment of the app we're going to jump into, is the recommended solution.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Sep 15, 2023

But I think setting LD_LIBRARY_PATH locally, ie. for the environment of the app we're going to jump into, is the recommended solution.

Yes, sure, locally it should work as far as I see.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Jul 25, 2024

According to #748 PyEnv is also affected.

@luketpeterson
Copy link
Contributor

Yes. It's fundamentally a Python issue, not specific to PyEnv or Conda. It's not even really Pyo3's fault as it's doing what any other application is supposed to do to load a library.

The fact that Python's compatibility solution requires multiple entire separate virtual environments is the source of the issue.

If we want to fix this we have to "enter through the door" provided by the environment manager. ie. probably by locating the python executable by scanning directories in the $PATH environment variable. (or letting the shell do that)

@luketpeterson
Copy link
Contributor

luketpeterson commented Jul 25, 2024

Re-reading the issue: I think we had it right last year: #432 (comment)

Therefore, it seems to me that the right solution is to write a wrapper / springboard that uses the python location logic from pyo3 to find the correct libpython for the current environment, and the execs the real binary.

This should be robust against all python environments and environment switching after build, etc.

I wonder if somebody else has already made a nice wrapper to handle this... I didn't see one when I looked last year, but maybe something exists now.

UPDATE: Doesn't look like it... My thread PyO3/pyo3#3454 is still the top Google hit for me although I may not be searching with the right terms.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Jul 25, 2024

First thing which comes into my mind is to split REPL on Python extension library and springboard executable and run python -m extension from springboard after looking for the correct Python executable.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Jul 25, 2024

Adding a separate shell script which find correct Python version and runs the binary has many problems. It means user should be aware that cargo run cannot be used. Shell script can be different for each supported platform. Shell script should know where REPL binaries are. I would say Ideally it should be done from the metta-repl executable itself.

@luketpeterson
Copy link
Contributor

luketpeterson commented Jul 25, 2024

Adding a separate shell script which find correct Python version and runs the binary has many problems. It means user should be aware that cargo run cannot be used. Shell script can be different for each supported platform. Shell script should know where REPL binaries are. I would say Ideally it should be done from the metta-repl executable itself.

Either would solve the problem, and you are right that a single binary avoids many of the issues of a trampoline script. A tool re-execing itself is a tried and true unix pattern, although it's always struck me as a little hackish. But I guess we are in the land of hacks*.

If you think it's a high priority, I may be able to try to make something to do this soon (maybe this weekend).

*I mean Conda and PyEnv are both essentially crazy hacks to get around the idea that Python is not designed to be self-contained, but real-world uses require multiple versions of Python to be installed.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Jul 25, 2024

If you think it's a high priority, I may be able to try to make something to do this soon (maybe this weekend).

Thanks Luke, yes, I think we need to fix it after all. Looks like we found all information we need here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working repl
Projects
None yet
Development

No branches or pull requests

2 participants