Custom environments in subinterpreters #126977

Open
FFY00 opened this issue Nov 18, 2024 · 4 comments
Labels
topic-subinterpreters type-feature A feature request or enhancement

Comments

@FFY00
Member

FFY00 commented Nov 18, 2024

Feature or enhancement

Proposal:

I wanted to explore the viability of having custom environments in subinterpreters. There are several use-cases that could be enabled by this feature.

So far, from informal discussion with others about this, there are a couple of possible issues to take into consideration.

Issues

  1. Some of the immortal objects shared between subinterpreters may be environment-dependent (pointed out by @Yhg1s)
  2. Complications around dynamic loading, by having extension modules from different environments
    2.1) Symbol conflicts from their dependencies (pointed out by @Yhg1s)
    2.2) Since subinterpreters share the same process, when loading the same shared object, they get the same pointer (pointed out by @pablogsal)
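Point 2.2) can be observed from Python with ctypes. The following is a minimal sketch, assuming a POSIX system where `dlopen` is available; `load_twice` is a hypothetical helper name, and `_handle` is a private ctypes attribute used here only for illustration.

```python
import ctypes
import ctypes.util


def load_twice(path):
    """Open the same shared object twice and return both dlopen() handles."""
    first = ctypes.CDLL(path)
    second = ctypes.CDLL(path)
    # _handle is a private ctypes attribute holding the raw dlopen() handle.
    return first._handle, second._handle


# find_library may return None; CDLL(None) then opens the running program itself.
libc_path = ctypes.util.find_library("c")
handle_a, handle_b = load_twice(libc_path)

# Both loads happen in one process, so the loader hands back the same object.
same_handle = handle_a == handle_b
print(same_handle)
```

Because the handle is per process rather than per interpreter, any static or global data inside that shared object is visible to every subinterpreter that loads it.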

Implementation

The main thing we need is a way to disable the site initialization, which could be an enable_site option in the interpreter config. This should disable the environment customizations, and result in a bare environment without anything extra on sys.path.
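For comparison, the closest thing available today is the process-wide `-S` command-line flag, which skips the site import at startup. The sketch below only demonstrates that existing flag in a child process; it does not use the proposed enable_site option, which does not exist yet, and `no_site_flag` is a hypothetical helper name.

```python
import subprocess
import sys


def no_site_flag(*interpreter_flags):
    """Run a child interpreter and return its sys.flags.no_site value."""
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c",
         "import sys; print(sys.flags.no_site)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


default_value = no_site_flag()    # normal startup, site is initialized
bare_value = no_site_flag("-S")   # site initialization skipped
print(default_value, bare_value)
```

The proposal amounts to making this choice available per subinterpreter instead of per process.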

However, to make the use of different environments more ergonomic, we could add an environment_path location pointing to a directory containing a pyvenv.cfg, which would perform the site initialization for that environment.
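As a rough illustration of what initializing from an environment_path could involve, the sketch below reads a pyvenv.cfg and derives the site-packages directory that would be added to sys.path. Both helper names are hypothetical, the layout is the one venv uses, and it assumes the environment targets the same Python version as the running interpreter.

```python
import os
import sys
import tempfile
from pathlib import Path


def read_pyvenv_cfg(env_dir):
    """Parse the 'key = value' lines of <env_dir>/pyvenv.cfg into a dict."""
    cfg = {}
    text = (Path(env_dir) / "pyvenv.cfg").read_text(encoding="utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip().lower()] = value.strip()
    return cfg


def site_packages_for(env_dir):
    """Return the site-packages directory using the standard venv layout."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Lib" / "site-packages"
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return env_dir / "lib" / version / "site-packages"


# Demo with a throwaway directory standing in for a real environment.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pyvenv.cfg").write_text(
        "home = /usr/bin\ninclude-system-site-packages = false\n",
        encoding="utf-8",
    )
    demo_cfg = read_pyvenv_cfg(tmp)
    demo_site_packages = site_packages_for(tmp)

print(demo_cfg["include-system-site-packages"], demo_site_packages.name)
```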

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

@FFY00
Member Author

FFY00 commented Nov 18, 2024

Regarding 1), I'd say this is probably the main issue with this proposal, but I don't know the specifics.

Regarding 2), this is already an issue right now, but it is exacerbated here, as the proposal increases the likelihood of users running into it. We should consider possible preventive or mitigation measures.

Regarding 2.1), most modern Linux environments don't hit it, as symbols aren't loaded into the global namespace unless RTLD_GLOBAL is used, but it is still an issue on a number of other systems, so it's still pretty relevant. (thanks @pablogsal)
A possible mitigation measure might be to preemptively detect symbol clashes and raise an ImportError when loading extensions that would hit it, but I am not sure about its viability.
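Whether extension symbols land in the global namespace depends on the flags the interpreter passes to `dlopen`. This is a small sketch, assuming a Unix platform where `sys.getdlopenflags()` and the `os.RTLD_*` constants exist; `exports_symbols_globally` is a hypothetical helper name.

```python
import os
import sys


def exports_symbols_globally(flags):
    """Return True if these dlopen() flags include RTLD_GLOBAL."""
    return bool(flags & os.RTLD_GLOBAL)


# Flags the running interpreter currently uses when importing extensions.
current_flags = sys.getdlopenflags()
print(exports_symbols_globally(current_flags))

# Two explicit flag combinations for comparison.
local_only = exports_symbols_globally(os.RTLD_NOW)
with_global = exports_symbols_globally(os.RTLD_NOW | os.RTLD_GLOBAL)
```

Code that calls `sys.setdlopenflags()` with RTLD_GLOBAL changes this for every later extension import in the process, which is one way the clash in 2.1) can still appear on Linux.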

Regarding 2.2), AFAICT, this means that global data in the extension and its dependencies is shared between subinterpreters. Similarly, we could possibly mitigate this by detecting it and raising ImportError.

If these, or any other aspects of 2), are still problematic, we could simply prevent loading extension modules on subinterpreters that have a custom environment.
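The fallback of refusing extension modules can be approximated at the Python level with an import hook. The sketch below is only an illustration under that assumption, not how the interpreter would implement it; `BlockExtensionModules` and `is_extension_spec` are hypothetical names.

```python
import importlib.abc
import importlib.machinery
import sys


def is_extension_spec(spec):
    """Return True if the spec would be loaded from a shared object."""
    return isinstance(spec.loader, importlib.machinery.ExtensionFileLoader)


class BlockExtensionModules(importlib.abc.MetaPathFinder):
    """Delegate to the other finders, but refuse extension modules."""

    def find_spec(self, fullname, path, target=None):
        for finder in sys.meta_path:
            if finder is self:
                continue
            find = getattr(finder, "find_spec", None)
            if find is None:
                continue
            spec = find(fullname, path, target)
            if spec is None:
                continue
            if is_extension_spec(spec):
                raise ImportError(
                    f"extension module {fullname!r} is blocked in this environment"
                )
            return spec
        return None


# To activate, the finder must come first:
# sys.meta_path.insert(0, BlockExtensionModules())
```

Built-in and frozen modules are untouched by this check, since they are not loaded through `ExtensionFileLoader`.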

@gpshead
Member

gpshead commented Nov 18, 2024

First reaction: I'm skeptical that we actually want this as stated. Subinterpreters having different environment configs than the main interpreter doesn't feel right. Would we want to support that explicitly as a feature for everyone to build on and depend on?

> a way to disable the site initialization, which could be an enable_site option in the interpreter config. This should disable the environment customizations, and result in a bare environment without anything extra on sys.path.

This is a much more direct thing to ask for and could be implemented as a feature on its own without allowing arbitrary whole new environment configs. Gut feeling: whole new configs contain a can of worms of potentially unintended consequences. I expect Eric and others with their head in (sub)interpreter startup land to have a better feel for the reality of my gut check here.

@pablogsal
Member

> A possible mitigation measure might be to preemptively detect symbol clashes and raise an ImportError when loading extensions that would hit it, but I am not sure about its viability.

This can still happen with some subset of symbols. For example, GNU's 'unique global' extension symbols will still end up in the global namespace even if you open with RTLD_LOCAL. The problem is that the poisoning can also happen later: imagine an extension that statically links libstdc++, and then something else loads an extension that depends on the shared object for libstdc++. Then some symbols (the unique global ones) will be shared between the second extension and the first, leading to crashes if they are incompatible versions of libstdc++.

All of this is to say that it will be quite difficult to detect the poisoning unless we require that the shared object exports no symbols other than the PyInit_... one.

@pablogsal
Member

> Regarding 2.2), AFAICT, this means that global data in the extension and its dependencies is shared between subinterpreters. Similarly, we could possibly mitigate this by detecting it and raising ImportError.

This is not possible to detect, but a module supporting two-phase initialisation should be safe to load, because it should keep its global state in the module state, which two subinterpreters will not share.

The problem will be something like a logging singleton in some internal dependency: initialising the logging singleton in one subinterpreter initialises it for all subinterpreters at the same time, but you cannot just hard fail, because that's how it is supposed to work.

Projects
Status: Todo
Development

No branches or pull requests

3 participants