Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python import parser does not support subprocess_environment or some other way to leak env vars. #16565

Open
jsirois opened this issue Aug 17, 2022 · 6 comments
Labels
backend: Python Python backend-related issues bug

Comments

@jsirois
Copy link
Contributor

jsirois commented Aug 17, 2022

The code has never used a PexProcess or VenvPexProcess or manually plumbed SubprocessEnvironment or else its own bespoke env var leaking option:

process_result = await Get(
ProcessResult,
Process(
argv=[
python_interpreter.path,
"./__parse_python_dependencies.py",
file,
],
input_digest=input_digest,
description=f"Determine Python dependencies for {request.source.address}",
env={
"STRING_IMPORTS": "y" if request.string_imports else "n",
"STRING_IMPORTS_MIN_DOTS": str(request.string_imports_min_dots),
"ASSETS": "y" if request.assets else "n",
"ASSETS_MIN_SLASHES": str(request.assets_min_slashes),
},
level=LogLevel.DEBUG,
),
)

This results in situations where other Pants rules can use a given Python interpreter but the dependency inference rules cannot when those other rules have the benefit of --subprocess-environment-env-vars='+["LD_LIBRARY_PATH"]' - for example. This particular example comes up when trying to use GH actions provided Pythons where they are not statically linked Python and use a .so and that .so is not in the default linker cache / linker search path.

@jsirois jsirois added bug backend: Python Python backend-related issues labels Aug 17, 2022
@jsirois
Copy link
Contributor Author

jsirois commented Aug 18, 2022

The more interesting thing here @stuhood is that a data dependency (on a binary) has an implicit data dependency on the env vars (or at least 1 of them, LD_LIBRARY_PATH) exposed to the process that found the binary. Without that LD_LIBRARY_PATH satisfied, the downstream recipient of the found Python binary cannot use it. As things stand, humans carefully or un-knowingly arrange for the implicit dependency to be satisfied unlike most other things engine. Not sure if there is anything clever we can do here. If we engineered a mechanism to propagate env vars in a growing set down to dependent subprocesses, that would be too broad a brush in general although it would prevent this sort of issue in general. Perhaps we're stuck with being careful.

@stuhood
Copy link
Member

stuhood commented Aug 18, 2022

The more interesting thing here @stuhood is that a data dependency (on a binary) has an implicit data dependency on the env vars (or at least 1 of them, LD_LIBRARY_PATH) exposed to the process that found the binary. Without that LD_LIBRARY_PATH satisfied, the downstream recipient of the found Python binary cannot use it. As things stand, humans carefully or un-knowingly arrange for the implicit dependency to be satisfied unlike most other things engine. Not sure if there is anything clever we can do here.

Yea, this is tricky. It's definitely not implicit that if Process 1 has consumed something from an environment variable, that Process 2 downstream will also need to consume it.

This results in situations where other Pants rules can use a given Python interpreter but the dependency inference rules cannot when those other rules have the benefit of --subprocess-environment-env-vars='+["LD_LIBRARY_PATH"]' - for example.

The tack that I have been trying to encourage for the "options which declare environment variables which need to be leaked" pattern is to attempt to make them use-case or tool specific: "all subprocesses" is just too broad.

This ticket represents a really good example of that I think: in this case, the tool that requires a leaked environment variable is the Python interpreter. So as a strawimpl, you could imagine having --python-interpreter-env-vars, which would be used in any situation where we expected to be able to use (a particular...?) Python interpreter.

And this relates to #12203 as well. Similar to a PATH entry, the filesystem content of the value behind the env-var value is relevant. So the LD_LIBRARY_PATH variable would either need to be:

  1. deeply fingerprinted in an oblivous way
  2. explicitly supported in some way (i.e.: we install logic which fingerprints the LD_LIBRARY_PATH in a "particular library version-aware way"... essentially a declaration that no other data behind that env var is relevant)

@huonw
Copy link
Contributor

huonw commented May 11, 2023

The original problem (with python import parsing) might be obsoleted by the new Rust-based parser #18854, once that fully replaces the Python-based one. Once that's the case, the docs in #18900 referencing this issue could probably be removed too.

That does nothing for the broader problem of propagating env var dependencies, of course.

@kaos
Copy link
Member

kaos commented May 11, 2023

Would it perhaps make sense to also capture which env vars to leak in the BinaryPath result?

class BinaryPath:
path: str
fingerprint: str

If the request indicates which env vars are potentially relevant, those could be leaked into the result if defined. Then a call site requesting a search for a Python binary could include LD_LIBRARY_PATH to be included in the resulting BinaryPath which may then be used in subsequent process requests for that binary. That way it's configurable per tool which env vars to plumb through for..

@thejcannon
Copy link
Member

I think, for this case, what the root problem is is that plugin authors can ask for a Python executable, and in most (every?) case they shouldn't be using it.
With that in mind maybe we make it harder/impossible to do so?

@thejcannon
Copy link
Member

FYI I just fixed this for Django dep inference as well: #19008

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Python Python backend-related issues bug
Projects
None yet
Development

No branches or pull requests

5 participants