Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden PATH setting in pex runs #9760

Closed
stuhood opened this issue May 14, 2020 · 10 comments · Fixed by #10489
Closed

Harden PATH setting in pex runs #9760

stuhood opened this issue May 14, 2020 · 10 comments · Fixed by #10489
Assignees
Milestone

Comments

@stuhood
Copy link
Member

stuhood commented May 14, 2020

The interpreter_search_paths can be used to adjust where PEX will go hunting for pythons.

But when trying to adjust the setting, it is not currently possible to "remove" the PATH from interpreter_search_paths. The following:

$ cat ~/.pants.rc
[python-setup]
# Avoid ever using a python that is not provided by pyenv.
interpreter_search_paths = ["<PYENV>"]

... fails under v2 because this line:

PATH=create_path_env_var(python_setup.interpreter_search_paths),

...ends up relying on the presence of the default PATH entry from interpreter_search_paths... completely clearing the list results in not being able to find tools used by the shims (bash, sed, cut, etc), but also in not being able to find system headers.

The following does work:

$ cat ~/.pants.rc
[python-setup]
interpreter_constraints = ["CPython>=3.7.7"]

... but is repository specific: i.e., it's not a good idea to install something like this globally, as it might override per-repo defaults.


In short: we need to decouple setting the PATH in executions from how PEX chooses interpreters.

@stuhood stuhood added the python label May 14, 2020
@stuhood stuhood changed the title Allow for global exclusion of particular interpreters Harden PATH setting in pex runs Jun 18, 2020
@stuhood
Copy link
Member Author

stuhood commented Jun 18, 2020

This also manifests here:

pants/pants.remote.toml

Lines 35 to 50 in e328b58

[python-setup]
# TODO(#7735): This config is not ideal, that we must specify the PATH for both local and remote
# platforms. This should be replaced by a proper mechanism to differentiate between the two.
interpreter_search_paths = [
# These are the interpreter paths we set up on the remote container, plus `/usr/bin`, so that
# pip can find `ld` if necessary.
"/pyenv-docker-build/versions/3.7.3/bin:/pyenv-docker-build/versions/3.6.8/bin:/pyenv-docker-build/versions/2.7.15/bin:/usr/bin",
# We include the host PATH and PEXRC values so that speculation still works.
# NOTE: These come after the remote paths. Putting them before the remote paths means generic
# bin dirs like /usr/bin will be on the PATH ahead of the pyenv dirs we actually want to use
# on the remote side. The /pyenv-docker-build/ paths are unlikely to exist on local systems,
# and so will not interfere with interpreter discovery there. This emphasizes
# that we should fix #7735, and not commingle the paths of two unrelated systems.
'<PATH>',
'<PEXRC>',
]

... so this relates to #7735.

@etiennedupont
Copy link

etiennedupont commented Jul 5, 2020

Hi @stuhood
I am having a similar issue with pants v2 that I think mirrors what you are saying.

I have the global config below:

[GLOBAL]
pants_version = "1.29.0rc0"
v1 =  false  # Turn off the v1 execution engine.
v2 = true  # Enable the v2 execution engine.
dynamic_ui = true  # Enable the v2 execution engine's terminal-based UI.
enable_pantsd = false # will be enabled by default in 1.30
colors = true # Ensure colors are used.
print_exception_stacktrace = true # print all stacktrace of errors
backend_packages = []  # Deregister all v1 backends.
# List v2 backends here.
backend_packages2 = [
  'pants.backend.python',
  'pants.backend.python.lint.docformatter',
  'pants.backend.python.lint.black',
  'pants.backend.python.lint.isort',
  'pants.backend.python.lint.flake8',
]
# List v2 plugins here.
plugins2 = []

By one way or another, Python (v3.7.3) was installed on my Mac along with Xcode, with limited permissions (-rwxr-xr-x). The executable then lies in /usr/bin. I don’t wish to use this interpreter and rather want to use pyenv, on which I installed a version (v3.7.6) corresponding to the constraints of my project. I show below some failed attempts to make everything work:



First case
my config is (among other things).

[python-setup]
interpreter_constraints = ["CPython>=3.7"]
interpreter_search_paths = [ "<PYENV>”, “<PATH>”]

When I run some commands, I am receiving a “Permission Denied” and the logs show explicitly that this Xcode Python is being used instead of the Pyenv one.


Second case
my config is

[python-setup]
interpreter_constraints = ["CPython>=3.7"]
interpreter_search_paths = [ "<PYENV>”]

This forces to choose the pyenv interpreter, but as mentioned there, we run into another issue because some of the console tools are out of scope.

Third case

I tried to tune the interpreter_search_paths but I need to include /usr/bin as some tools that lie there are expected when running pants, and if I include it, the Xcode interpreter is inevitably being chosen over the pyenv one 🤯

I ended-up tightening the constraints to interpreter_constraints = ["CPython>=3.7.4"] which made the trick, but I don't think that it is a good option, as an update in Xcode Python version would make it fail again.

@stuhood
Copy link
Member Author

stuhood commented Jul 15, 2020

Thanks @etiennedupont : agreed that this is a problem. I think that we should prioritize this before 2.0.

How I think that we will tackle this will be to have Platform/Environment specific PATH option (via #7735) that will be injected independently from the interpreter constraints, so that clearing interpreter constraints does not cause unrelated path entries to go missing. We can define this option before #7735 is completed, and #7735 will backwards-compatibly allow for per-Platform/Environment configuration of the option to account for remote execution and otherwise varying execution platforms.

@benjyw benjyw added this to the 2.0.0.alpha0 milestone Jul 16, 2020
@stuhood
Copy link
Member Author

stuhood commented Jul 22, 2020

I'm about ready to begin implementation here. A sketch of what I'm thinking:

The ability to adjust the PATH (filtering) for a process will be a fairly universal thing, and a global default for processes is likely. But it will likely also be necessary to adjust this on a Process by Process basis (a simpler form of what was described in #10156).

So, will:

  1. Add a ProcessEnvironment subsystem with an option to override/filter the PATH.
  2. Add Process(.., environment: Optional[ProcessEnvironment]), and adjust the Process intrinsic to depend on the global ProcessEnvironment if an environment is not specified.
  3. Implement a syntax for declaring a scoped Subsystem dependency, so that different processes can choose to use different instances of the subsystem.
    • Subclassing? Fine, but doesn't allow for inheritance.
    class MyProcessEnvironment(ProcessEnvironment):
        name: str
    
    SubsystemRule(MyProcessEnvironment)
    
    • Subclassing with scopes? Seems like it would work... a little awkward, but not bad.
    class MyProcessEnvironment(Scoped, ProcessEnvironment):
        subscope: str
    
    SubsystemRule(MyProcessEnvironment)
    
    • Generics for scopes? Runs up against the fact that subscripting a type results in a non-type currently... would need engine support.
    class MyScope(Scope):
        name: str
    
    SubsystemRule(ProcessEnvironment[MyScope])
    
    • Memoization of an anonymous class? I think that this can actually be made to typecheck just fine...
    MyProcessEnvironment = ProcessEnvironment.scoped_to("my_scope")
    
    SubsystemRule(MyProcessEnvironment)
    

@stuhood
Copy link
Member Author

stuhood commented Jul 23, 2020

Ok, the current state of this is that I've mostly completed (1) and (2), and decided that the most likely path forward is for this to be a PATH filter. Implementing this as a filter will mean that the rust CommandRunner will do Something to detect the PATH in the relevant environment, and will then filter it to provide to all processes:

  • For the local CommandRunner, this will mean adding a complete PATH global_option (marked daemon=True to restart pantsd), with a default of the entire PATH that pants is running with. That variable will exist for fingerprinting purposes, and is the thing that will then be filtered down on a process by process basis by the PATH filter option.
  • For the remote CommandRunner, this will involve running a specially formulated process lazily at startup time to detect the PATH in the remote environment, and then filtering that for each user process that runs afterward.

Finally, point 3 is OK as a followup: if people can configure global PATH filters now, we can migrate toward having scoped configuration later (similar to #7735 later introducing environment-specific filters).

I have to switch to other things, but this remains important for the alpha, so I'll resume early next week I hope.

@jsirois
Copy link
Contributor

jsirois commented Jul 24, 2020

As an alternate idea late to the game:

Currently we do this as you pointed out:

PATH=create_path_env_var(python_setup.interpreter_search_paths),

That's unfortunate though since Pex natively supports specifying an alternate interpreter search path in its API (but not in its CLI):
https://github.com/pantsbuild/pex/blob/b51b340bea8a585a246997071a815d008986d6e3/pex/interpreter.py#L278-L288

I think it makes perfect sense to support specifying a custom interpreter search path at least for the PEX runtime side of things 1st class in PEX since, independent of Pants, any PEX might contain code that shells out and thus wants to separate its interpreter selection search at boot time from its ambient PATH at runtime.

@jsirois
Copy link
Contributor

jsirois commented Jul 24, 2020

Talked with Stu about the above offline, but the short is that we do need filtering of PATH per Process, this is still too coarse a knob for PEX runtime interpreter selection in any case where the PEX user code needs access to the unadulterated PATH to find its own tools to use. I've filed pex-tool/pex#1012 over in Pex to add this feature which will allow fixing the PEX runtime case completely leaving Stu's more general PATH filtering work to stand for general Process execution control.

@stuhood
Copy link
Member Author

stuhood commented Jul 24, 2020

One big thing I was missing is that

@classmethod
def expand_interpreter_search_paths(cls, interpreter_search_paths, pyenv_root_func=None):
special_strings = {
"<PEXRC>": cls.get_pex_python_paths,
"<PATH>": cls.get_environment_paths,
"<PYENV>": lambda: cls.get_pyenv_paths(pyenv_root_func=pyenv_root_func),
"<PYENV_LOCAL>": lambda: cls.get_pyenv_paths(
pyenv_root_func=pyenv_root_func, pyenv_local=True
),
}
expanded = []
from_pexrc = None
for s in interpreter_search_paths:
if s in special_strings:
special_paths = special_strings[s]()
if s == "<PEXRC>":
from_pexrc = special_paths
expanded.extend(special_paths)
else:
expanded.append(s)
# Some special-case logging to avoid misunderstandings.
if from_pexrc and len(expanded) > len(from_pexrc):
logger.info(
"pexrc interpreters requested and found, but other paths were also specified, "
"so interpreters may not be restricted to the pexrc ones. Full search path is: "
"{}".format(":".join(expanded))
)
return expanded
is implemented in Pants rather than PEX currently.

@stuhood stuhood assigned jsirois and unassigned stuhood Jul 27, 2020
@stuhood
Copy link
Member Author

stuhood commented Jul 27, 2020

I think that @jsirois 's current tack on this is sufficient to resolve the issue for alpha.

jsirois added a commit to jsirois/pants that referenced this issue Jul 28, 2020
Previously we used PATH to steer interpreter selection for hermetic PEX
bootstrap and runtime. This led to fragile or impossible to solve
setups.

Switch to using PEX_PYTHON_PATH to both discover a bootstrap
interpreter and steer runtime interpreter selection. Introduce a
find_binary rule to facilitate this that uses ~portable bash for now.
Also introduce a PexRuntimeEnvironment to steer bootstrap interpreter
probing and allow specification of the PATH that should be exposed to
the PEX runtime environment.

Fixes pantsbuild#9760

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Eric-Arellano pushed a commit that referenced this issue Jul 30, 2020
Previously, `--python-setup-interpreter-search-paths` would be used both to find Python interpreters and to set the $PATH for the running PEX subprocess. This resulted in many awkward/impossible set ups.

Switch to using PEX_PYTHON_PATH to both discover a bootstrap
interpreter and steer runtime interpreter selection. Introduce a
find_binary rule to facilitate this that uses ~portable bash for now.
Also introduce a PexRuntimeEnvironment to steer bootstrap interpreter
probing and allow specification of the PATH that should be exposed to
the PEX runtime environment.

Fixes #9760.

[ci skip-rust]
[ci skip-build-wheels]
@stuhood
Copy link
Member Author

stuhood commented Jul 31, 2020

Opened #10526 with some followup.

stuhood added a commit that referenced this issue Sep 25, 2020
…0858)

### Problem

The first error in #10855 is that when we search a search path which does not include `bash`, lookups fail silently because the discovery script fails to start. The usecase described on #9760 (setting `interpreter_search_path=["<PYENV>"]`) is one example where `bash` will not be present.

### Solution

Recurse to locate an absolute path for `bash` before using `bash`. Additionally, change the "discovery" portion of `BinaryPaths` from `FallibleProcessResult` to `ProcessResult` to fail more quickly in cases where discovery errors, as opposed to succeeding with an empty result.

### Result

The added test passes, and the usecase from #9760 works on my machine.

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants