Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix filesystem sandbox leaks in Python bootstrapping #16800

Closed
stuhood opened this issue Sep 8, 2022 · 7 comments · Fixed by #17051
Closed

Fix filesystem sandbox leaks in Python bootstrapping #16800

stuhood opened this issue Sep 8, 2022 · 7 comments · Fixed by #17051
Assignees

Comments

@stuhood
Copy link
Member

stuhood commented Sep 8, 2022

(blocked by #16754)

python_bootstrap.py and asdf.py are the two (known?) filesystem sandbox leaks due to using direct filesystem access (PATH checks, essentially) to discover Python interpreters:

# TODO: This rule is marked uncacheable because it directly accsses the filesystem to examine ASDF configuration.
# See https://github.com/pantsbuild/pants/issues/10842 for potential future support for capturing from absolute
# paths that could allow this rule to be cached.


To resolve this, we might need to port the discovery of those paths to a Process (which could run inside of an image). And unfortunately, because they execute before we are sure that we have a python-interpreter available, possibly to either bash or native code (see some discussion here: #10769 (comment)).

Alternatively, we might be able to implement and use #10842: the implication would be that unlike Snapshots/PathGlobs in general, an "absolute path" based implementation would be environment specific, and could be implemented such that it did lookups inside a docker container.

@stuhood stuhood changed the title Address filesystem sandbox leaks in Python bootstrapping Fix filesystem sandbox leaks in Python bootstrapping Sep 8, 2022
@stuhood
Copy link
Member Author

stuhood commented Sep 12, 2022

@tdyas raised the point that the latter approach (a docker-aware intrinsic), adds the requirement that there is a docker image which corresponds to any remote execution environment, and some environments might not use docker.

Additionally the Process approach is very general and composable: it would be easy to create an (uncacheable) Process which captured a Snapshot from an absolute path, for example.

@chrisjrn
Copy link
Contributor

I'm working on this. python_boostrap is a bit of a tangled web, and it's highly likely that I'll be attempting this one in multiple stages, as the current module layout is not really amenable to converting things into rules.

@chrisjrn
Copy link
Contributor

Having got to the point where I'm actually looking at the code in python_bootstrap that actually hits the filesystem, it looks like the scenarios that actually hit the filesystem directly are only directly related to pyenv and asdf, both of which are more relevant to managing runtimes on a local machine. Do we envisage a use case where we have pyenv installed in remote/docker environments, or is this something that we can mark off as only working in local environments?

@chrisjrn
Copy link
Contributor

(to be clear: I think pyenv and asdf should be local-only, and we should work on making the rules cacheable based on the contents of the relevant config files if possible.)

@Eric-Arellano
Copy link
Contributor

Related: @chrisjrn and I decided that <PEXRC> should not work with docker environments and remote enviroments: that would be very weird to have added ~/.pexrc inside either of those environments/containers...Instead, users can set up the docker_environment and remote_environment targets with the explicit paths.

@chrisjrn, I think it is reasonable to not support <ASDF> and <PYENV> when the environment is Docker or Remote. At least for now, until we get feedback from users that they need it.

But, we still would want code to detect when EnvironmentTarget is not local, and not attempt to search those special strings. It is wrong for us to evaluate them using localhost, and then to try using those results inside Docker and RE. We need to skip them, and possibly warn or error when they are set for those environments.

possibly warn or error when they are set for those environments.

I don't know the right behavior here. Note that we include <PYENV> by default for [python-bootstrap].search_path, so it would be really annoying to error or loudly warn that <PYENV> is being ignored with Docker and remote environments because you cannot use Pants's default then with those targets. Instead, we may want to do something like see if options.is_default("search_path") to determine whether we should warn/error.

@chrisjrn
Copy link
Contributor

@stuhood thoughts?

@stuhood
Copy link
Member Author

stuhood commented Sep 27, 2022

@chrisjrn, I think it is reasonable to not support <ASDF> and <PYENV> when the environment is Docker or Remote. At least for now, until we get feedback from users that they need it.

But, we still would want code to detect when EnvironmentTarget is not local, and not attempt to search those special strings. It is wrong for us to evaluate them using localhost, and then to try using those results inside Docker and RE. We need to skip them, and possibly warn or error when they are set for those environments.

Yea, fine with this.

chrisjrn pushed a commit that referenced this issue Sep 28, 2022
…hs (#17030)

Previously, `PythonBootstrap` had a method that would calculate the expanded interpreter paths when they are first consumed. This change moves the calculation of those expanded interpreter paths to the initial creation of `PythonBootstrap`.

Relatedly, a bunch of static methods defined on `PythonBootstrap` are now private functions at module scope. This is prework that will significantly improve the clarity of my next piece of work on `PythonBootstrap`, namely converting all of the various filesystem-dependent functions to `@rule`s.

**None of the helper methods have been changed**

Addresses #16800.
chrisjrn pushed a commit that referenced this issue Sep 28, 2022
…valid settings (#17041)

Per discussion in #16800, we're OK with using direct filesystem access to support pyenv support (and separately, asdf support) for local runtime environments only. This adds a preprocessing function that supports:

* no-opping for local/undefined environments
* handling the case where the search paths are unaltered from the default, by stripping out unsupported search paths
* raising an exception with fixing instructions if unsupported search paths are requested for remote/docker environments

n.b. `PythonBootstrap` is still incorrectly cached for pyenv users, and that will be addressed in one (hopefully) final PR.
chrisjrn pushed a commit that referenced this issue Sep 29, 2022
This moves the path-specific options for `GolangSubsystem` into an `EnvironmentAware` block, and factors `cgo_tool_search_paths` into a property based on the new `depends_on_env_vars` functionality.

Further, the `GoBootstrap` code is refactored to pre-compute the go search paths, saving a request to the uncacheable `asdf` rules if they are not necessary.

Still todo -- erroring if `asdf` is used in a non-local environment.

Unblocks work on #16800
chrisjrn pushed a commit that referenced this issue Sep 30, 2022
Following discussion with @stuhood, it emerged that none of the filesystem-sensitive code in `python_bootstrap` is cache-correct, in that the functions that hit the filesystem were not marked `uncacheable`.

This PR refactors `_expand_interpreter_search_paths` and `_get_pyenv_paths` into rules, marking `_get_pyenv_paths` as uncacheable. This reduces the footprint of the sandbox-unsafe code to just `_get_pyenv_paths`.

It also reduces the scope of `AsdfToolPathsRequest`, and moves the point where it is requested to inside the path expansion rule, and only if `asdf` is actually requested.

Closes #16800.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants