Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability for Pants to provide Python via a union (with a pyenv impl) #18352

Merged
merged 30 commits into from
Mar 5, 2023

Conversation

thejcannon
Copy link
Member

@thejcannon thejcannon commented Feb 24, 2023

Fixes #10447

Adding the ability for an optional union (PythonProvider) implementer to return a PythonExecutable (newly outfitted with append_only_caches) so that Python for user code can be supplied completely by rule-code. Additionally added a pyenv union implementation.

The dirty details are:

  • Python is essentially NOT relocatable (hats off to IndyGreg). Python itself, as well as ecosystem tools, are baked with assumptions about absolute paths.
  • Python installation is not atomic. That means we do our work behind a lock. We run the bash script on every session, but it should be fast as it only does the work if the target python doesnt exist.

@@ -80,7 +80,6 @@ async def resolve_plugins(
internal_only=True,
python=python,
requirements=requirements,
interpreter_constraints=request.interpreter_constraints,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was None and thats a no-no. However there's no type hint for this param so mypy isn't complaining.

@thejcannon
Copy link
Member Author

We still require some kind of python for executing non-user code. Namely:

  • pex to ask about ICs
  • gunzipping, and a few other places

I can tackle that in another PR.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is huge. Amazing stuff!

@kaos
Copy link
Member

kaos commented Feb 24, 2023

That means we do our work in a tempdir and symlink the result, on every session.

how much needs to be redone though? would be nice if we could avoid reinstalling Python on every restart of pantsd (that's what session means, right? or is it every pants run? cause that sounds like it would be a little bit of a buzzkill)

Co-authored-by: Andreas Stenius <git@astekk.se>
@thejcannon
Copy link
Member Author

Oh sorry bad phrasing. We run the process on every run, but bail if the python already exists. So we should only install it once. I'll fix the phrasing.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@thejcannon
Copy link
Member Author

Another open question: Should we start using this in this repo? There's two places it could be useful:

  • For running the repo itself ./pants run ...
  • For our tests that are multi-interpreter. Right now we skip, but with enabling the pyenv plugin in the test, we wouldn't have to (although there'd be timeout issues 😬 )

@benjyw
Copy link
Contributor

benjyw commented Feb 24, 2023

Awesome! Users will love it. Reviewing now.

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially massive!!

src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
input_digest=pyenv.digest,
description=f"Choose specific version for Python {python_to_use}",
env={"PATH": env_vars.get("PATH", "")},
# Caching the result is OK, since if the user really needs a different patch,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur. And this will only matter when new Python versions are released, which is... not often.

src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
@thejcannon
Copy link
Member Author

Oh one alternative to the current bash script is a Python script which uses locks and subprocesses. If we fix the issue highlighted above regarding non-user-code usage of Python (ideally with pyoxy or Python Build Standalone) a Python script to install Python is possible.

@thejcannon
Copy link
Member Author

thejcannon commented Mar 2, 2023

Okie dokie folks, hot changes coming in:

  • Switched the locking and installation to Python to be cross-platform(-ish). The sentinel file-checking is still in bash because it's the hot-path and bash go vroom.
  • Added a bunch of plumbing to make it so users can out-of-band install versions the same way we do in-band
    • We now have a new (internal) target type which is runnable representing running pyenv install.
    • And a synthetic target is constructed at the root named pyenv-install
    • This should be used for people to do more complicated things, like installing an optimized version of Python
  • Added more help docs

All-in-all I'm pretty pleased with it. Although I wish we had a convention/namespace surrounding synthetic targets location/names.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome 🚀

class PyenvInstall(Target):
alias = "_pyenv_install"
help = "<internal target>"
core_fields = (PyenvInstallSentinelField,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit boiler-platey. not suggesting to do anything about it here and now, but if we find this pattern useful, it could make sense to streamline it a bit :)

src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks.

As to Docker: this should work fine, since named caches have been used properly AFAICT.

Comment on lines 45 to 48
A subsystem for pyenv-provided-Python (https://github.com/pyenv/pyenv).

Enabling this subsystem will switch Pants from trying to find an appropriate Python on your
system to using pyenv to install the correct Python(s).
Copy link
Member

@stuhood stuhood Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This intro doesn't really make it clear that this is a Pants-managed pyenv, rather than a pyenv install on your system (which we do have support for via interpreter_search_path=["<PYENV>"]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Which also highlights that some doc changes might be good for this, so that those two strategies reference one another.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yeah, docs always elude me. I'll see what I can do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna punt on docs changes in this PR and open another one for that.

Comment on lines +65 to +72
If you wish to customize the pyenv installation of a Python, a synthetic target is exposed
at the root of your repo which is runnable. This target can be run with the relevant
environment variables set to enable optimizations. You will need to wipe the specific
version directory if Python was already installed. Example:

sudo rm -rf <named_caches_dir>/pyenv/versions/<specific_version>
# env vars from https://github.com/pyenv/pyenv/blob/master/plugins/python-build/README.md#building-for-maximum-performance
PYTHON_CONFIGURE_OPTS='--enable-optimizations --with-lto' {bin_name()} run :pants-pyenv-install -- 3.10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... this is very magical. It feels like something that should be provided via subsystem options instead, otherwise the build could be very non-hermetic with folks using interpreters which were customized in different ways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expose this through options it's the same end result. Customized python builds...

And this way is the only "middle ground" I could come up with to make having some (uncacheable) build step take 30 minutes. Make the user "own" running that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be the end result (if people end up putting things in a .pantsrc file), but the happy path would be to put the option in pants.toml instead, which would cause the option to be applied by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well maybe I'm just viewing this from my perspective, but I know my engineers wouldn't wait for a 30 minute process. And they'd bug me constantly about it. And kill it. Meaning they'd have to restart.

So I can't put it in pants.toml.

And putting it there feels weird because if you change the option we don't invalidate existing pythons either. (Same for .pants.rc)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all, the end goal is providing Python. That's the 80% case.

For the 20%, let's give them a back door. But it doesn't have to be of the same elk as the front door 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can't put it in pants.toml.

This is true for optimization flags, but not true for the other flags you might want to pass to a pyenv install: particularly CONFIGURE_OPTS.

And putting it there feels weird because if you change the option we don't invalidate existing pythons either. (Same for .pants.rc)

We could? Fingerprinting the options could be done by embedding them in the installation script, and taking its digest, for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure anything is possible. Getting back to it though, the build is already non-hermetic and we still need a way to allow users to install Python out-of-band or suffer a 30-minute build step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I think I've decided to have this functionality be in its own sub-plugin of the pyenv provider plugin. So it's "off" by default.

src/python/pants/backend/python/providers/pyenv/rules.py Outdated Show resolved Hide resolved
@thejcannon
Copy link
Member Author

Marked as experimental so we can merge with a few conversations ongoing.

@thejcannon thejcannon merged commit 7398971 into pantsbuild:main Mar 5, 2023
thejcannon added a commit that referenced this pull request Mar 13, 2023
This PR decouples the Python Pants uses for its own nefarious purposes
(like running PEX or `gunzip`) from the user search paths by either
using `sys.executable` locally or downloading and using Python Build
Standalone in a Docker environment.

Additionally when making this change, the Python/pex code was refactored
so that we always use this Python to run pex, with Python either being
chosen by pex, or by using `PEX_PYTHON` env var at runtime. I think this
is a nice cleanup of the handshake between
`CompletePexEnvironment.create_argv` and
`CompletePexEnvironment.environment_dict` used to have.

On a side note, this (with `scie-pants` and
#18352) this marks a complete
divorce from any Python on the user system (or lack thereof) and running
Python code.
thejcannon added a commit to thejcannon/pants that referenced this pull request Mar 14, 2023
This PR decouples the Python Pants uses for its own nefarious purposes
(like running PEX or `gunzip`) from the user search paths by either
using `sys.executable` locally or downloading and using Python Build
Standalone in a Docker environment.

Additionally when making this change, the Python/pex code was refactored
so that we always use this Python to run pex, with Python either being
chosen by pex, or by using `PEX_PYTHON` env var at runtime. I think this
is a nice cleanup of the handshake between
`CompletePexEnvironment.create_argv` and
`CompletePexEnvironment.environment_dict` used to have.

On a side note, this (with `scie-pants` and
pantsbuild#18352) this marks a complete
divorce from any Python on the user system (or lack thereof) and running
Python code.
thejcannon added a commit that referenced this pull request Mar 14, 2023
This PR decouples the Python Pants uses for its own nefarious purposes
(like running PEX or `gunzip`) from the user search paths by either
using `sys.executable` locally or downloading and using Python Build
Standalone in a Docker environment.

Additionally when making this change, the Python/pex code was refactored
so that we always use this Python to run pex, with Python either being
chosen by pex, or by using `PEX_PYTHON` env var at runtime. I think this
is a nice cleanup of the handshake between
`CompletePexEnvironment.create_argv` and
`CompletePexEnvironment.environment_dict` used to have.

On a side note, this (with `scie-pants` and
#18352) this marks a complete
divorce from any Python on the user system (or lack thereof) and running
Python code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support vendoring of the Python interpreter
5 participants