-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor venv export #17098
Refactor venv export #17098
Conversation
Saves a lot of superfluous zipping and unzipping, so should speed export up quite a bit. However note that copying a venv to dist/ takes non-trivial time for larger venvs, so the symlink trick should be re-introduced in a followup change. This change also refactors the export code to unify the resolve and tool exports, resolving a TODO to that effect.
Do you have any anecdotes? Although this definitely should be faster, it'd be pretty embarrasing if not and we overlooked some subtlety. A comparison of cold and warm export times against even pants itself would be re-assuring. |
It is in fact, unexpectedly, a little slower on pants's own deps. The invocation of I do not understand this. |
I wonder if this is due to materializing many small files from the original requirements VenvPex. That is probably unnecessary since I suppose we can execute |
Yep, that is 1s faster than current main (3.5s instead of 4.5s to run Also, optionally allowing the symlink trick will now be a lot easier, since we're already grabbing the full path to the venvpex. |
Oh, hmm, it doesn't actually work properly, so that may be why it's faster... |
@jsirois looks like Is that expected? |
And note that this is the case when I explicitly use |
On investigation, this because pex_root/venvs/XX/YY has no Do I have that about right? |
You do, but more generally you can only create a venv from a PEX - be it loose, packed or in zipapp form. You cannot create another venv from an existing venv using PEX_TOOLS. Your attempted use case should be failing fast ideally. The point of using VenvPex internally would just be to be:
It's 1 + 2 that should allow for the minimal amount of work creating a venv using PEX_TOOLS and the fully or partially pre-existing components of a packed PEX. Ideally, you could just do this as a single step that turns a lockfile (or portions thereof) directly into a venv: pex-tool/pex#1752. You're effectively trying to do line 2 of the example here: pex-tool/pex#1752 (comment) That line is complicated! and 1st creates a full PEX then builds a venv from it, but all in 1 line. |
Yeah, hmm. Another reason to use a VenvPex with all the requirements is that it's somewhat likely that one already exists in the cache. But it sounds like there is no alternative to creating a throwaway real PEX? |
I think that was only likely in repository.pex days or for tool PEXes. Unless there is a some target that depends on the entirety of a resolve, you will not get a cache hit for the user resolves.
Correct. The mitigations are items 1 & 2 I listed above - both packed wheel zips (and the bootstrap zip) and PEX_ROOT wheel installs are all re-used and there is no overall compression, its just a packed directory PEX. The final mitigation comes once pex-tool/pex#1752 is implemented. |
OK, this is now back to using a regular Pex, so the purpose of the change is now just the refactoring. |
description="Get interpreter path and version", | ||
argv=[ | ||
"-c", | ||
"import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seemed like a more robust way to get a kosher interpreter then the previous line interpreter = cast(PythonExecutable, requirements_pex.python)
and its clarifying comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal-onlyness was critical in the previous method of using requirements_pex.python
. Why is it critical here? This must give us some compatible interpreter in any case, surely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to point out you replaced a comment and re-use of an already computed and correct interpreter with another comment elsewhere about internal onlyness and a re-compute.
As far as criticality goes, it's critical always and until creating and storing a fully zipped PEX is fast and efficient. It never has been either. Zipping is always slow, and packed is needed for remote caching to work ~at all / not be storing both a wheel zip and a copy of that zipped up in 18 different non-packed PEX zips in LMDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, fair enough. In this case I interpreted "critical" to mean "for correctness" - i.e., requirements_pex.python
is only guaranteed to be populated because of internal-onlyness. This code will blow up otherwise. Getting the path to an interpreter by running the pex instead, as this now does, sidesteps the need for this assumption and won't blow up if the Pex is not internal-only.
You're referring to internal-onlyness being critical for performance, which it definitely is.
But since performance is important, it may be better to blow up, as it indicates a code bug. I'll change this.
) | ||
interpreter_path, py_version = res.stdout.strip().decode().split("\n") | ||
|
||
# NOTE: We add a unique prefix to the pex_pex path to avoid conflicts when multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably materialize the Pex PEX once, to some location under .pants.d say, rather than once, in a temporary location, per export? But that can wait.
A regular PEX incurs zipping overhead which can be large for large PEXes. It also means that large PEX blob gets LMDB'd. Did you try a layout="packed" regular PEX? That definitely should be faster to build than a regular PEX. |
This is always an internal_only Pex, so it should always be packed. |
I'll add a comment to that effect. |
Ok, that's pretty indirect, but verified. |
description="Get interpreter path and version", | ||
argv=[ | ||
"-c", | ||
"import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.
OK, restored to previous logic, with comment. |
Unifies the resolve and tool exports, resolving a TODO to that effect.
A future change may re-introduce the venv symlink trick behind an option, for when faster export is required.