Refactor venv export #17098

benjyw · 2022-10-04T02:46:05Z

Unifies the resolve and tool exports, resolving a TODO to that effect.

A future change may re-introduce the venv symlink trick behind an option, for when faster export is required.

Saves a lot of superfluous zipping and unzipping, so should speed export up quite a bit. However note that copying a venv to dist/ takes non-trivial time for larger venvs, so the symlink trick should be re-introduced in a followup change. This change also refactors the export code to unify the resolve and tool exports, resolving a TODO to that effect.

jsirois · 2022-10-04T03:49:41Z

so should speed export up quite a bit

Do you have any anecdotes? Although this definitely should be faster, it'd be pretty embarrasing if not and we overlooked some subtlety. A comparison of cold and warm export times against even pants itself would be re-assuring.

benjyw · 2022-10-04T06:13:04Z

Do you have any anecdotes? Although this definitely should be faster, it'd be pretty embarrasing if not and we overlooked some subtlety. A comparison of cold and warm export times against even pants itself would be re-assuring.

It is in fact, unexpectedly, a little slower on pants's own deps.

The invocation of python -m pex.tools venv, which is all the work in the warm case, is ~500ms slower with the VenvPex... very consistently so.

I do not understand this.

benjyw · 2022-10-04T06:15:45Z

I wonder if this is due to materializing many small files from the original requirements VenvPex. That is probably unnecessary since I suppose we can execute pex.tools venv directly against the cached VenvPex? Will try that.

benjyw · 2022-10-04T07:19:15Z

Yep, that is 1s faster than current main (3.5s instead of 4.5s to run pex.tools venv), so 1.5s faster than the current iteration of this PR. Will put up another commit with this improvement.

Also, optionally allowing the symlink trick will now be a lot easier, since we're already grabbing the full path to the venvpex.

benjyw · 2022-10-04T07:47:36Z

Oh, hmm, it doesn't actually work properly, so that may be why it's faster...

benjyw · 2022-10-04T07:55:14Z

@jsirois looks like python -m pex.tools ~/.cache/pants/named_caches/pex_root/venvs/XX/YY venv --pip --collisions-ok /tmp/venv_pex creates a venv in /tmp/venv_pex that does not have any of XX/YY's site-packages in its own site-packages.

Is that expected?

benjyw · 2022-10-04T07:58:55Z

And note that this is the case when I explicitly use ~/.cache/pants/named_caches/pex_root/venvs/XX/YY/bin/python as the interpreter

benjyw · 2022-10-04T14:59:36Z

On investigation, this because pex_root/venvs/XX/YY has no .deps, which is where pex.tools venv looks for dists in _populate_deps(). That XX/YY dir does have a PEX-INFO, so pex tools venv can create a PEX object from it and use that to enumerate the deps, it just assumes the deps live in .deps, which they don't for a venv pex.

Do I have that about right?

jsirois · 2022-10-04T15:15:43Z

You do, but more generally you can only create a venv from a PEX - be it loose, packed or in zipapp form. You cannot create another venv from an existing venv using PEX_TOOLS. Your attempted use case should be failing fast ideally.

The point of using VenvPex internally would just be to be:

Leveraging all the pre-lmdb'd wheel zips of the packed layout these use.
Leveraging all the pre-installing of the wheel zips in the PEX_ROOT Pants uses.

It's 1 + 2 that should allow for the minimal amount of work creating a venv using PEX_TOOLS and the fully or partially pre-existing components of a packed PEX.

Ideally, you could just do this as a single step that turns a lockfile (or portions thereof) directly into a venv: pex-tool/pex#1752. You're effectively trying to do line 2 of the example here: pex-tool/pex#1752 (comment)

That line is complicated! and 1st creates a full PEX then builds a venv from it, but all in 1 line.

benjyw · 2022-10-04T23:35:21Z

Yeah, hmm. Another reason to use a VenvPex with all the requirements is that it's somewhat likely that one already exists in the cache. But it sounds like there is no alternative to creating a throwaway real PEX?

jsirois · 2022-10-04T23:57:44Z

Yeah, hmm. Another reason to use a VenvPex with all the requirements is that it's somewhat likely that one already exists in the cache.

I think that was only likely in repository.pex days or for tool PEXes. Unless there is a some target that depends on the entirety of a resolve, you will not get a cache hit for the user resolves.

But it sounds like there is no alternative to creating a throwaway real PEX?

Correct. The mitigations are items 1 & 2 I listed above - both packed wheel zips (and the bootstrap zip) and PEX_ROOT wheel installs are all re-used and there is no overall compression, its just a packed directory PEX. The final mitigation comes once pex-tool/pex#1752 is implemented.

benjyw · 2022-10-05T03:39:13Z

OK, this is now back to using a regular Pex, so the purpose of the change is now just the refactoring.

benjyw · 2022-10-05T03:40:26Z

src/python/pants/backend/python/goals/export.py

+            description="Get interpreter path and version",
+            argv=[
+                "-c",
+                "import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))",


This seemed like a more robust way to get a kosher interpreter then the previous line interpreter = cast(PythonExecutable, requirements_pex.python) and its clarifying comment below.

I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.

The internal-onlyness was critical in the previous method of using requirements_pex.python. Why is it critical here? This must give us some compatible interpreter in any case, surely?

I'm trying to point out you replaced a comment and re-use of an already computed and correct interpreter with another comment elsewhere about internal onlyness and a re-compute.

As far as criticality goes, it's critical always and until creating and storing a fully zipped PEX is fast and efficient. It never has been either. Zipping is always slow, and packed is needed for remote caching to work ~at all / not be storing both a wheel zip and a copy of that zipped up in 18 different non-packed PEX zips in LMDB.

Ah, fair enough. In this case I interpreted "critical" to mean "for correctness" - i.e., requirements_pex.python is only guaranteed to be populated because of internal-onlyness. This code will blow up otherwise. Getting the path to an interpreter by running the pex instead, as this now does, sidesteps the need for this assumption and won't blow up if the Pex is not internal-only.

You're referring to internal-onlyness being critical for performance, which it definitely is.

But since performance is important, it may be better to blow up, as it indicates a code bug. I'll change this.

benjyw · 2022-10-05T03:42:13Z

src/python/pants/backend/python/goals/export.py

+    )
+    interpreter_path, py_version = res.stdout.strip().decode().split("\n")
+
+    # NOTE: We add a unique prefix to the pex_pex path to avoid conflicts when multiple


Should probably materialize the Pex PEX once, to some location under .pants.d say, rather than once, in a temporary location, per export? But that can wait.

jsirois · 2022-10-05T03:49:16Z

A regular PEX incurs zipping overhead which can be large for large PEXes. It also means that large PEX blob gets LMDB'd. Did you try a layout="packed" regular PEX? That definitely should be faster to build than a regular PEX.

benjyw · 2022-10-05T04:00:41Z

This is always an internal_only Pex, so it should always be packed.

benjyw · 2022-10-05T04:04:40Z

I'll add a comment to that effect.

jsirois · 2022-10-05T04:05:06Z

Ok, that's pretty indirect, but verified.

jsirois · 2022-10-05T04:06:41Z

src/python/pants/backend/python/goals/export.py

+            description="Get interpreter path and version",
+            argv=[
+                "-c",
+                "import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))",


I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.

benjyw · 2022-10-05T16:20:57Z

OK, restored to previous logic, with comment.

benjyw added the category:internal CI, fixes for not-yet-released features, etc. label Oct 4, 2022

benjyw requested review from danxmoran and jsirois October 4, 2022 02:46

benjyw added 2 commits October 4, 2022 20:03

Use a regular packed Pex instead.

86da7cb

Merge branch 'main' into refactor_venv_export

a91b4b3

benjyw changed the title ~~Use a VenvPex when exporting python venvs.~~ Refactor venv export Oct 5, 2022

benjyw commented Oct 5, 2022

View reviewed changes

linr

8cb64f4

comment

c021f39

jsirois approved these changes Oct 5, 2022

View reviewed changes

use already-computed interpreter

bb286f3

jsirois approved these changes Oct 5, 2022

View reviewed changes

benjyw merged commit fc26779 into pantsbuild:main Oct 5, 2022

benjyw deleted the refactor_venv_export branch October 5, 2022 18:52

jsirois mentioned this pull request Oct 7, 2022

Prepare 2.15.0.dev4 release notes. #17161

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor venv export #17098

Refactor venv export #17098

benjyw commented Oct 4, 2022 •

edited

Loading

jsirois commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022 •

edited

Loading

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

jsirois commented Oct 4, 2022

benjyw commented Oct 4, 2022

jsirois commented Oct 4, 2022

benjyw commented Oct 5, 2022

benjyw Oct 5, 2022

jsirois Oct 5, 2022

benjyw Oct 5, 2022

jsirois Oct 5, 2022

benjyw Oct 5, 2022

benjyw Oct 5, 2022

jsirois commented Oct 5, 2022

benjyw commented Oct 5, 2022

benjyw commented Oct 5, 2022

jsirois commented Oct 5, 2022

jsirois Oct 5, 2022

benjyw commented Oct 5, 2022

Refactor venv export #17098

Refactor venv export #17098

Conversation

benjyw commented Oct 4, 2022 • edited Loading

jsirois commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022 • edited Loading

benjyw commented Oct 4, 2022

benjyw commented Oct 4, 2022

jsirois commented Oct 4, 2022

benjyw commented Oct 4, 2022

jsirois commented Oct 4, 2022

benjyw commented Oct 5, 2022

benjyw Oct 5, 2022

Choose a reason for hiding this comment

jsirois Oct 5, 2022

Choose a reason for hiding this comment

benjyw Oct 5, 2022

Choose a reason for hiding this comment

jsirois Oct 5, 2022

Choose a reason for hiding this comment

benjyw Oct 5, 2022

Choose a reason for hiding this comment

benjyw Oct 5, 2022

Choose a reason for hiding this comment

jsirois commented Oct 5, 2022

benjyw commented Oct 5, 2022

benjyw commented Oct 5, 2022

jsirois commented Oct 5, 2022

jsirois Oct 5, 2022

Choose a reason for hiding this comment

benjyw commented Oct 5, 2022

benjyw commented Oct 4, 2022 •

edited

Loading

benjyw commented Oct 4, 2022 •

edited

Loading