Make PEX lockfile subsetting incremental #14944

thejcannon · 2022-03-29T18:49:32Z

Background

This PR has 2 facets:

Moving most of the RequirementsPexRequest rule handling into PexFromTargetsRequest rule handling to help streamline where the changes are for this.
Passing the requirement strings on the command-line to PEX in the PEX lockfile case (along with --lock) so PEX doesn't need pip to resolve anything.

The result of this is that:

user's shouldn't have to trade-off [python].run_against_entire_lockfile or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around)
Users using --changed now will only request the dependencies needed for the operation, and will only download what isn't already cached.
because of 1. upgrades to Pants which add new arguments to the PEX process invocation, therefore invalidating caches, no longer invalidate the repository PEX.

Metrics

On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694)

For cold cache, I run the following commands to warm up

rm -rf ~/.cache/pants/lmdb_store/
rm -rf ~/.cache/pants/named_caches/
rm -rf .pids
./pants_from_sources help

Notes:

64-core machine
black, (internal copyright formatter), flake8, isort, ❗mypy , and pylint
I'm using [pex].venv_use_symlinks = true

Full lint

This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache.

time ./pants_from_sources lint ::

-	2.11.x	This PR
time - real	9m48.887s	5m10.263s
time - user	36m7.188s	53m21.293s
time - sys	4m29.392s	6m18.442s
space - lmdb_store	2.3G	1.4GB
space - named_caches	11G	9.5GB

On 2.11.X, installing repository.pex took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant)

Incremental (--changed) lint

This case models developer runs.

(Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature)
time ./pants_from_sources lint path/to/a_test_file.py
followed by time ./pants_from_sources lint different-path/to/a_different_file.py

First file:

-	2.11.x	This PR
time - real	7m47.071s	0m40.998s
time - user	8m50.894s	1m24.781s
time - sys	1m30.338s	0m13.899s
space - lmdb_store	2.3G	61M
space - named_caches	10G	383M

Second file:
(note this file has a transitive dependency on mxnet which is very large)

-	2.11.x	This PR
time - real	0m16.437s	2m39.155s
time - user	0m22.137s	3m33.879s
time - sys	0m9.119s	0m20.483s
space - lmdb_store	2.3G	865M
space - named_caches	10G	5.4G

[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

thejcannon · 2022-03-29T18:51:48Z

src/python/pants/backend/python/util_rules/pex_from_targets.py

+    requirements: PexRequirements | Lockfile = PexRequirements()
+    if request.include_requirements:
+        requirements = await Get(PexRequirements, _PexRequirementsRequest(request.addresses))
+
+        should_return_entire_lockfile = (
+            python_setup.run_against_entire_lockfile and request.internal_only
+        )
+        should_request_repository_pex = should_return_entire_lockfile or (
+            python_setup.resolve_all_constraints and python_setup.requirement_constraints
+        )
+        if should_request_repository_pex:
+            repository_pex_request = await Get(
+                OptionalPexRequest,
+                _RepositoryPexRequest(
+                    request.addresses,
+                    hardcoded_interpreter_constraints=request.hardcoded_interpreter_constraints,
+                    platforms=request.platforms,
+                    complete_platforms=request.complete_platforms,
+                    internal_only=request.internal_only,
+                    additional_lockfile_args=request.additional_lockfile_args,
+                ),
+            )
+            if should_return_entire_lockfile:
+                if repository_pex_request.maybe_pex_request is None:
+                    raise ValueError(
+                        "[python].run_against_entire_lockfile was set, but could not find a "
+                        "lockfile or constraints file for this target set. See "
+                        f"{doc_url('python-third-party-dependencies')} for details."
+                    )
+                return repository_pex_request.maybe_pex_request
+
+            repository_pex = await Get(OptionalPex, OptionalPexRequest, repository_pex_request)
+            requirements = dataclasses.replace(
+                requirements, repository_pex=repository_pex.maybe_pex
+            )
+        elif python_setup.enable_resolves:
+            chosen_resolve = await Get(
+                ChosenPythonResolve, ChosenPythonResolveRequest(request.addresses)
+            )
+            requirements = Lockfile(
+                file_path=chosen_resolve.lockfile_path,
+                file_path_description_of_origin=(
+                    f"the resolve `{chosen_resolve.name}` (from `[python].resolves`)"
+                ),
+                resolve_name=chosen_resolve.name,
+                req_strings=requirements.req_strings,
+            )


Most of this is lifted from get_requirements_pex below.

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

src/python/pants/backend/python/util_rules/pex_from_targets.py

stuhood

This looks really great! Thanks.

src/python/pants/backend/python/util_rules/pex_from_targets.py

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

Eric-Arellano

Huzzah!!! 🎉 I'm impressed how few lines of code this took.

Eric-Arellano · 2022-03-29T23:48:37Z

src/python/pants/backend/python/util_rules/pex_from_targets.py

+                # NB: PEX interprets `--lock` with no `req_strings` as "install the entire lockfile"
+                # And we don't use `req_strings` if the resolve isn't a PEX lockfile.
+                req_strings=FrozenOrderedSet(),


I wish I did a better job naming this :( So confusing. Maybe we wait to rename things until we can delete requirement constraints?

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

) # Background This PR has 2 facets: - Moving most of the `RequirementsPexRequest` rule handling into `PexFromTargetsRequest` rule handling to help streamline where the changes are for this. - Passing the requirement strings on the command-line to PEX in the PEX lockfile case (along with `--lock`) so PEX doesn't need pip to resolve anything. The result of this is that: 1. user's shouldn't have to trade-off `[python].run_against_entire_lockfile` or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around) 2. Users using `--changed` now will only request the dependencies needed for the operation, and will only download what isn't already cached. 3. because of 1. upgrades to Pants which add new arguments to the PEX process invocation, therefore invalidating caches, no longer invalidate the repository PEX. # Metrics On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694) For cold cache, I run the following commands to warm up ``` rm -rf ~/.cache/pants/lmdb_store/ rm -rf ~/.cache/pants/named_caches/ rm -rf .pids ./pants_from_sources help ``` Notes: - 64-core machine - black, (internal copyright formatter), flake8, isort, ❗mypy , and pylint - I'm using `[pex].venv_use_symlinks = true` ## Full lint This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache. `time ./pants_from_sources lint ::` | - | 2.11.x | This PR | | ------------- | ------------- | ------------- | | time - real | 9m48.887s | 5m10.263s | | time - user | 36m7.188s | 53m21.293s | | time - sys | 4m29.392s | 6m18.442s | | space - lmdb_store | 2.3G | 1.4GB | | space - named_caches | 11G | 9.5GB | On 2.11.X, installing `repository.pex` took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant) ## Incremental (--changed) lint This case models developer runs. (Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature) `time ./pants_from_sources lint path/to/a_test_file.py` followed by `time ./pants_from_sources lint different-path/to/a_different_file.py` First file: | - | 2.11.x | This PR | | ------------- | ------------- | ------------- | | time - real | 7m47.071s | 0m40.998s | | time - user | 8m50.894s | 1m24.781s | | time - sys | 1m30.338s | 0m13.899s | | space - lmdb_store | 2.3G | 61M | | space - named_caches | 10G | 383M | Second file: (note this file has a transitive dependency on `mxnet` which is very large) | - | 2.11.x | This PR | | ------------- | ------------- | ------------- | | time - real | 0m16.437s | 2m39.155s | | time - user | 0m22.137s | 3m33.879s | | time - sys | 0m9.119s | 0m20.483s | | space - lmdb_store | 2.3G | 865M | | space - named_caches | 10G | 5.4G | [ci skip-rust]

Make PEX lockfile subsetting incremental

fbcd63a

# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

thejcannon commented Mar 29, 2022

View reviewed changes

thejcannon added 2 commits March 29, 2022 13:51

repl

29b40df

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

comment

59b2cf1

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

thejcannon requested review from Eric-Arellano and stuhood March 29, 2022 18:53

thejcannon added the category:performance label Mar 29, 2022

thejcannon mentioned this pull request Mar 29, 2022

WiP: Incremental subsetting of PEX lock #14923

Closed

thejcannon commented Mar 29, 2022

View reviewed changes

src/python/pants/backend/python/util_rules/pex_from_targets.py Show resolved Hide resolved

stuhood approved these changes Mar 29, 2022

View reviewed changes

src/python/pants/backend/python/util_rules/pex_from_targets.py Show resolved Hide resolved

stuhood added the needs-cherrypick label Mar 29, 2022

stuhood added this to the 2.11.x milestone Mar 29, 2022

docstring

8b3a03f

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

Eric-Arellano approved these changes Mar 29, 2022

View reviewed changes

thejcannon merged commit 9bb4be1 into pantsbuild:main Mar 30, 2022

thejcannon deleted the pextgtsimpler branch March 30, 2022 00:27

thejcannon mentioned this pull request Mar 30, 2022

Prepare 2.11.0rc1 #14916

Merged

stuhood removed the needs-cherrypick label Mar 30, 2022

stuhood mentioned this pull request Mar 31, 2022

Pants based incremental dependency resolver for Python #14127

Closed

jsirois mentioned this pull request Apr 2, 2022

PEX files are directories instead of files #14980

Closed

thejcannon mentioned this pull request Apr 4, 2022

Pex repository not created if [python].enable_resolves=true and the resolve does not use a PEX lockfile #14991

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make PEX lockfile subsetting incremental #14944

Make PEX lockfile subsetting incremental #14944

thejcannon commented Mar 29, 2022

thejcannon Mar 29, 2022

stuhood left a comment

Eric-Arellano left a comment

Eric-Arellano Mar 29, 2022

thejcannon Mar 30, 2022

Make PEX lockfile subsetting incremental #14944

Make PEX lockfile subsetting incremental #14944

Conversation

thejcannon commented Mar 29, 2022

Background

Metrics

Full lint

Incremental (--changed) lint

thejcannon Mar 29, 2022

Choose a reason for hiding this comment

stuhood left a comment

Choose a reason for hiding this comment

Eric-Arellano left a comment

Choose a reason for hiding this comment

Eric-Arellano Mar 29, 2022

Choose a reason for hiding this comment

thejcannon Mar 30, 2022

Choose a reason for hiding this comment