-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make PEX lockfile subsetting incremental #14944
Conversation
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
requirements: PexRequirements | Lockfile = PexRequirements() | ||
if request.include_requirements: | ||
requirements = await Get(PexRequirements, _PexRequirementsRequest(request.addresses)) | ||
|
||
should_return_entire_lockfile = ( | ||
python_setup.run_against_entire_lockfile and request.internal_only | ||
) | ||
should_request_repository_pex = should_return_entire_lockfile or ( | ||
python_setup.resolve_all_constraints and python_setup.requirement_constraints | ||
) | ||
if should_request_repository_pex: | ||
repository_pex_request = await Get( | ||
OptionalPexRequest, | ||
_RepositoryPexRequest( | ||
request.addresses, | ||
hardcoded_interpreter_constraints=request.hardcoded_interpreter_constraints, | ||
platforms=request.platforms, | ||
complete_platforms=request.complete_platforms, | ||
internal_only=request.internal_only, | ||
additional_lockfile_args=request.additional_lockfile_args, | ||
), | ||
) | ||
if should_return_entire_lockfile: | ||
if repository_pex_request.maybe_pex_request is None: | ||
raise ValueError( | ||
"[python].run_against_entire_lockfile was set, but could not find a " | ||
"lockfile or constraints file for this target set. See " | ||
f"{doc_url('python-third-party-dependencies')} for details." | ||
) | ||
return repository_pex_request.maybe_pex_request | ||
|
||
repository_pex = await Get(OptionalPex, OptionalPexRequest, repository_pex_request) | ||
requirements = dataclasses.replace( | ||
requirements, repository_pex=repository_pex.maybe_pex | ||
) | ||
elif python_setup.enable_resolves: | ||
chosen_resolve = await Get( | ||
ChosenPythonResolve, ChosenPythonResolveRequest(request.addresses) | ||
) | ||
requirements = Lockfile( | ||
file_path=chosen_resolve.lockfile_path, | ||
file_path_description_of_origin=( | ||
f"the resolve `{chosen_resolve.name}` (from `[python].resolves`)" | ||
), | ||
resolve_name=chosen_resolve.name, | ||
req_strings=requirements.req_strings, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this is lifted from get_requirements_pex
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really great! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huzzah!!! 🎉 I'm impressed how few lines of code this took.
# NB: PEX interprets `--lock` with no `req_strings` as "install the entire lockfile" | ||
# And we don't use `req_strings` if the resolve isn't a PEX lockfile. | ||
req_strings=FrozenOrderedSet(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish I did a better job naming this :( So confusing. Maybe we wait to rename things until we can delete requirement constraints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
) # Background This PR has 2 facets: - Moving most of the `RequirementsPexRequest` rule handling into `PexFromTargetsRequest` rule handling to help streamline where the changes are for this. - Passing the requirement strings on the command-line to PEX in the PEX lockfile case (along with `--lock`) so PEX doesn't need pip to resolve anything. The result of this is that: 1. user's shouldn't have to trade-off `[python].run_against_entire_lockfile` or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around) 2. Users using `--changed` now will only request the dependencies needed for the operation, and will only download what isn't already cached. 3. because of 1. upgrades to Pants which add new arguments to the PEX process invocation, therefore invalidating caches, no longer invalidate the repository PEX. # Metrics On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694) For cold cache, I run the following commands to warm up ``` rm -rf ~/.cache/pants/lmdb_store/ rm -rf ~/.cache/pants/named_caches/ rm -rf .pids ./pants_from_sources help ``` Notes: - 64-core machine - black, (internal copyright formatter), flake8, isort, ❗mypy , and pylint - I'm using `[pex].venv_use_symlinks = true` ## Full lint This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache. `time ./pants_from_sources lint ::` | - | 2.11.x | This PR | | ------------- | ------------- | ------------- | | time - real | 9m48.887s | 5m10.263s | | time - user | 36m7.188s | 53m21.293s | | time - sys | 4m29.392s | 6m18.442s | | space - lmdb_store | 2.3G | 1.4GB | | space - named_caches | 11G | 9.5GB | On 2.11.X, installing `repository.pex` took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant) ## Incremental (--changed) lint This case models developer runs. (Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature) `time ./pants_from_sources lint path/to/a_test_file.py` followed by `time ./pants_from_sources lint different-path/to/a_different_file.py` First file: | - | 2.11.x | This PR | | ------------- | ------------- | ------------- | | time - real | 7m47.071s | 0m40.998s | | time - user | 8m50.894s | 1m24.781s | | time - sys | 1m30.338s | 0m13.899s | | space - lmdb_store | 2.3G | 61M | | space - named_caches | 10G | 383M | Second file: (note this file has a transitive dependency on `mxnet` which is very large) | - | 2.11.x | This PR | | ------------- | ------------- | ------------- | | time - real | 0m16.437s | 2m39.155s | | time - user | 0m22.137s | 3m33.879s | | time - sys | 0m9.119s | 0m20.483s | | space - lmdb_store | 2.3G | 865M | | space - named_caches | 10G | 5.4G | [ci skip-rust]
Background
This PR has 2 facets:
RequirementsPexRequest
rule handling intoPexFromTargetsRequest
rule handling to help streamline where the changes are for this.--lock
) so PEX doesn't need pip to resolve anything.The result of this is that:
[python].run_against_entire_lockfile
or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around)--changed
now will only request the dependencies needed for the operation, and will only download what isn't already cached.Metrics
On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694)
For cold cache, I run the following commands to warm up
Notes:
[pex].venv_use_symlinks = true
Full lint
This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache.
time ./pants_from_sources lint ::
On 2.11.X, installing
repository.pex
took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant)Incremental (--changed) lint
This case models developer runs.
(Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature)
time ./pants_from_sources lint path/to/a_test_file.py
followed by
time ./pants_from_sources lint different-path/to/a_different_file.py
First file:
Second file:
(note this file has a transitive dependency on
mxnet
which is very large)[ci skip-rust]