Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PEX lockfile subsetting incremental #14944

Merged
merged 4 commits into from
Mar 30, 2022

Conversation

thejcannon
Copy link
Member

Background

This PR has 2 facets:

  • Moving most of the RequirementsPexRequest rule handling into PexFromTargetsRequest rule handling to help streamline where the changes are for this.
  • Passing the requirement strings on the command-line to PEX in the PEX lockfile case (along with --lock) so PEX doesn't need pip to resolve anything.

The result of this is that:

  1. user's shouldn't have to trade-off [python].run_against_entire_lockfile or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around)
  2. Users using --changed now will only request the dependencies needed for the operation, and will only download what isn't already cached.
  3. because of 1. upgrades to Pants which add new arguments to the PEX process invocation, therefore invalidating caches, no longer invalidate the repository PEX.

Metrics

On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694)

For cold cache, I run the following commands to warm up

rm -rf ~/.cache/pants/lmdb_store/
rm -rf ~/.cache/pants/named_caches/
rm -rf .pids
./pants_from_sources help

Notes:

  • 64-core machine
  • black, (internal copyright formatter), flake8, isort, ❗mypy , and pylint
  • I'm using [pex].venv_use_symlinks = true

Full lint

This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache.

time ./pants_from_sources lint ::

- 2.11.x This PR
time - real 9m48.887s 5m10.263s
time - user 36m7.188s 53m21.293s
time - sys 4m29.392s 6m18.442s
space - lmdb_store 2.3G 1.4GB
space - named_caches 11G 9.5GB

On 2.11.X, installing repository.pex took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant)

Incremental (--changed) lint

This case models developer runs.

(Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature)
time ./pants_from_sources lint path/to/a_test_file.py
followed by time ./pants_from_sources lint different-path/to/a_different_file.py

First file:

- 2.11.x This PR
time - real 7m47.071s 0m40.998s
time - user 8m50.894s 1m24.781s
time - sys 1m30.338s 0m13.899s
space - lmdb_store 2.3G 61M
space - named_caches 10G 383M

Second file:
(note this file has a transitive dependency on mxnet which is very large)

- 2.11.x This PR
time - real 0m16.437s 2m39.155s
time - user 0m22.137s 3m33.879s
time - sys 0m9.119s 0m20.483s
space - lmdb_store 2.3G 865M
space - named_caches 10G 5.4G

[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Comment on lines +380 to +426
requirements: PexRequirements | Lockfile = PexRequirements()
if request.include_requirements:
requirements = await Get(PexRequirements, _PexRequirementsRequest(request.addresses))

should_return_entire_lockfile = (
python_setup.run_against_entire_lockfile and request.internal_only
)
should_request_repository_pex = should_return_entire_lockfile or (
python_setup.resolve_all_constraints and python_setup.requirement_constraints
)
if should_request_repository_pex:
repository_pex_request = await Get(
OptionalPexRequest,
_RepositoryPexRequest(
request.addresses,
hardcoded_interpreter_constraints=request.hardcoded_interpreter_constraints,
platforms=request.platforms,
complete_platforms=request.complete_platforms,
internal_only=request.internal_only,
additional_lockfile_args=request.additional_lockfile_args,
),
)
if should_return_entire_lockfile:
if repository_pex_request.maybe_pex_request is None:
raise ValueError(
"[python].run_against_entire_lockfile was set, but could not find a "
"lockfile or constraints file for this target set. See "
f"{doc_url('python-third-party-dependencies')} for details."
)
return repository_pex_request.maybe_pex_request

repository_pex = await Get(OptionalPex, OptionalPexRequest, repository_pex_request)
requirements = dataclasses.replace(
requirements, repository_pex=repository_pex.maybe_pex
)
elif python_setup.enable_resolves:
chosen_resolve = await Get(
ChosenPythonResolve, ChosenPythonResolveRequest(request.addresses)
)
requirements = Lockfile(
file_path=chosen_resolve.lockfile_path,
file_path_description_of_origin=(
f"the resolve `{chosen_resolve.name}` (from `[python].resolves`)"
),
resolve_name=chosen_resolve.name,
req_strings=requirements.req_strings,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this is lifted from get_requirements_pex below.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great! Thanks.

@stuhood stuhood added this to the 2.11.x milestone Mar 29, 2022
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huzzah!!! 🎉 I'm impressed how few lines of code this took.

Comment on lines +534 to +536
# NB: PEX interprets `--lock` with no `req_strings` as "install the entire lockfile"
# And we don't use `req_strings` if the resolve isn't a PEX lockfile.
req_strings=FrozenOrderedSet(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I did a better job naming this :( So confusing. Maybe we wait to rename things until we can delete requirement constraints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur

@thejcannon thejcannon merged commit 9bb4be1 into pantsbuild:main Mar 30, 2022
@thejcannon thejcannon deleted the pextgtsimpler branch March 30, 2022 00:27
@thejcannon thejcannon mentioned this pull request Mar 30, 2022
thejcannon added a commit to thejcannon/pants that referenced this pull request Mar 30, 2022
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
thejcannon added a commit that referenced this pull request Mar 30, 2022
)

# Background

This PR has 2 facets:
- Moving most of the `RequirementsPexRequest` rule handling into `PexFromTargetsRequest` rule handling to help streamline where the changes are for this.
- Passing the requirement strings on the command-line to PEX in the PEX lockfile case (along with `--lock`) so PEX doesn't need pip to resolve anything.

The result of this is that:
1. user's shouldn't have to trade-off `[python].run_against_entire_lockfile` or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around)
2. Users using `--changed` now will only request the dependencies needed for the operation, and will only download what isn't already cached.
3. because of 1. upgrades to Pants which  add new arguments to the PEX process invocation, therefore invalidating caches, no longer invalidate the repository PEX.

# Metrics

On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694)

For cold cache, I run the following commands to warm up
```
rm -rf ~/.cache/pants/lmdb_store/
rm -rf ~/.cache/pants/named_caches/
rm -rf .pids
./pants_from_sources help
```

Notes:
- 64-core machine
- black, (internal copyright formatter), flake8, isort, ❗mypy , and pylint
- I'm using `[pex].venv_use_symlinks = true`

## Full lint

This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache.

`time ./pants_from_sources lint ::`

| - | 2.11.x  | This PR |
| -------------  | ------------- | ------------- |
| time - real  | 9m48.887s | 5m10.263s  |
| time - user |    36m7.188s | 53m21.293s  |
| time - sys |     4m29.392s  | 6m18.442s  |
| space - lmdb_store | 2.3G | 1.4GB |
| space - named_caches | 11G | 9.5GB |

On 2.11.X, installing `repository.pex` took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant)

## Incremental (--changed) lint

This case models developer runs.

(Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature)
`time ./pants_from_sources lint path/to/a_test_file.py`
followed by `time ./pants_from_sources lint different-path/to/a_different_file.py`

First file:

| - | 2.11.x | This PR  |
| -------------  | ------------- | ------------- |
| time - real  | 7m47.071s |  0m40.998s |
| time - user |    8m50.894s |  1m24.781s |
| time - sys |     1m30.338s  | 0m13.899s  |
| space - lmdb_store | 2.3G | 61M |
| space - named_caches | 10G | 383M |

Second file:
(note this file has a transitive dependency on `mxnet` which is very large)

| - | 2.11.x  | This PR  |
| -------------  | ------------- | ------------- |
| time - real  | 0m16.437s |  2m39.155s |
| time - user | 0m22.137s |  3m33.879s |
| time - sys | 0m9.119s  |  0m20.483s |
| space - lmdb_store | 2.3G | 865M |
| space - named_caches | 10G | 5.4G |

[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants