Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiP: Incremental subsetting of PEX lock #14923

Closed
wants to merge 2 commits into from

Conversation

thejcannon
Copy link
Member

@thejcannon thejcannon commented Mar 28, 2022

(Right now this PR targets 2.11.x, when it's ready I'll re-target main)

Background

This PR has 2 facets:

  • Moving from building a requirements.pex (or repository.pex) to just passing the requirements to PEX when building the final "tool" venv PEX (generally tool_runner.pex).
  • Passing the requirement strings on the command-line to PEX in the PEX lockfile case (along with --lock) so PEX doesn't need pip to resolve anything.

The result of this is that:

  1. user's no longer have to trade-off [python].run_against_entire_lockfile or not. It was always be more space and time performant this way (the thing to remember here is PEX will use symlinks (if told) or hardlinks to ensure we aren't copying the giant wheels around)
  2. Users using --changed now will only request the dependencies needed for the operation, and will only download what isn't already cached.
  3. because of 1. upgrades to Pants which add new arguments to the PEX process invocation, therefore invalidating caches, no longer invalidate the repository PEX.

Metrics

On 2.11.x (96d5178) Vs this PR (+ pex-tool/pex#1694)

For cold cache, I run the following commands to warm up

rm -rf ~/.cache/pants/lmdb_store/
rm -rf ~/.cache/pants/named_caches/
rm -rf .pids
./pants_from_sources help

Notes:

  • 64-core machine
  • black, (internal copyright formatter), flake8, isort, ❗mypy , and pylint
  • I'm using [pex].venv_use_symlinks = true

Full lint

This case models a CI run, which might be on a completely cold machine (case-and-point this repo's GHA). It also models upgrades to Pants' version where the PEX args differ, as well as updates to the lockfile, both invalidating the cache.

time ./pants_from_sources lint ::

- 2.11.x This PR
time - real 9m48.887s 5m10.263s
time - user 36m7.188s 53m21.293s
time - sys 4m29.392s 6m18.442s
space - lmdb_store 2.3G 1.4GB
space - named_caches 11G 9.5GB

On 2.11.X, installing repository.pex took 8m24s seconds, and acts like a giant tentpole (#11303 and #14886 are very relevant)

Incremental (--changed) lint

This case models developer runs.

(Since I'm not changing any files in my branch, I'll just ask to lint a single file. Then ask to lint a completely different file to model the "incremental" nature)
time ./pants_from_sources lint path/to/a_test_file.py
followed by time ./pants_from_sources lint different-path/to/a_different_file.py

First file:

- 2.11.x This PR
time - real 7m47.071s 0m40.998s
time - user 8m50.894s 1m24.781s
time - sys 1m30.338s 0m13.899s
space - lmdb_store 2.3G 61M
space - named_caches 10G 383M

Second file:
(note this file has a transitive dependency on mxnet which is very large)

- 2.11.x This PR
time - real 0m16.437s 2m39.155s
time - user 0m22.137s 3m33.879s
time - sys 0m9.119s 0m20.483s
space - lmdb_store 2.3G 865M
space - named_caches 10G 5.4G

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@thejcannon thejcannon changed the title WiP: subsetting using PEX lock WiP: Incremental subsetting of PEX lock Mar 28, 2022
@thejcannon
Copy link
Member Author

Open question: Should we do this as well for the "other" cases (non-PEX lockfiles)?



@dataclass(frozen=True)
class PexReqs:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@stuhood stuhood Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all hairy, and I apologize. But I think that the result as it stands might be a bit off. AFAICT, the workflows look like:

  • If we are using the new optimization:
    1. Don't actually build a requirements-only PEX, and instead hold onto the requirements which would have been used
    2. When actually creating a PEX containing user code, embed the thirdparty requirements
  • If we are not using the optimization:
    1. Build a requirements-only PEX, and put it on the pex-path of another PEX containing user code.

The differentiation between "a requirements-only PEX" (which contains only thirdparty requirements) and the PEX for user code was intentional (although how important it is in practice is unclear), because thirdparty requirements and your code's transitive thirdparty deps change much less frequently than anything else.

So I think that you'll actually want to preserve that aspect, by adjusting the optimization to always create a RequirementsPex, but to do so directly from the Lockfile, rather than from a "repository PEX". That should (AFAICT?) avoid the need for the new type at all, because the existing RequirementsPex creation code can be adjusted directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of building any PEXs at all with the new optimization? AFAICT building PEXs here puts us in the same boat as before w.r.t. maybe wanting run_against_entire_lockfile, because for each unique subset of 3rdparty deps we'll be building a PEX and that's non-ideal.

Unless you mean for goals like package? I'm pretty sure that code has no different behavior with this PR (although we should probably change that to do bullet point 1, subpoint ii. Today it might be PEXing the whole repo)

Sorry, really confused by this comment 😅


Don't actually build a requirements-only PEX, and instead hold onto the requirements which would have been used
...
Build a requirements-only PEX, and put it on the pex-path of another PEX containing user code.

That's what this PR is already doing

Copy link
Member

@stuhood stuhood Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of building any PEXs at all with the new optimization? AFAICT building PEXs here puts us in the same boat as before w.r.t. maybe wanting run_against_entire_lockfile, because for each unique subset of 3rdparty deps we'll be building a PEX and that's non-ideal.

It's not possible to run (a PEX) without building a PEX currently. So in the end, a PEX is built, and then a venv is built from it to actually execute. The primary benefit of building the PEX is that it can be cached: i.e., if you lint or test twice in a row with no code changes, we'll hit the cache for the construction of the PEX, and the only thing that is invalidated is running the built PEX.

Building a requirements/third-party PEX independently means that as long as thirdparty requirements have not changed, you can hit the cache for the network-accessing/thirdparty portion of your build. Without that behavior, changes to either first party or third party code will go through PEX's resolution logic.

To see the difference (and again: note in my comment that "how important it is in practice is unclear"), you'd want to compare the difference between main and this branch for a case like: "I ran a test once, made a non-import edit to my test code, and then re-ran". On main, that edit won't actually affect any requirements, and so won't need to hit the network / run resolution logic / consume the lockfile at all: on this branch, it will re-run the PEX resolve. Warm runs like this are an important case for local latency, and we need to account for them (even at the expense of CI / cold performance sometimes).

Sorry, really confused by this comment 😅

Yea, sorry: it's definitely confusing.

Don't actually build a requirements-only PEX, and instead hold onto the requirements which would have been used
...
Build a requirements-only PEX, and put it on the pex-path of another PEX containing user code.

That's what this PR is already doing

Understood: I was trying to explain the behavior difference between main and this branch.

From my perspective, by far the most relevant optimization is skipping the creation of the "repository PEX" (which contains the entire lockfile)... it's not clear to me that adjusting the behavior where we currently create a thirdparty-requirements PEX is necessarily a win.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On main, that edit won't actually affect any requirements, and so won't need to hit the network / run resolution logic / consume the lockfile at all: on this branch, it will re-run the PEX resolve.

I'm not seeing that behavior.

joshuacannon@CEPHANDRIUS:~/work/techlabs$ ./pants_from_sources test path/to/a/test_file.py
19:28:19.52 [INFO] Completed: Building pytest.pex from 3rdparty/python/lockfiles/pytest.txt                                                                                 
19:28:19.92 [INFO] Completed: Building local_dists.pex                                                                                                                      
19:28:21.69 [INFO] Completed: Building pytest_runner.pex from 3rdparty/python/lockfiles/repository.lock                                                                     
19:28:22.14 [WARN] Failed to generate JUnit XML data for path/to/a/test_file.py.                                                                                 
19:28:22.14 [ERROR] Completed: Run Pytest - path/to/a/test_file.py failed (exit code 1).

Then I touch that file and sprinkle newlines

joshuacannon@CEPHANDRIUS:~/work/techlabs$ ./pants_from_sources test path/to/a/test_file.py
19:31:27.77 [WARN] Failed to generate JUnit XML data for path/to/a/test_file.py.                                                                                 
19:31:27.77 [ERROR] Completed: Run Pytest - path/to/a/test_file.py failed (exit code 1).

I can share the workunit JSON for proof too.


I think maybe the disconnect is we ARE still capturing the third-party reqs separate from first-party user code. pytest_runner.pex is a VenvPEX we're building which won't contain the user code. Same for pylint and mypy.

Let me profile the reqs.pex way to see if my fear about exploding cache or perf is valid. Then we can compare implementations / side-effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like both solutions have similar characteristics due to PEXs hardlinking/symlinking. So I think the difference between the two is now in the noise and we likely might want to switch gears into what we imagine the future state is, and which option best puts us in that direction.

Copy link
Member Author

@thejcannon thejcannon Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the requirements.pex way doesn't involve touching client-code, I'm actually leaning that direction. Thanks for the push in the right direction @stuhood

Comment on lines +685 to +698
chosen_resolve = await Get(
ChosenPythonResolve, ChosenPythonResolveRequest(request.addresses)
)
lock_path = chosen_resolve.lockfile_path
requirements_file_digest = await Get(
Digest,
PathGlobs(
[lock_path],
glob_match_error_behavior=GlobMatchErrorBehavior.error,
description_of_origin=chosen_resolve.description_of_origin,
),
)
_digest_contents = await Get(DigestContents, Digest, requirements_file_digest)
lock_bytes = _digest_contents[0].content
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently duplicated in the giant rule in pex.py currently 😞 Should we make a new rule?

Comment on lines +199 to +201
@property
def description_of_origin(self) -> str:
return f"the resolve `{self.name}` (from `[python].resolves`)"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably other places this can be leveraged

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for doing this! I think that we should probably preserve the "create a RequirementsPex independent of the user code" behavior, but it seems like that might actually simplify things a bit and avoid the need for a new type.

Comment on lines +511 to +512
# NB: A blank req_strings means install the entire lockfile
req_strings=FrozenOrderedSet([]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be good to document on the Lockfile class rather-than/in-addition-to here. We haven't done a great job with docstrings on these internal APIs, but this in particular could be surprising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's only true if the Lockfile is a PEX lockfile AFAICT. 😵

@dataclass(frozen=True)
class PexReqs:
requirements: Lockfile | PexRequirements
pexes: Iterable[Pex]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most iterables would not be frozen and hashable, so consider using tuple here instead.



@dataclass(frozen=True)
class PexReqs:
Copy link
Member

@stuhood stuhood Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all hairy, and I apologize. But I think that the result as it stands might be a bit off. AFAICT, the workflows look like:

  • If we are using the new optimization:
    1. Don't actually build a requirements-only PEX, and instead hold onto the requirements which would have been used
    2. When actually creating a PEX containing user code, embed the thirdparty requirements
  • If we are not using the optimization:
    1. Build a requirements-only PEX, and put it on the pex-path of another PEX containing user code.

The differentiation between "a requirements-only PEX" (which contains only thirdparty requirements) and the PEX for user code was intentional (although how important it is in practice is unclear), because thirdparty requirements and your code's transitive thirdparty deps change much less frequently than anything else.

So I think that you'll actually want to preserve that aspect, by adjusting the optimization to always create a RequirementsPex, but to do so directly from the Lockfile, rather than from a "repository PEX". That should (AFAICT?) avoid the need for the new type at all, because the existing RequirementsPex creation code can be adjusted directly.

if python_setup.run_against_entire_lockfile:
# NB: PEX treats no requirements as "install entire lockfile"
req_strings: FrozenOrderedSet[str] = FrozenOrderedSet()
# @TODO: complain deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that we should wait at least one release to see whether users who are already using this setting are actually able to stop before we suggest that anyone does.

request: PexReqsRequest,
python_setup: PythonSetup,
) -> PexReqs:
if python_setup.enable_resolves:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the size of this if-block, consider inverting the condition and having the other branch be the early return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #14923 (comment). I think I'll make this a rule.

@thejcannon thejcannon closed this Mar 29, 2022
@thejcannon
Copy link
Member Author

Successor: #14944

@thejcannon thejcannon deleted the lockspeedy branch August 17, 2022 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants