WiP: Incremental subsetting of PEX lock #14923
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,8 +27,10 @@ | |
CompletePlatforms, | ||
OptionalPex, | ||
OptionalPexRequest, | ||
Pex, | ||
PexPlatforms, | ||
PexRequest, | ||
_is_probably_pex_json_lockfile, | ||
) | ||
from pants.backend.python.util_rules.pex import rules as pex_rules | ||
from pants.backend.python.util_rules.pex_requirements import Lockfile, PexRequirements | ||
|
@@ -46,6 +48,7 @@ | |
from pants.util.docutil import doc_url | ||
from pants.util.logging import LogLevel | ||
from pants.util.meta import frozen_after_init | ||
from pants.util.ordered_set import FrozenOrderedSet | ||
from pants.util.strutil import bullet_list, path_safe | ||
|
||
logger = logging.getLogger(__name__) | ||
|
@@ -193,6 +196,10 @@ class ChosenPythonResolve: | |
name: str | ||
lockfile_path: str | ||
|
||
@property | ||
def description_of_origin(self) -> str: | ||
return f"the resolve `{self.name}` (from `[python].resolves`)" | ||
|
||
|
||
@dataclass(frozen=True) | ||
class ChosenPythonResolveRequest: | ||
|
@@ -499,11 +506,10 @@ async def get_repository_pex( | |
internal_only=request.internal_only, | ||
requirements=Lockfile( | ||
file_path=chosen_resolve.lockfile_path, | ||
- file_path_description_of_origin=( | ||
- f"the resolve `{chosen_resolve.name}` (from `[python].resolves`)" | ||
- ), | ||
+ file_path_description_of_origin=chosen_resolve.description_of_origin, | ||
resolve_name=chosen_resolve.name, | ||
- req_strings=request.requirements.req_strings, | ||
+ # NB: A blank req_strings means install the entire lockfile | ||
+ req_strings=FrozenOrderedSet([]), | ||
Comment on lines +511 to +512

This would be good to document on the

Well, it's only true if the Lockfile is a PEX lockfile AFAICT. 😵
||
), | ||
interpreter_constraints=interpreter_constraints, | ||
platforms=request.platforms, | ||
|
@@ -657,5 +663,69 @@ async def get_requirements_pex(request: RequirementsPexRequest, setup: PythonSet | |
return pex_request | ||
|
||
|
||
@dataclass(frozen=True) | ||
class PexReqsRequest: | ||
addresses: Addresses | ||
interpreter_constraints: InterpreterConstraints | None = None | ||
|
||
|
||
|
||
@dataclass(frozen=True) | ||
class PexReqs: | ||
This is all hairy, and I apologize. But I think that the result as it stands might be a bit off. AFAICT, the workflows look like:

The differentiation between "a requirements-only PEX" (which contains only thirdparty requirements) and the PEX for user code was intentional (although how important it is in practice is unclear), because thirdparty requirements and your code's transitive thirdparty deps change much less frequently than anything else. So I think that you'll actually want to preserve that aspect, by adjusting the optimization to always create a

What's the benefit of building any PEXs at all with the new optimization? AFAICT building PEXs here puts us in the same boat as before w.r.t. maybe wanting

Unless you mean for goals like

Sorry, really confused by this comment 😅

That's what this PR is already doing

It's not possible to run (a PEX) without building a PEX currently. So in the end, a PEX is built, and then a venv is built from it to actually execute. The primary benefit of building the PEX is that it can be cached: i.e., if you lint or test twice in a row with no code changes, we'll hit the cache for the construction of the PEX, and the only thing that is invalidated is running the built PEX.

Building a requirements/third-party PEX independently means that as long as thirdparty requirements have not changed, you can hit the cache for the network-accessing/thirdparty portion of your build. Without that behavior, changes to either first party or third party code will go through PEX's resolution logic. To see the difference (and again: note in my comment that "how important it is in practice is unclear"), you'd want to compare the difference between

Yea, sorry: it's definitely confusing.

Understood: I was trying to explain the behavior difference between

From my perspective, by far the most relevant optimization is skipping the creation of the "repository PEX" (which contains the entire lockfile)... it's not clear to me that adjusting the behavior where we currently create a thirdparty-requirements PEX is necessarily a win.

I'm not seeing that behavior.

Then I touch that file and sprinkle newlines

I can share the workunit JSON for proof too. I think maybe the disconnect is we ARE still capturing the third-party reqs separate from first-party user code. Let me profile the reqs.pex way to see if my fear about exploding cache or perf is valid. Then we can compare implementations / side-effects.

Looks like both solutions have similar characteristics due to PEXs hardlinking/symlinking. So I think the difference between the two is now in the noise and we likely might want to switch gears into what we imagine the future state is, and which option best puts us in that direction.

Since the
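The caching argument in this thread can be shown with a toy model. This is only a sketch: `BuildInputs`, `cache_key`, `requirements_pex_key`, and `combined_pex_key` are hypothetical names invented for illustration, and the hashing here is not how Pants actually computes cache keys. It only shows why a step whose inputs exclude first-party sources stays cacheable when those sources change.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BuildInputs:
    # Hypothetical model of what feeds a lint/test run.
    req_strings: Tuple[str, ...]
    lockfile_contents: str
    source_contents: Tuple[str, ...]


def cache_key(parts: Iterable[str]) -> str:
    """Hash a sequence of inputs into a toy cache key."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def requirements_pex_key(inputs: BuildInputs) -> str:
    # The requirements-only step never sees first-party sources.
    return cache_key([*inputs.req_strings, inputs.lockfile_contents])


def combined_pex_key(inputs: BuildInputs) -> str:
    # A single step that mixes requirements and sources.
    return cache_key(
        [*inputs.req_strings, inputs.lockfile_contents, *inputs.source_contents]
    )


before = BuildInputs(("requests==2.27.1",), "lock-v1", ("print('hi')",))
# Same requirements, but the source file gained trailing newlines.
after = BuildInputs(before.req_strings, before.lockfile_contents, ("print('hi')\n\n",))

requirements_step_cached = requirements_pex_key(before) == requirements_pex_key(after)
combined_step_cached = combined_pex_key(before) == combined_pex_key(after)
```

In this model the requirements-only key is unchanged by the source edit, while the combined key is not, which is the trade-off being debated above.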
||
requirements: Lockfile | PexRequirements | ||
pexes: Iterable[Pex] | ||
Most iterables would not be frozen and hashable, so consider using
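To illustrate the hashability concern: a frozen dataclass derives its hash from its fields, so a field that ends up holding a list makes the instance unhashable, while a tuple of hashable items works. A minimal sketch using hypothetical stand-in classes (`FakePex`, `ReqsWithTuple`, `ReqsWithIterable`), not the real `Pex` or `PexReqs` types:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class FakePex:
    # Hypothetical stand-in for the real `Pex` type.
    name: str


@dataclass(frozen=True)
class ReqsWithTuple:
    # A tuple of hashable items is itself hashable.
    pexes: Tuple[FakePex, ...]


@dataclass(frozen=True)
class ReqsWithIterable:
    # The loose annotation does not stop a caller from passing a list.
    pexes: Iterable[FakePex]


def is_hashable(value: object) -> bool:
    """Return True if hash(value) succeeds."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


ok = ReqsWithTuple((FakePex("a"),))
bad = ReqsWithIterable([FakePex("a")])

tuple_version_hashable = is_hashable(ok)
list_version_hashable = is_hashable(bad)
```

The list-backed instance fails only when it is hashed, so a mistake like this would surface late, for example when the value is first used as a dictionary key.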
||
|
||
|
||
@rule | ||
async def get_lockfile_subset( | ||
request: PexReqsRequest, | ||
python_setup: PythonSetup, | ||
) -> PexReqs: | ||
if python_setup.enable_resolves: | ||
Given the size of this if-block, consider inverting the condition and having the other branch be the early return.

See #14923 (comment). I think I'll make this a rule.
||
chosen_resolve = await Get( | ||
ChosenPythonResolve, ChosenPythonResolveRequest(request.addresses) | ||
) | ||
lock_path = chosen_resolve.lockfile_path | ||
requirements_file_digest = await Get( | ||
Digest, | ||
PathGlobs( | ||
[lock_path], | ||
glob_match_error_behavior=GlobMatchErrorBehavior.error, | ||
description_of_origin=chosen_resolve.description_of_origin, | ||
), | ||
) | ||
_digest_contents = await Get(DigestContents, Digest, requirements_file_digest) | ||
lock_bytes = _digest_contents[0].content | ||
Comment on lines +685 to +698

This is currently duplicated in the giant rule in
||
|
||
if _is_probably_pex_json_lockfile(lock_bytes): | ||
if python_setup.run_against_entire_lockfile: | ||
# NB: PEX treats no requirements as "install entire lockfile" | ||
req_strings: FrozenOrderedSet[str] = FrozenOrderedSet() | ||
# @TODO: complain deprecated | ||
I expect that we should wait at least one release to see whether users who are already using this setting are actually able to stop before we suggest that anyone does.
||
else: | ||
requirements = await Get( | ||
PexRequirements, _PexRequirementsRequest(request.addresses) | ||
) | ||
req_strings = requirements.req_strings | ||
|
||
lockfile = Lockfile( | ||
file_path=chosen_resolve.lockfile_path, | ||
file_path_description_of_origin=chosen_resolve.description_of_origin, | ||
resolve_name=chosen_resolve.name, | ||
req_strings=req_strings, | ||
) | ||
return PexReqs(lockfile, ()) | ||
|
||
pex = await Get( | ||
Pex, | ||
RequirementsPexRequest( | ||
(request.addresses), | ||
hardcoded_interpreter_constraints=request.interpreter_constraints, | ||
internal_only=True, | ||
), | ||
) | ||
return PexReqs(PexRequirements(), (pex,)) | ||
|
||
|
||
def rules(): | ||
return (*collect_rules(), *pex_rules(), *local_dists_rules(), *python_sources_rules()) |
There are probably other places where this can be leveraged.
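For anyone looking to reuse the logic elsewhere, the requirement-selection branch of `get_lockfile_subset` in the diff can be sketched without the Pants engine. This is an illustration under assumptions: `select_req_strings` is a hypothetical helper, a plain `frozenset` stands in for `FrozenOrderedSet`, and returning `None` stands in for the fallback path that builds a requirements PEX instead.

```python
from typing import FrozenSet, Iterable, Optional


def select_req_strings(
    is_pex_lockfile: bool,
    run_against_entire_lockfile: bool,
    target_req_strings: Iterable[str],
) -> Optional[FrozenSet[str]]:
    """Pick the requirement strings to pair with a lockfile.

    Returns None when the lockfile is not a PEX lockfile, in which case
    the caller would fall back to building a requirements PEX.
    """
    if not is_pex_lockfile:
        return None
    if run_against_entire_lockfile:
        # Per the diff's comment, empty requirements mean "install the
        # entire lockfile" for a PEX lockfile.
        return frozenset()
    # Otherwise subset the lockfile to what the targets actually need.
    return frozenset(target_req_strings)


subset = select_req_strings(True, False, ["requests", "ansicolors"])
entire = select_req_strings(True, True, ["requests", "ansicolors"])
fallback = select_req_strings(False, False, ["requests"])
```

Keeping the decision in one small function would also make the "empty means everything" convention easy to document in a single place.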