Calculate local distribution contents once per distribution #14551
…nsumer. [ci skip-rust] [ci skip-build-wheels]
… extraction. [ci skip-rust] [ci skip-build-wheels]
Commits are useful to review independently.
Phew, nice one!
And finger wag at myself for writing the janky code to begin with.
So is the reason the original rule wasn't satisfied from memoization/cache that the |
class LocalDistWheels:
    """Contains the wheels isolated from a single local Python distribution."""

    wheel_paths: list[str]
It may be true that `wheel_paths` is populated from a Snapshot below, which happens to be sorted, and it may be true that a set will have a stable order when formed multiple times in the same pantsd Python process. But encoding neither the stable order nor the immutability (via a "frozen" dataclass) seems sub-optimal, for sanity's sake at the least.
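For illustration, a minimal sketch of what that suggestion could look like using only the standard library's `dataclasses`. The `create` helper is hypothetical and not part of this PR; it just shows one way to encode both the ordering and the immutability in the type rather than relying on how the value happens to be built.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LocalDistWheels:
    """Contains the wheels isolated from a single local Python distribution."""

    # A tuple instead of a list: immutable, hashable, and order-preserving.
    wheel_paths: tuple[str, ...]

    @classmethod
    def create(cls, paths: Iterable[str]) -> LocalDistWheels:
        # Hypothetical helper: de-duplicate and sort so that equal inputs
        # always produce equal (and equally hashed) instances.
        return cls(tuple(sorted(set(paths))))
```

With this shape, two instances built from the same paths in a different order compare equal and hash equally, and assigning to `wheel_paths` after construction raises `dataclasses.FrozenInstanceError`.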
Yea... I'll do this for consistency's sake.
But given how tightly we've adhered to functional patterns throughout the system, I do sometimes wonder whether defensively freezing dataclasses is worth it. It's clearly necessary when something will be used in a `Get` (for `hash`), but not necessary when something is only the return value of a `@rule`.
I'd argue that if that's true, it's too subtle. If a rule returns a thing, that thing ~must be used by another rule as an input, so you'd expect it had better be immutable.
Put it another way, if I have to debug "strange" behavior in the engine, a 1st stop would be to look for violations of patterns in the Python rules. Being able to sanely rely on ~types would be a good 1st stop in that 1st stop. Seeing frozen != frozen would give me pause here without a comment.
> Put it another way, if I have to debug "strange" behavior in the engine, a 1st stop would be to look for violations of patterns in the Python rules. Being able to sanely rely on ~types would be a good 1st stop in that 1st stop. Seeing frozen != frozen would give me pause here without a comment.
Sure. But we have had zero cases of those types of errors in the last few years... and whether that is due more to our defensive freezing of values, or to mutation of a `@rule`-returned value just not being something that is likely/useful, is the question that I'm referring to here.
Correct: that, and the |
[ci skip-rust] [ci skip-build-wheels]
@benjyw, @Eric-Arellano, @jsirois: Another thing to think about: rather than computing the transitive source dependencies and then subtracting the set of transitive wheel dependencies, it would probably be more efficient to collect the transitive dependencies once, in a distribution/wheel-vs-sources-aware way, as we visit each node. That would be similar to the pattern behind the JVM recursively compiling
EDIT: Opened #14561 about this.
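A rough sketch of the single-pass idea, in plain Python rather than the engine's API. The function name, the `deps` adjacency mapping, and the `dist_targets` set are all hypothetical, and the sketch assumes that everything reachable only through a distribution is supplied by that distribution's wheel, so the walk does not descend past dist nodes.

```python
from __future__ import annotations

from collections import deque


def partition_transitive_deps(
    roots: list[str],
    deps: dict[str, list[str]],
    dist_targets: set[str],
) -> tuple[list[str], list[str]]:
    """Walk the graph once, sorting each visited node into sources or dists."""
    sources: list[str] = []
    dists: list[str] = []
    queue = deque(dict.fromkeys(roots))  # De-duplicate roots, keep their order.
    seen: set[str] = set(queue)
    while queue:
        node = queue.popleft()
        if node in dist_targets:
            dists.append(node)
            continue  # Assumption: the wheel supplies this node's subtree.
        sources.append(node)
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sources, dists
```

Note that a target reachable both directly and through a distribution would still be collected as a source here, so a real implementation might still need some subtraction for that overlap.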
…ld#14551) Currently, local distribution wheel contents are computed once per consumer, rather than once per distribution. Additionally, since the calculation of provided files is using `DigestContents`, it is briefly pulling the entire contents of wheels into memory. For small files, this might be fine: but larger dists can use a lot of memory, particularly in the presence of concurrency. This change moves per-distribution calculations into a separate `@rule` to allow for reuse across multiple consumers, and moves to computing wheel contents using an external process to allow it to be cached run over run. [ci skip-rust] [ci skip-build-wheels]
…ck of #14551) (#14555) [ci skip-rust] [ci skip-build-wheels]
#14551 improved the performance of local dist building when local distributions are actually present. But there are cases (which @benjyw is pursuing) where the `@rule` takes a long time to run, even when no dists are actually present. This is likely to do with the source subtraction: either the calculation of the subset paths, or the execution of `DigestSubset`. [ci skip-rust] [ci skip-build-wheels]
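The actual change in #14564 is not shown in this thread. As an assumption only, one plausible mitigation for the "no dists present" case is a fast path that skips the subtraction entirely when nothing is provided by wheels. The function below is a hypothetical plain-Python stand-in, not the `@rule` or `DigestSubset` code itself.

```python
from __future__ import annotations


def subtract_provided(source_files: list[str], provided_files: set[str]) -> list[str]:
    """Return the source files that are not already provided by a wheel."""
    # Fast path (assumed, not taken from the PR): with no provided files there
    # is nothing to subtract, so skip the per-file work altogether.
    if not provided_files:
        return source_files
    return [path for path in source_files if path not in provided_files]
```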
…uild#14564) [ci skip-rust] [ci skip-build-wheels]
…pick of #14564) (#14566) [ci skip-rust] [ci skip-build-wheels]
Currently, local distribution wheel contents are computed once per consumer, rather than once per distribution. Additionally, since the calculation of provided files is using `DigestContents`, it is briefly pulling the entire contents of wheels into memory. For small files, this might be fine: but larger dists can use a lot of memory, particularly in the presence of concurrency.

This change moves per-distribution calculations into a separate `@rule` to allow for reuse across multiple consumers, and moves to computing wheel contents using an external process to allow it to be cached run over run.
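To make the two ideas concrete outside of the engine, here is a small stand-in sketch using only the Python standard library. It is not the PR's implementation: `wheel_provided_files` is a hypothetical name, `functools.lru_cache` stands in for the reuse that a separate `@rule` provides, and `zipfile` stands in for the external process. The point is that listing a wheel's files only needs the zip's central directory, not the file contents.

```python
from __future__ import annotations

import functools
import zipfile


@functools.lru_cache(maxsize=None)
def wheel_provided_files(wheel_path: str) -> tuple[str, ...]:
    """List the files in a wheel without reading any file contents."""
    with zipfile.ZipFile(wheel_path) as whl:
        # `namelist()` comes from the archive's central directory, so the
        # (possibly large) member contents are never loaded into memory.
        return tuple(sorted(n for n in whl.namelist() if not n.endswith("/")))
```

Calling the function for the same wheel from several consumers does the work once; later calls return the cached tuple.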