Tolerate dependency cycles when using the v2 engine #10046
I tried getting the cycle detection to work and got the structure of the code in place, but the performance is twice as bad (13.3 user seconds for [...]).

The main intuition I had is that we would have a separate resolution for each distinct target root. Why? We need to know the precise path of the cycle. This requires keeping track of each distinct target root, then its dependencies, then their dependencies' dependencies, and so on, so that we can trace the full path. This resulted in this code:

@dataclass(frozen=True)
class _TransitiveTargetRequest:
    target: Target


@dataclass(frozen=True)
class _TransitiveTarget:
    closure: FrozenOrderedSet[Target]
    cycles: FrozenOrderedSet[Tuple[Target, ...]]


@rule
async def transitive_target(request: _TransitiveTargetRequest) -> _TransitiveTarget:
    count = 0
    visited: Dict[int, FrozenOrderedSet[Target]] = {}
    queued = FrozenOrderedSet([request.target])
    while queued:
        visited[count] = queued
        count += 1
        direct_dependencies = await MultiGet(
            Get[Targets](DependenciesRequest(tgt.get(Dependencies))) for tgt in queued
        )
        queued = FrozenOrderedSet(itertools.chain.from_iterable(direct_dependencies)).difference(
            itertools.chain.from_iterable(visited.values())
        )
    # print({k: [tgt.address.spec for tgt in v] for k, v in visited.items()})
    return _TransitiveTarget(
        closure=FrozenOrderedSet(itertools.chain.from_iterable(visited.values())),
        cycles=FrozenOrderedSet(),
    )


@rule
async def transitive_targets(targets: Targets) -> TransitiveTargets:
    """Find all the targets transitively depended upon by the target roots.

    This uses iteration, rather than recursion, so that we can tolerate dependency cycles. Unlike a
    traditional BFS algorithm, we batch each round of traversals via `MultiGet` for improved
    performance / concurrency.
    """
    transitive_per_target_root = await MultiGet(
        Get[_TransitiveTarget](_TransitiveTargetRequest(tgt)) for tgt in targets
    )
    return TransitiveTargets(
        tuple(targets),
        FrozenOrderedSet(
            itertools.chain.from_iterable(tt.closure for tt in transitive_per_target_root)
        ),
    )

But this results in much worse performance: we need many more iterations and revisit the same nodes multiple times.

--

Instead of having a global option like [...] I like that it avoids adding yet another goal. It also avoids the annoyance of a warning showing up every single time you run [...]
Stu is working on an alternative design that still tolerates cycles, but requires a little more ceremony so that people are very conscious when introducing them.
…in file-addresses. (#10409)

### Problem

After more deeply investigating the issues with #10230 (demonstrated in #10393 and its revert), it doesn't seem like the right idea to rely on the engine's cycle detection (which the implementation of #10230 showed should primarily be used for deadlock detection) to expose high level cycle tolerance for the build graph.

### Solution

Move to iterative transitive target expansion (à la #10046), and cycle detect afterward using DFS with a stack of visited entries.

### Result

Fixes #10059, and closes #10229. A followup will back out portions of #10230 (but not all of it, as there were a number of other improvements mixed in).

[ci skip-rust-tests]
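To illustrate the "cycle detect afterward using DFS with a stack of visited entries" idea, here is a minimal sketch, not the code from #10409. It assumes the already-expanded graph is available as a plain dict mapping a target name to its direct dependencies; the `find_cycles` function and the string-based graph are hypothetical stand-ins for real targets and addresses.

```python
from typing import Dict, List, Set, Tuple


def find_cycles(graph: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Return cycles as paths that start and end on the same node.

    Hypothetical helper: `graph` maps a node to its direct dependencies.
    Uses an explicit stack, so deep graphs do not hit the recursion limit.
    """
    cycles: List[Tuple[str, ...]] = []
    finished: Set[str] = set()  # nodes whose dependencies are fully explored

    for root in graph:
        if root in finished:
            continue
        path = [root]  # the current DFS path, in visit order
        on_path = {root}  # same entries as `path`, for O(1) membership checks
        iterators = [iter(graph.get(root, ()))]
        while iterators:
            dep = next(iterators[-1], None)
            if dep is None:
                # All dependencies of the node on top of the stack are done.
                done = path.pop()
                on_path.discard(done)
                finished.add(done)
                iterators.pop()
            elif dep in on_path:
                # Back edge: the cycle is the path from `dep` to here, then `dep`.
                cycles.append(tuple(path[path.index(dep):]) + (dep,))
            elif dep not in finished:
                path.append(dep)
                on_path.add(dep)
                iterators.append(iter(graph.get(dep, ())))
    return cycles
```

This reports one cycle per back edge found during the traversal rather than enumerating every elementary cycle, which is usually enough to point a user at the offending dependency path.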
Often, languages are able to handle dependency cycles. For example, in Python, you can have import cycles in certain situations.
But, Pants has never worked with cycles between targets, particularly when resolving transitive targets, e.g. `dependencies --transitive`.

Turns out, we can tolerate cycles by changing the algorithm to resolve transitive dependencies to use iteration, rather than recursion. The algorithm is inspired by BFS, but uses batching via `MultiGet` for better concurrency.

Any rules that use recursion will still fail. Right now, we don't have any. But, when implementing things like Java compilation, we will need to be careful that those too can handle dependency cycles.
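The batched, BFS-inspired iteration described above can be sketched outside the engine. This is an illustration only: `asyncio.gather` stands in for `MultiGet`, and the `GRAPH` dict and `direct_dependencies` coroutine are hypothetical replacements for the engine's dependency resolution.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory build graph; note the a -> c -> a cycle.
GRAPH: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}


async def direct_dependencies(target: str) -> List[str]:
    """Stand-in for an engine request that resolves one target's direct deps."""
    await asyncio.sleep(0)  # yield control, as a real async lookup would
    return GRAPH.get(target, [])


async def transitive_closure(roots: List[str]) -> List[str]:
    """Expand dependencies one whole round at a time instead of recursing."""
    visited: Dict[str, None] = {}  # a dict keeps insertion order, like an ordered set
    queued = list(dict.fromkeys(roots))
    while queued:
        visited.update(dict.fromkeys(queued))
        # Resolve every queued target concurrently in a single batch.
        batches = await asyncio.gather(*(direct_dependencies(t) for t in queued))
        # Only targets never seen before go into the next round, so a cycle
        # cannot re-queue a node and the loop always terminates.
        queued = list(
            dict.fromkeys(dep for batch in batches for dep in batch if dep not in visited)
        )
    return list(visited)
```

Calling `asyncio.run(transitive_closure(["a"]))` walks the cyclic graph and stops once a round discovers no new targets.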
Fails with v1
Even if we changed `LegacyTransitiveHydratedTarget` to tolerate cycles, some v1 goals like `test` would still fail because they expect an acyclic graph. So, to keep a good error message and to error more eagerly, we do not touch `LegacyTransitiveHydratedTarget`.
Remaining followup: warn on cycles
Some codebases may prefer to keep the current behavior of erroring on dep cycles. We should add an option that will allow you to ignore, warn, or error on dep cycles.
To be helpful, this error message must trace the full path of the cycle, e.g.
A great error message would collect all dep cycles before erroring, whereas the behavior before this PR was to error upon the first cycle encountered.
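One possible shape for that followup is sketched below. Nothing here is an existing Pants option or API: the `CycleBehavior` enum, `DependencyCycleError`, and both functions are hypothetical names, and cycles are assumed to arrive as tuples of address strings that start and end on the same target.

```python
import logging
from enum import Enum
from typing import Iterable, Tuple

logger = logging.getLogger(__name__)


class CycleBehavior(Enum):
    """Hypothetical option values: ignore, warn, or error on dep cycles."""

    IGNORE = "ignore"
    WARN = "warn"
    ERROR = "error"


class DependencyCycleError(Exception):
    """Hypothetical error type raised when cycles are configured to fail."""


def format_cycles(cycles: Iterable[Tuple[str, ...]]) -> str:
    """Render every cycle with its full path, one cycle per line."""
    lines = [" -> ".join(cycle) for cycle in cycles]
    return "Found dependency cycles:\n" + "\n".join(f"  {line}" for line in lines)


def handle_cycles(cycles: Iterable[Tuple[str, ...]], behavior: CycleBehavior) -> None:
    """Report all collected cycles at once, according to the chosen behavior."""
    collected = list(cycles)
    if not collected or behavior is CycleBehavior.IGNORE:
        return
    message = format_cycles(collected)
    if behavior is CycleBehavior.WARN:
        logger.warning(message)
        return
    raise DependencyCycleError(message)
```

Because the message is built from the full list, a user sees every cycle in one run instead of fixing them one failure at a time.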
Result on performance
Running `multitime -n 10 ./v2 --no-enable-pantsd dependencies2 --transitive ::`.
Before:
After: