Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of TransitiveTargets/CoarsenedTargets #11270

Open
stuhood opened this issue Dec 8, 2020 · 6 comments · May be fixed by #17394
Open

Improve performance of TransitiveTargets/CoarsenedTargets #11270

stuhood opened this issue Dec 8, 2020 · 6 comments · May be fixed by #17394
Assignees
Labels

Comments

@stuhood
Copy link
Member

stuhood commented Dec 8, 2020

When #10409 introduced cycle tolerance, it also removed structure sharing of TransitiveTarget instances. This meant that if you made 100 TransitiveTargets requests, much less of their work could be reused, and in particular, cycle detection could not be reused. Cycle detection is cheap enough to do for one concurrent instance, but not to do hundreds of times redundantly.

@stuhood
Copy link
Member Author

stuhood commented Dec 8, 2020

Unfortunately, this is blocked by some rule graph craziness that I'd like to lean in to fix via #11269. In the meantime, I'm going to revert the original fix for #11201, and raise the recursion limit.

stuhood added a commit that referenced this issue Dec 8, 2020
#11202 introduced a lot of contention on the GIL that was only obvious when larger numbers of targets were being built. #11270 is a holistic fix for that issue, but it is currently blocked on #11269.

In the meantime, we will land #11271 to dodge the original issue in #11201 to get us back to a medium-slow-but-working state, and then follow up on #11269 and #11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Dec 8, 2020
…sbuild#11272)

In the meantime, we will land pantsbuild#11271 to dodge the original issue in pantsbuild#11201 to get us back to a medium-slow-but-working state, and then follow up on pantsbuild#11269 and pantsbuild#11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Dec 8, 2020
…sbuild#11272)

In the meantime, we will land pantsbuild#11271 to dodge the original issue in pantsbuild#11201 to get us back to a medium-slow-but-working state, and then follow up on pantsbuild#11269 and pantsbuild#11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Dec 9, 2020
…sbuild#11272)

In the meantime, we will land pantsbuild#11271 to dodge the original issue in pantsbuild#11201 to get us back to a medium-slow-but-working state, and then follow up on pantsbuild#11269 and pantsbuild#11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Dec 9, 2020
…sbuild#11272)

In the meantime, we will land pantsbuild#11271 to dodge the original issue in pantsbuild#11201 to get us back to a medium-slow-but-working state, and then follow up on pantsbuild#11269 and pantsbuild#11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Dec 9, 2020
…11272) (#11277)

#11202 introduced a lot of contention on the GIL that was only obvious when larger numbers of targets were being built. #11270 is a holistic fix for that issue, but it is currently blocked on #11269.

In the meantime, we will land #11271 to dodge the original issue in #11201 to get us back to a medium-slow-but-working state, and then follow up on #11269 and #11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Dec 9, 2020
…11272) (#11275)

#11202 introduced a lot of contention on the GIL that was only obvious when larger numbers of targets were being built. #11270 is a holistic fix for that issue, but it is currently blocked on #11269.

In the meantime, we will land #11271 to dodge the original issue in #11201 to get us back to a medium-slow-but-working state, and then follow up on #11269 and #11270 to get us to the best place.

This reverts commit fabcb45.

[ci skip-build-wheels]
@benjyw benjyw removed the performance label Sep 9, 2021
@stuhood
Copy link
Member Author

stuhood commented Feb 8, 2022

I believe that rather than necessarily internally structure-sharing for multiple distinct TransitiveTargets requests, there might be another solution based on adjusting TransitiveTargets to be more like CoarsenedTargets... possibly by becoming a wrapper around CoarsenedTargets, but also possibly by just becoming a fully expanded map of targets similar to _DependencyMapping.

Rather than making a TransitiveTargets request per root target, you would make a request for all root targets, and then use methods of the return type to partition and recurse on any number of roots.

@Eric-Arellano
Copy link
Contributor

How hard would that be to do? I suspect it will make a tangible impact on Pylint and MyPy -- the dedicated rule for partitioning seems to take at ~1 second on my very fast M1 in a not very huge repository..

@stuhood
Copy link
Member Author

stuhood commented Feb 8, 2022

How hard would that be to do?

Not too hard? I suspect that the biggest challenge will be updating all of the relevant callsites to avoid making lots of requests in parallel. But that could be incremental, since TT could continue to expose its existing methods.

stuhood added a commit that referenced this issue Apr 25, 2022
`CoarsenedTarget`s are structure shared, and because they preserve their internal structure, they can service requests for transitive targets for different roots from the same datastructure. Concretely: Mypy and Pylint can consume `CoarsenedTargets` to execute a single `@rule`-level graph walk, and then compute per-root closures from the resulting `CoarsenedTarget` instances.

This does not address #11270 in a general way (and it punts on #15241, which means that we still need per-root transitive walks), but it might provide a prototypical way to solve that problem on a case-by-case basis.

Performance wise, this moves cold `check ::` for ~1k files from:
* `main`: 32s total, and 26s spent in partitioning
*  `branch`: 19s total, and 13s spent in partitioning

The rest of the time is wrapped up in #15241.
stuhood added a commit to stuhood/pants that referenced this issue Apr 25, 2022
…d#15141)

`CoarsenedTarget`s are structure shared, and because they preserve their internal structure, they can service requests for transitive targets for different roots from the same datastructure. Concretely: Mypy and Pylint can consume `CoarsenedTargets` to execute a single `@rule`-level graph walk, and then compute per-root closures from the resulting `CoarsenedTarget` instances.

This does not address pantsbuild#11270 in a general way (and it punts on pantsbuild#15241, which means that we still need per-root transitive walks), but it might provide a prototypical way to solve that problem on a case-by-case basis.

Performance wise, this moves cold `check ::` for ~1k files from:
* `main`: 32s total, and 26s spent in partitioning
*  `branch`: 19s total, and 13s spent in partitioning

The rest of the time is wrapped up in pantsbuild#15241.
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
stuhood added a commit that referenced this issue Apr 25, 2022
…ck of #15141) (#15244)

`CoarsenedTarget`s are structure shared, and because they preserve their internal structure, they can service requests for transitive targets for different roots from the same datastructure. Concretely: Mypy and Pylint can consume `CoarsenedTargets` to execute a single `@rule`-level graph walk, and then compute per-root closures from the resulting `CoarsenedTarget` instances.

This does not address #11270 in a general way (and it punts on #15241, which means that we still need per-root transitive walks), but it might provide a prototypical way to solve that problem on a case-by-case basis.

Performance wise, this moves cold `check ::` for ~1k files from:
* `main`: 32s total, and 26s spent in partitioning
*  `branch`: 19s total, and 13s spent in partitioning

The rest of the time is wrapped up in #15241.
@stuhood stuhood changed the title Re-introduce structure sharing for TransitiveTargets Re-introduce structure sharing for TransitiveTargets/CoarsenedTargets Jun 8, 2022
@stuhood
Copy link
Member Author

stuhood commented Jun 8, 2022

Going to revisit this one.

#15141 fully fixed mypy since it is always monolithic, and not broken up into batches. But when pylint is broken up into batches (due to the default --lint-batch-size=128 option), it ends up making one request for CoarsenedTargets per batch (which is effectively equivalent to having made one TransitiveTargets request per batch: hence the title change for this issue).

I'm going to spend a little bit of time thinking through whether there is a holistic solution to avoid needing to change APIs. But in the absence of that, adjusting both the lint and test APIs to take CoarsenedTargets for their roots would mean that we did a single @rule-graph level graph walk for the entire goal, rather than doing one per lint-batch or test.

@stuhood stuhood self-assigned this Jun 8, 2022
@stuhood stuhood changed the title Re-introduce structure sharing for TransitiveTargets/CoarsenedTargets Improve performance of TransitiveTargets/CoarsenedTargets Jun 17, 2022
@stuhood
Copy link
Member Author

stuhood commented Jun 18, 2022

I gave this one some thought last week, and came to the conclusion that while structure sharing is still helpful, recursive memoization is just not feasible in the presence of graph cycles. Which is part of why when we added support for cycles, we did so by moving TT/CT calculations from recursion to iteration.

But iteration is also memoizable: usually via a map/dict on the stack of your iteration function. Unlike recursion though, it isn't referentially transparent, and so requires (visible) mutability.

So: I think that by adding a new feature to the Engine/Graph to mark a @rule output as mutable (either by adding a new field to EngineAwareReturnType, or by marking the @rule itself somehow), we can memoize the iteration of transitive dependency computation, by consuming a mutable map of structure-shared TransitiveTarget instances.

Supporting mutable state in the Graph seemingly amounts to adding all of the dependencies of any Graph Node which depends on the mutable state to the mutable state (or, equivalently: putting the consumer Node into a cycle with the mutable state it is consuming, such that either Node being invalidated would invalidate the other). This accounts for the fact that a @rule might put any of its inputs into the mutable state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
3 participants