Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly. #5168
Conversation
r? @matklad (rust_highfive has picked a reviewer for you, use r? to override)
I hear you saying "but does this slow the happy path?" It should not and it doesn't for me, but I have been wrong before so please test with
Thanks so much for this @Eh2406, this is looking fantastic! I've got a few high level questions and a more meta question. The higher level questions are:
The performance here looks like we won't take too much of a hit on the fast path, I agree. The major added cost that I can tell is:
But otherwise the only remaining question I'd have is that I've only got a tenuous grasp at best on what's happening here. Mind expanding the doc comments at the top or the comments inline as to what's happening? I've personally found that "comment stories" which walk through examples can be quite helpful!
I just pushed two small performance gains, and an improvement to the test. The performance gains make sure we deal with known errors first. The test now checks both a successful and a failed resolve. I will work on comments next time I have a minute.
Looks and sounds great to me, thanks so much again @Eh2406!
If you happen to have time... If you have leading questions then I will endeavour to ensure that the comments I add address them. If not I will try and guess at what is confusing to someone new to this mess.
Oh certainly! In general I think it'd be good to get a refreshed high-level overview with what the resolver is doing, particularly with respect to backtracking. Currently (AFAIK) there's two primary optimizations when backtracking:
I think that'd be more than enough for now, but eventually I'd also love to have a scenario or two in the documentation and then in the code it can mention "we get here in scenario 1 when we do this work..." or things like that (but that can always happen later)
I added two more commits. One extended the test to include cases where the redundant backtracking is caused by previous activations. The other was a small performance gain in which we only disable this optimization if the backtracking will get to the user. I will try to work on comments. At some point we may want to restructure / add new types / helper functions so that the random comments can become doc comments.

(Editor's note: this got kinda long, I think the first draft ended up in this comment. :-) )

I think you describe @aidanhs's optimization well with one small but relevant caveat. We will zoom all the way to the point that

Let's say that the dep we can't find a candidate for is

But this is a little unsatisfactory. Probably all versions of

We can leave a note:
and we continue on our way. If we find a resolution then making a note is not much work. But next time we try and activate ...
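To make the "leave a note" idea concrete, here is a minimal sketch in Rust; the names (`PastConflicts`, `note`, `known_to_fail`) and types are hypothetical simplifications for illustration, not cargo's actual data structures:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified stand-ins for the real resolver's types.
type Dep = &'static str;       // e.g. a dependency requirement like `foo = "*"`
type PackageId = &'static str; // e.g. an activated package like `bar 1.0.1`

/// The "notes" left behind while resolving: for each dependency, the sets of
/// already-activated packages under which that dependency is known to fail.
#[derive(Default)]
struct PastConflicts {
    cache: HashMap<Dep, Vec<HashSet<PackageId>>>,
}

impl PastConflicts {
    /// Leave a note: `dep` cannot be activated while everything in
    /// `conflicting` is activated.
    fn note(&mut self, dep: Dep, conflicting: HashSet<PackageId>) {
        self.cache.entry(dep).or_default().push(conflicting);
    }

    /// Next time we try to activate `dep`: if any recorded conflict set is
    /// entirely contained in the current activations, we already know this
    /// can only end in the same backtrack, so skip the work.
    fn known_to_fail(&self, dep: Dep, activated: &HashSet<PackageId>) -> bool {
        self.cache
            .get(dep)
            .map_or(false, |sets| sets.iter().any(|s| s.is_subset(activated)))
    }
}

fn main() {
    let mut past = PastConflicts::default();
    let activated: HashSet<PackageId> = ["bar 1.0.1"].iter().cloned().collect();

    // First time through we discover the dead end the hard way and leave a note.
    past.note("foo = \"*\"", activated.clone());

    // Next time, the note lets us skip repeating the doomed activation.
    assert!(past.known_to_fail("foo = \"*\"", &activated));
}
```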
@Eh2406 ah yes indeed, and your explanation sounds great! I think it'd also be great to put links to these PR discussions. Whenever someone feels the need to understand the resolver they're probably in for the long haul (aka willing to read a lot) anyway, and the discussions on GitHub I've found are treasure troves of information :)
I have been working hard to add lots of links in my PRs to hopefully make it possible for someone in the future to find all the discussions. I pushed a commit with a bunch more comments. I hope I did not go too far in over-commenting. Let me know what you think. If there are more questions, I will do my best to answer. If you see how this would be slow or would use too much RAM with a different test, I'd love to know. Note that the benchmark from #4810 (comment) is largely unaffected by this PR. I have several small changes to this that do fix that case. Some are almost acceptable, but I don't have an isolated independent test. So that will wait for the next PR.
📌 Commit 58d7cf1 has been approved by
Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly.

This work is inspired by @alexcrichton's [comment](#4066 (comment)) that a slow resolver can be caused by all versions of a dependency being yanked. Which stuck in my brain as I did not understand why it would happen. If a dependency has no candidates then it will be the most constrained and will trigger backtracking in the next tick. Eventually I found a reproducible test case. If the bad dependency is deep in the tree of dependencies then we activate and backtrack `O(versions^depth)` times. Even though it is fast to identify the problem, that is a lot of work.

**The set up:**
1. Every time we backtrack, cache the (dep, `conflicting_activations`).
2. Building on the work in #5000, fail to activate if any of the candidate's dependencies will just backtrack to this frame, i.e. for each dependency check if any of its cached `conflicting_activations` are already all activated. If so we can just skip to the next candidate. We also add that bad `conflicting_activations` to our set of `conflicting_activations`, so that we can...

**The pay off:** If we fail to find any candidates that we can activate in light of 2, then we cannot be activated in this context, so add our (dep, `conflicting_activations`) to the cache so that next time our parent will not bother trying us.

I hear you saying "but the error messages, what about the error messages?" So if we are at the end (`!has_another`) then we disable this optimization. After we mark our dep as being not activatable, we activate anyway. It won't resolve, but it will have the same error message as before this PR. If we have been activated only for the error messages, then we skip straight to the last candidate, as that is the only backtrack that will end with the user.

I added a test in the vein of #4834. With the old code the time to run was `O(BRANCHING_FACTOR ^ DEPTH)` and took ~3 min with DEPTH = 10 and BRANCHING_FACTOR = 5; with the new code it runs almost instantly with 200 and 100.
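To make the shape of steps 1 and 2 and the pay off concrete, here is a minimal self-contained sketch in Rust. Everything in it (`next_candidate`, the `Dep`/`PackageId` aliases, the loop structure) is a hypothetical simplification for illustration, not cargo's actual resolver code:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified stand-ins; cargo's real resolver uses richer types.
type Dep = &'static str;
type PackageId = &'static str;
type ConflictSet = HashSet<PackageId>;

/// Pick the next candidate for a dep, skipping candidates whose own
/// dependencies are already known (from the cache) to fail in this context.
/// Returns `None` when no candidate can work, in which case the caller adds
/// (dep, conflicting) to the cache so the parent never tries this dep again.
fn next_candidate(
    candidates: &[(PackageId, Vec<Dep>)], // each candidate and its child deps
    activated: &HashSet<PackageId>,
    past: &HashMap<Dep, Vec<ConflictSet>>,
    conflicting: &mut ConflictSet,
    has_another: bool, // false => only activating to produce an error message
) -> Option<PackageId> {
    for &(candidate, ref child_deps) in candidates {
        // Step 2: does any child dep have a cached conflict set that is
        // already fully activated? If so, activating `candidate` can only
        // backtrack right back to this frame.
        let doomed = child_deps.iter().find_map(|child| {
            past.get(child)?.iter().find(|set| set.is_subset(activated))
        });
        match doomed {
            // Skip the doomed candidate, but remember why it failed so the
            // combined conflict set can propagate up to our parent.
            Some(set) if has_another => conflicting.extend(set.iter().cloned()),
            // Either the candidate looks viable, or we have already been
            // marked as not activatable and only want the old error message.
            _ => return Some(candidate),
        }
    }
    None // the pay off: caller caches (dep, conflicting) and moves on
}

fn main() {
    let activated: HashSet<PackageId> = ["parent 1.0.0"].iter().cloned().collect();
    let mut past: HashMap<Dep, Vec<ConflictSet>> = HashMap::new();
    past.insert("missing-dep", vec![activated.clone()]); // a note from a past failure
    let mut conflicting = ConflictSet::new();

    // The only candidate depends on "missing-dep", which is known to fail here.
    let picked = next_candidate(
        &[("candidate 2.0.0", vec!["missing-dep"])],
        &activated,
        &past,
        &mut conflicting,
        true,
    );
    assert_eq!(picked, None);
    assert!(conflicting.contains("parent 1.0.0"));
}
```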
☀️ Test successful - status-appveyor, status-travis
@aidanhs would you be willing to redo the tests you ran in #4834 (comment)? This PR should not change the output, and should make things faster. But I am nervous that I missed something and we will see regressions.
@Eh2406 ok I'm trying to understand what's going on here in more depth (adding comments with my current understanding as I go along). I was wondering if you could help me out on a few points? Right now one question I'd have is why there are two locations where we update

Can you elaborate on why we have the duplicate logic?
Thank you for doing an in-depth review! If your comments are available for public view I can give them a look. I will happily do my best to explain anything I can. :-) IMO the best outcome of the discussion is a different way to write the code so that it makes more sense / relies less on how all the pieces fit together.

That specific question is an excellent one. I have tried unsuccessfully to remove that duplication several times. I will try again when I am done with this email. For now let me tell the story from the beginning.

The first attempt at this had a cache for every time

So then we decided that we would activate if

Solution: when we know we cannot be successful, add ourselves to the cache. Only then do we decide whether or not to activate. This is why we have 2 places we add ourselves to the cache.

If that is why we need it, then why do I think it may be removable now? We have to activate all the way to the bottom for every time we

I think we should have 2 cases:
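As a reading aid only, here is a tiny compilable sketch of the ordering described above, using hypothetical names rather than cargo's real functions: the note goes into the cache as soon as we know this dep cannot succeed here, and only afterwards do we decide whether to activate one last time purely for the error message.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real resolver state.
type Dep = &'static str;
type PackageId = &'static str;
type ConflictSet = HashSet<PackageId>;

/// Called once we know `dep` has no usable candidate in this context.
/// Returns whether we should still activate once, just for the error message.
fn give_up_on(
    dep: Dep,
    conflicting: ConflictSet,
    has_another: bool,
    cache: &mut HashMap<Dep, Vec<ConflictSet>>,
) -> bool {
    // First, leave the note: in this context `dep` can never be activated.
    cache.entry(dep).or_default().push(conflicting);

    // Only then decide whether to activate one last candidate anyway, so the
    // user sees the same error message as before this optimization existed.
    !has_another
}

fn main() {
    let mut cache = HashMap::new();
    let conflicting: ConflictSet = ["parent 1.0.0"].iter().cloned().collect();

    // This was the last candidate, so we still "activate" once for the error,
    // but the note has already been recorded either way.
    assert!(give_up_on("bad-dep = \"*\"", conflicting, false, &mut cache));
    assert!(cache.contains_key("bad-dep = \"*\""));
}
```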
Oh sure! I was gonna wait on #5187 to avoid conflicts but we can go ahead and get started, I opened up #5195 with some of the comments/tweaks I've got so far. Thanks for the explanation, though! I think that makes sense to me in terms of propagating the
Faster resolver: clean code and the `backtrack_stack`

This is a small extension to #5168 and is inspired by #4834 (comment).

After #5168 these work (and don't on cargo from nightly):
- `safe_core = "=0.22.4"`
- `safe_vault = "=0.13.2"`

But these don't work (and do on cargo from this PR):
- `crust = "=0.24.0"`
- `elastic = "=0.3.0"`
- `elastic = "=0.4.0"`
- `elastic = "=0.5.0"`
- `safe_vault = "=0.14.0"`

It took some work to figure out why they are not working, and to make a test case.

This PR removes the use of `conflicting_activations` before it is extended with the conflicting from next (#5187 (comment)). However the `find_candidate(` is still needed, so it now gets the conflicting from next before being called.

It often happens that the candidate whose child will fail, leading to its own failure, will have older siblings that have already set up `backtrack_frame`s. The candidate knows that its failure cannot be saved by its siblings, but sometimes we activate the child anyway for the error messages. Unfortunately the child does not know that its uncles can't save it, so it backtracks to one of them, leading to a combinatorial loop. The solution is to clear the `backtrack_stack` if we are activating just for the error messages.

Edit: the original end of this message, below, is no longer accurate.

#5168 means that when we find a permanent problem we will never **activate** its parent again. In practice there often is a lot of work and `backtrack_frame`s between the problem and reactivating its parent. This PR removes `backtrack_frame`s where its parent and the problem are present. This means that when we find a permanent problem we will never **backtrack** into it again. An alternative is to scan all cached problems while backtracking, but this seemed more efficient.
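A heavily simplified sketch of the `backtrack_stack` fix described above, with hypothetical names (`BacktrackFrame`, `activate_candidate`) standing in for the real resolver internals: when a candidate is only being activated to produce a good error message, the saved frames can no longer lead to a successful resolve, so they are dropped instead of being revisited combinatorially.

```rust
// Hypothetical stand-in for cargo's real backtrack frame type.
struct BacktrackFrame;

fn activate_candidate(
    backtrack_stack: &mut Vec<BacktrackFrame>,
    just_here_for_error_messages: bool,
) {
    if just_here_for_error_messages {
        // Our uncles cannot save us: every saved frame would only restart the
        // same doomed search, so clear them instead of backtracking into them.
        backtrack_stack.clear();
    }
    // ...continue activating the candidate so the user sees the usual error...
}

fn main() {
    let mut stack = vec![BacktrackFrame, BacktrackFrame];
    activate_candidate(&mut stack, true);
    assert!(stack.is_empty());
}
```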