
fast path for incompatible deps #6919

Closed

Eh2406 wants to merge 5 commits into rust-lang:master from Eh2406:incompatible-deps

Contributor

Eh2406 commented May 8, 2019 (edited)

This adds two targeted fast paths, which happen to cover the cases in #6283, to the general tool made in #6776 and improved in #6910. This gets all the tests to pass, including all of the seeds reported at #6258, even #6258 (comment), which is failing on master.

Eh2406 requested a review from alexcrichton

May 8, 2019 18:50

rust-highfive assigned ehuss

rust-highfive commented May 8, 2019

(rust_highfive has picked a reviewer for you, use r? to override)

rust-highfive added the S-waiting-on-review label

Contributor Author

Eh2406 commented May 9, 2019

So after posting this, I realized that the proptests have a size argument. If they can't find a slow case in indexes of 50 crates, each with 20 versions, then we can try indexes of 70 crates, each with 30 versions. Indeed, that found a failing case; I am working on minimizing it.

Eh2406 force-pushed the incompatible-deps branch from 6da8bc2 to 56d0443

May 10, 2019 18:52

Eh2406 marked this pull request as ready for review

May 14, 2019 21:20

Eh2406 force-pushed the incompatible-deps branch from 1686c75 to 9a6a1a5

May 14, 2019 21:22

Contributor Author

Eh2406 commented May 14, 2019 (edited by alexcrichton)

I have given up on finding a clear test case for the original optimization, and just added it back in.
When it looks good to you I can update the OP.

alexcrichton reviewed


Member

alexcrichton left a comment

Ok so trying to understand the current purpose of the PR, it sounds like we have a general system for handling this sort of backtracking but what's added here is a more specialized version for just Cargo's problem? That feels wrong though so I'm curious if you've got a refresher you could give me :)

src/cargo/core/resolver/mod.rs Outdated

    
                      None

                  };

                  let our_links: Option<HashSet<_>> = our_candidates.iter().map(|c| c.summary.links()).collect();

Member

alexcrichton May 15, 2019

It might be clearer to use filter_map and then collect directly into a HashSet instead of an Option<HashSet>, perhaps?
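The two styles are not interchangeable when some candidate has no links value, which is worth checking before switching. A minimal sketch, assuming a hypothetical Summary stand-in whose links field is an Option<&str> (the real summary type is richer): collecting Option items into Option<HashSet> gives None as soon as one item is None, while filter_map silently skips those items.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a candidate's summary; only `links` matters here.
struct Summary {
    links: Option<&'static str>,
}

/// Collecting `Option<T>` items into `Option<HashSet<T>>` is all-or-nothing:
/// one candidate without `links` makes the whole result `None`.
fn links_all_or_nothing(cands: &[Summary]) -> Option<HashSet<&'static str>> {
    cands.iter().map(|c| c.links).collect()
}

/// `filter_map` skips candidates without `links` and keeps the rest.
fn links_filtered(cands: &[Summary]) -> HashSet<&'static str> {
    cands.iter().filter_map(|c| c.links).collect()
}
```

Which behavior the resolver wants depends on code the excerpt does not show.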

src/cargo/core/resolver/mod.rs Outdated

    
                                  con.extend(other.iter().map(|(&id, re)| (id, re.clone())));

                              if (our_activation_key.map_or(false, |our| {

                                  other.summary.package_id().as_activations_key() == our

                              }) || our_link.map_or(false, |_| other.summary.links() == our_link))

Member

alexcrichton May 15, 2019

Instead of extracting a single element above, would it perhaps be clearer to change these to:

if our_activation_keys.len() == 1 && our_activation_keys.contains(&other.summary.package_id().as_activations_key()) {
    // ..
}

or something like that?
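The suggestion relies on the two forms agreeing. A small sketch with generic, hypothetical helpers (single, matches_extracted, matches_direct; none of them are Cargo functions) comparing "extract the only element, then compare" with "check len() == 1 and contains":

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Style in the diff: pull out the element only when the set has exactly one.
fn single<T: Copy>(set: &HashSet<T>) -> Option<T> {
    if set.len() == 1 {
        set.iter().next().copied()
    } else {
        None
    }
}

fn matches_extracted<T: Copy + PartialEq>(set: &HashSet<T>, key: T) -> bool {
    single(set).map_or(false, |only| only == key)
}

/// Style suggested in review: test the set directly, no intermediate Option.
fn matches_direct<T: Eq + Hash>(set: &HashSet<T>, key: T) -> bool {
    set.len() == 1 && set.contains(&key)
}
```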

src/cargo/core/resolver/mod.rs Outdated

    
                              // A + B is active then A is active and we know that is not ok.

                              for (_, other) in &others {

                                  con.extend(other.iter().map(|(&id, re)| (id, re.clone())));

                              if (our_activation_key.map_or(false, |our| {

Member

alexcrichton May 15, 2019

Could this if have a comment indicating what it's doing? (i.e. why it skips over things when these sets have a single element and we're the only element in there)

src/cargo/core/resolver/mod.rs Outdated

    
                                      } else {

                                          cx.is_active(id).filter(|&age|

                                                      // we only care about things that are older than critical_age

                                                      age < backtrack_critical_age)

Member

alexcrichton May 15, 2019

Maybe in need of some rustfmt here?

Contributor Author

Eh2406 May 15, 2019

I have rustfmt on save, but this seems to be a rustfmt bug: if the comment is inside the closure, rustfmt will not move it.

src/cargo/core/resolver/mod.rs

    
                              past_conflicting_activations.insert(dep, &con);

                              return Some(con);

                              continue 'dep;

Member

alexcrichton May 15, 2019

Could there be a comment here indicating why we're breaking to the outer loop and cancelling this part of the search?

src/cargo/core/resolver/mod.rs Outdated

    
                                  );

                                  println!("used new {}", other.summary.version());

Member

alexcrichton May 15, 2019

stray println

src/cargo/core/resolver/mod.rs Outdated

    
                                  past_conflicting_activations.find(

                                      new_dep,

                                      &|id| {

                                          if id == candidate.package_id() {

Member

alexcrichton May 15, 2019

This seems pretty similar to the code above, so maybe an implementation could be shared?

Eh2406 force-pushed the incompatible-deps branch 3 times, most recently from ef858d8 to f372b11

May 15, 2019 19:39

Contributor Author

Eh2406 commented May 15, 2019

Added a bunch of comments that I think address your points, and updated the OP.
However, I need to think through a good answer to your question about the overall picture.

Contributor

bors commented May 15, 2019

☔ The latest upstream changes (presumably #6946) made this pull request unmergeable. Please resolve the merge conflicts.

Eh2406 added 4 commits

May 15, 2019 16:35


          fast path the 2 most common backtrack paths. Test say we don't need the general code!?

ec0ad71


          fast path one more problem

c0be9ce


          add the original optimization in for good measure

8080c98


          add comments

49eaa13

Eh2406 force-pushed the incompatible-deps branch from f372b11 to 49eaa13

May 15, 2019 20:35


          skip the unneeded allocation

fb786a1

Contributor Author

Eh2406 commented May 16, 2019

Ok so trying to understand the current purpose of the PR, it sounds like we have a general system for handling this sort of backtracking but what's added here is a more specialized version for just Cargo's problem? That feels wrong though so I'm curious if you've got a refresher you could give me :)

The more I think about it, the more accurate that seems, with one small tweak. "Backtracking" is technically the job of find_candidate, which is untouched by this PR: its job is to reset state until the conflicting_activations no longer apply. So the tweaked version is:

Ok so trying to understand the current purpose of the PR, it sounds like we have a general system for generating conflicting_activations but what's added here is a more specialized version for just Cargo's problem? That feels wrong though so I'm curious if you've got a refresher you could give me :)

So the job of the resolver is to select a set of crates that meets some constraints. (There should probably be a list.)

Some constraints are enforced by the structure of the project. For example, the constraint "each activated version's dependencies are satisfied" is enforced by adding that version's list of dependencies to RemainingDeps when it is activated. (There should probably be links to the code that enforces each of the constraints in the list.)

Other constraints are enforced by a check before activation. For example the constraint "No two activated versions are semver compatible" is enforced by an assert in flag_activated and by a check in RemainingCandidates::next. Why the duplication? Because cloning a BacktrackFrame is more expensive than doing the work twice. (Maybe there is an architecture that can share the code without the extra clones, but for now this is how it works.)
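A minimal sketch of the "check before activation, assert on activation" pattern, assuming versions are plain (major, minor, patch) tuples and using a hypothetical Activations table; this is not Cargo's flag_activated or RemainingCandidates::next. The compatibility key follows Cargo's rule that versions sharing the leftmost non-zero component are semver compatible.

```rust
use std::collections::HashMap;

/// Hypothetical simplified version: (major, minor, patch).
type Version = (u64, u64, u64);

/// Semver-compatibility range: 1.x.y share a range, 0.3.x share a range,
/// and every 0.0.x is its own range.
fn compat_key(v: Version) -> Version {
    match v {
        (0, 0, p) => (0, 0, p),
        (0, m, _) => (0, m, 0),
        (maj, _, _) => (maj, 0, 0),
    }
}

/// Hypothetical activation table: at most one version per (name, range).
#[derive(Default)]
struct Activations {
    active: HashMap<(String, Version), Version>,
}

impl Activations {
    /// The cheap pre-check: would activating `name@v` put two distinct
    /// semver-compatible versions of the same crate into the graph?
    fn conflicts(&self, name: &str, v: Version) -> bool {
        match self.active.get(&(name.to_string(), compat_key(v))) {
            Some(&existing) => existing != v,
            None => false,
        }
    }

    /// Activation re-checks the same invariant as an assertion.
    fn activate(&mut self, name: &str, v: Version) {
        assert!(!self.conflicts(name, v), "semver-compatible duplicate");
        self.active.insert((name.to_string(), compat_key(v)), v);
    }
}
```

Doing the check twice is cheap here; the trade-off described above is that sharing it would require cloning larger backtracking state.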

This leads to "learned clauses", or: what does this have to do with conflicting_activations? Most of the time it is not that helpful to know that a single constraint is violated. a v1.0.0 is activated, so you can't activate a v1.0.1; no big deal, RemainingCandidates::next will loop until it finds one that works. What is helpful is knowing which combinations of constraints are violated. For example, a v1.0.0 is activated and b v1.0.0 requires a >= 1.0.1. So we can learn that a v1.0.0 and b v1.0.0 are incompatible, even though they are not in direct violation of any one of the constraints individually. This hard-won insight is stored in a conflicting_activations.

(It is currently stored as a dep that failed and a list of package ids that were activated. Maybe we should take a page from PubGrub and make it just a list of package ids that are incompatible, stored in a way that makes it easy to recover all the combinations required to get there.) So we have code that makes sure we never enter a state matching a conflicting_activations that we have seen before.
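A minimal sketch of a learned-clause store under simplifying assumptions: package ids are hypothetical (name, version) string pairs, a clause is just a set of ids that can never all be active together, and lookup is a linear scan. The real store (past_conflicting_activations in the excerpts above) is keyed by the dependency that failed; this sketch ignores that.

```rust
use std::collections::BTreeSet;

/// Hypothetical package id: (name, version), for illustration only.
type PackageId = (&'static str, &'static str);

/// A learned clause: these packages can never all be active together.
type Conflict = BTreeSet<PackageId>;

#[derive(Default)]
struct ConflictStore {
    learned: Vec<Conflict>,
}

impl ConflictStore {
    fn insert(&mut self, c: Conflict) {
        self.learned.push(c);
    }

    /// Returns a stored clause fully contained in the proposed activations,
    /// i.e. evidence that this state has already been ruled out.
    fn find(&self, proposed: &BTreeSet<PackageId>) -> Option<&Conflict> {
        self.learned.iter().find(|c| c.is_subset(proposed))
    }
}
```

With the clause {a v1.0.0, b v1.0.0} stored, any state containing both is rejected immediately, whatever else is active.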

So if we already have the infrastructure to ban a conflicting_activations and the infrastructure to merge two conflicting_activations, why do we need RemainingCandidates::next? Can't we "just" add all the base-case conflicts to the store when we discover a new crate, and then let the conflicting_activations machinery do its work? Yes, we could, but that has two downsides:

1. There are a lot of base cases: O(n^2) per semver range, where n is the number of semver-compatible versions.
2. We want to do as little work as possible on the happy path, where the first thing we try works.
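To make the first downside concrete: within one semver-compatible range, any two distinct versions form a base-case conflict, so eagerly storing them means n(n-1)/2 entries per range. A small sketch (base_case_pairs is a hypothetical helper, not resolver code):

```rust
/// Enumerate the pairwise base-case conflicts for one semver-compatible range:
/// any two distinct versions in the range cannot both be active.
fn base_case_pairs(versions: &[&str]) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for (i, a) in versions.iter().enumerate() {
        for b in &versions[i + 1..] {
            pairs.push((a.to_string(), b.to_string()));
        }
    }
    pairs
}
```

For 20 compatible versions this is already 190 stored conflicts, all paid for up front even when the first candidate works.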

Before #6776, we only made conflicting_activations for the direct reasons that triggered "backtracking". That means that we don't repeat work, and we use the method of exhaustion to generalize. There are many real-world cases for which that is more than sufficient. It is still something like O(i*v^c), where:
v = the number of versions
c = the number of things being conflicted
i = the number of irrelevant dependents that happen to get resolved first
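As a rough illustration of why the exponent matters (the value of v is picked for the arithmetic, not measured):

```latex
O\left(i \cdot v^{c}\right):\qquad v = 20 \;\Rightarrow\; v^{2} = 400,\quad v^{3} = 8000,\quad v^{4} = 160000
```

Each additional package in the conflict multiplies the bound by v, which is why generalizing conflicts, rather than exhausting them, pays off.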

But a small amount of pattern-noticing can make things exponentially better. We want to combine previously discovered conflicts to make a more general one. There are a staggering number of conflicting_activations that we could make, but we only want to make the ones that are likely to be useful.

The insight of #6776 was to sort the packages by how far back we can "backtrack" and only try to combine with the one that is newest. If we were able to prove that one of the others was not needed, we would still "backtrack" to the same place, so that generalization is not likely to be helpful. Then we can try to combine it with the constraint that the dependency that required that package needs to resolve to something. If we have already stored a conflicting_activations for everything that other dependency can resolve to then we can just say we conflict with that dependency. (Technically we don't have a way for a conflicting_activations to refer to the dependency itself so we just refer to its parent.)
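A minimal sketch of that generalization step as we read the description, not Cargo's resolver code. Assumptions, all hypothetical: package ids are strings, a conflict maps each id to its activation age (larger means newer), known holds a stored conflict for each candidate already tried, and the newest package in the conflict is one of the dependency's candidates.

```rust
use std::collections::BTreeMap;

type PackageId = &'static str; // hypothetical "name@version"

/// Hypothetical conflict: package id -> activation age (larger = newer).
type Conflict = BTreeMap<PackageId, usize>;

/// Try to replace the newest package in `conflict` with the dependency that
/// pulled it in. `candidates` are everything that dependency could resolve to,
/// `known` has a stored conflict per tried candidate, and `parent` owns the dependency.
fn generalize(
    conflict: &Conflict,
    candidates: &[PackageId],
    known: &BTreeMap<PackageId, Conflict>,
    parent: (PackageId, usize),
) -> Option<Conflict> {
    // Only the newest package decides how far back we jump, so only it is worth replacing.
    let (&newest, _) = conflict.iter().max_by_key(|&(_, &age)| age)?;
    let mut merged: Conflict = conflict
        .iter()
        .filter(|&(&id, _)| id != newest)
        .map(|(&id, &age)| (id, age))
        .collect();
    for cand in candidates {
        // Every candidate must already be ruled out, or we cannot generalize.
        let reason = known.get(cand)?;
        merged.extend(
            reason
                .iter()
                .filter(|&(&id, _)| id != *cand)
                .map(|(&id, &age)| (id, age)),
        );
    }
    // There is no way to name the dependency itself, so refer to its parent.
    merged.insert(parent.0, parent.1);
    Some(merged)
}
```

If any candidate is missing from known, the function gives up; that gap is exactly what the next paragraph is about.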

The problem: we only store conflicts for things we have tried. We have to, as there is an almost infinite number of conflicts for things we haven't tried. One reason we may not have tried a version is that one of its dependencies is not going to work, so this PR adds a check for that.
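A minimal sketch of that check, loosely modeled on the past_conflicting_activations.find call in the diff excerpt above: for an untried candidate, look up stored conflicts for each of its dependencies and treat the candidate as if it were already active. The store layout, type aliases, and blocked_by_dependency are hypothetical simplifications.

```rust
use std::collections::{BTreeSet, HashMap};

type PackageId = &'static str; // hypothetical "name@version"
type DepName = &'static str; // hypothetical dependency identifier

/// Hypothetical store: for each dependency, conflicts that made it unresolvable.
type PastConflicts = HashMap<DepName, Vec<BTreeSet<PackageId>>>;

/// Returns a stored conflict proving `candidate` cannot work, if one of its
/// dependencies is already known to fail given `active` plus the candidate itself.
fn blocked_by_dependency(
    candidate: PackageId,
    candidate_deps: &[DepName],
    active: &BTreeSet<PackageId>,
    past: &PastConflicts,
) -> Option<BTreeSet<PackageId>> {
    for dep in candidate_deps {
        for conflict in past.get(dep).into_iter().flatten() {
            // Every member must be active, counting the candidate as activated.
            if conflict.iter().all(|id| *id == candidate || active.contains(id)) {
                return Some(conflict.clone());
            }
        }
    }
    None
}
```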

The fuzzer found a more annoying reason. Let's say we have to resolve ['a >= 1.1, <= 1.98', 'a >= 1.2, <= 1.99', 'a = 1.0.0'].
The third one will add a conflict with all versions of a that match >= 1.2, <= 1.98, but it will never generalize, because we will never try to resolve a = 1.0.0 with a v1.1.0 nor with a v1.99.0.

There is probably some general way to combine things, but I don't see it. However, there is something really simple about this case: it is really fast to see that a = 1.0.0 is not going to work with a v1.1.0 nor with a v1.99.0, even though we never technically tried it. It is one of the base cases that we didn't want to add to the store. We can say that if our dependency only covers one semver range, then things that are in that range but not compatible with our dependency are blockers.
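A minimal sketch of that rule, assuming (major, minor, patch) tuples, Cargo's leftmost-non-zero compatibility ranges, and a hypothetical requirement type with inclusive bounds; real requirements (such as ^ with an exclusive upper bound) are richer than this.

```rust
/// Hypothetical simplified version: (major, minor, patch).
type Version = (u64, u64, u64);

/// Semver-compatibility range, keyed by the leftmost non-zero component.
fn compat_key(v: Version) -> Version {
    match v {
        (0, 0, p) => (0, 0, p),
        (0, m, _) => (0, m, 0),
        (maj, _, _) => (maj, 0, 0),
    }
}

/// Hypothetical requirement with inclusive bounds; `= 1.0.0` is min == max == 1.0.0.
struct Req {
    min: Version,
    max: Version,
}

impl Req {
    fn matches(&self, v: Version) -> bool {
        self.min <= v && v <= self.max
    }

    /// Both ends in one range means every allowed version is in that range,
    /// because compat_key never decreases as versions increase.
    fn single_compat_range(&self) -> bool {
        compat_key(self.min) == compat_key(self.max)
    }
}

/// Fast path: versions in the requirement's only range that it rejects are
/// blockers, even if we never tried resolving the requirement alongside them.
fn blockers(req: &Req, versions: &[Version]) -> Vec<Version> {
    if !req.single_compat_range() {
        return Vec::new();
    }
    let range = compat_key(req.min);
    versions
        .iter()
        .copied()
        .filter(|&v| compat_key(v) == range && !req.matches(v))
        .collect()
}
```

For a = 1.0.0 this flags a v1.1.0 and a v1.99.0 right away, while a v2.0.0 is left to the normal machinery.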

This hack is yet another place where we enforce the semver constraint, and it is a special case rather than a general rule, but ^ requirements are by far the most common in practice. Even outside the pathological cases, this should make most resolution problems about v times faster.

Eh2406 mentioned this pull request

Less duplication in activate #6967

Closed

Contributor

ehuss commented May 20, 2019

r? @alexcrichton

rust-highfive assigned alexcrichton and unassigned ehuss

Contributor

bors commented Jun 18, 2019

☔ The latest upstream changes (presumably #7011) made this pull request unmergeable. Please resolve the merge conflicts.

ehuss added S-waiting-on-author and removed S-waiting-on-review labels

Contributor Author

Eh2406 commented Jan 13, 2021

This is not going to get the attention it needs to be merged. And at this point I'd rather see the effort go into PubGrub than into optimizing this resolver.

A lot of ink has gone into this; a more responsible maintainer would turn it into documentation before closing. But I am not that responsible.

Eh2406 closed this


Labels

S-waiting-on-author