Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast path for incompatible deps #6919

Closed
wants to merge 5 commits into from

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented May 8, 2019

This adds two targeted fast paths that happen to cover the cases in #6283, to the general tool made in #6776 and improved in #6910. This gets all the tests to pass. Including all of the seeds reported at #6258, even #6258 (comment) that is failing on master.

@rust-highfive
Copy link

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 8, 2019
@Eh2406
Copy link
Contributor Author

Eh2406 commented May 9, 2019

So after posting this I realized that the proptests have a size argument. If it can't find a slow case in Indexes of 50 crates etch with 20 versions then we can try Indexes of 70 crates etch with 30 versions. Indeed that found a failing case, I am working on minimizing.

@Eh2406 Eh2406 force-pushed the incompatible-deps branch from 6da8bc2 to 56d0443 Compare May 10, 2019 18:52
@Eh2406 Eh2406 marked this pull request as ready for review May 14, 2019 21:20
@Eh2406 Eh2406 force-pushed the incompatible-deps branch from 1686c75 to 9a6a1a5 Compare May 14, 2019 21:22
@Eh2406
Copy link
Contributor Author

Eh2406 commented May 14, 2019

I have given up on finding a clear test case for the original optimization, and just added it back in.
When it looks good to you I can update the OP.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so trying to understand the current purpose of the PR, it sounds like we have a general system for handling this sort of backtracking but what's added here is a more specialized version for just Cargo's problem? That feels wrong though so I'm curious if you've got a refresher you could give me :)

None
};

let our_links: Option<HashSet<_>> = our_candidates.iter().map(|c| c.summary.links()).collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more clear to use filter_map and then collect directly into HashSet instead of Option<HashSet> perhaps?

con.extend(other.iter().map(|(&id, re)| (id, re.clone())));
if (our_activation_key.map_or(false, |our| {
other.summary.package_id().as_activations_key() == our
}) || our_link.map_or(false, |_| other.summary.links() == our_link))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of extracting a single element above would it be perhaps more clear to change these to:

if our_activation_keys.len() == 1 && our_activation_keys.contains(&other.summary.package_id().as_activations_key()) {
    // ..
}

or something like that?

// A + B is active then A is active and we know that is not ok.
for (_, other) in &others {
con.extend(other.iter().map(|(&id, re)| (id, re.clone())));
if (our_activation_key.map_or(false, |our| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this if have a comment indicating what it's doing? (aka why it's skipping over things if these sets are single element and we're the only element in there)

} else {
cx.is_active(id).filter(|&age|
// we only care about things that are older then critical_age
age < backtrack_critical_age)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in need of some rustfmt here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fmt on save, but it seems to be some kind of fmt bug. If the comment is in the callback fmt will not move it.

past_conflicting_activations.insert(dep, &con);
return Some(con);

continue 'dep;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be a comment here indicating why we're breaking to the outer loop and cancelling this part of the search?

);
println!("used new {}", other.summary.version());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray println

past_conflicting_activations.find(
new_dep,
&|id| {
if id == candidate.package_id() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty similar to the code above, so maybe an implementation could be shared?

@Eh2406 Eh2406 force-pushed the incompatible-deps branch 3 times, most recently from ef858d8 to f372b11 Compare May 15, 2019 19:39
@Eh2406
Copy link
Contributor Author

Eh2406 commented May 15, 2019

Added a bunch of comments that I think address your points, and updated the OP.
However I need to think thru a good answer for your question about the overall picture.

@bors
Copy link
Contributor

bors commented May 15, 2019

☔ The latest upstream changes (presumably #6946) made this pull request unmergeable. Please resolve the merge conflicts.

@Eh2406 Eh2406 force-pushed the incompatible-deps branch from f372b11 to 49eaa13 Compare May 15, 2019 20:35
@Eh2406
Copy link
Contributor Author

Eh2406 commented May 16, 2019

Ok so trying to understand the current purpose of the PR, it sounds like we have a general system for handling this sort of backtracking but what's added here is a more specialized version for just Cargo's problem? That feels wrong though so I'm curious if you've got a refresher you could give me :)

The more I think about it the more that is accurate, with one small tweak. "backtracking" is technically the job of find_candidate witch is untouched by this PR, it is the job of resetting until conflicting_activations no longer applies. So the tweaked version is:

Ok so trying to understand the current purpose of the PR, it sounds like we have a general system for generating conflicting_activations but what's added here is a more specialized version for just Cargo's problem? That feels wrong though so I'm curious if you've got a refresher you could give me :)

So the job of the resolver is to select a set of crates that meat some constraints. (There should probably be a list.)

Some constraints are enforced by the structure of the project. For example the constraint "each activated versions dependencies are satisfied" is enforced by adding the list of dependencies to the RemainingDeps when each version is activated. (There should probably be links to the code that enforced each of the constraints in the list.)

Other constraints are enforced by a check before activation. For example the constraint "No two activated versions are semver compatible" is enforced by an assert in flag_activated and by a check in RemainingCandidates::next. Why the duplication? Because cloning a BacktrackFrame is more expensive than doing the work twice. (Maybe there is an architecture that can share the code without the extra clones, but for now this is how it works.)

This leads to "learned clauses" or what does this have to do with conflicting_activations? Most of the time it is not that helpful to know that a constraint is violated. a v1.0.0 is activated so you cant activate a v1.0.1, no big deal RemainingCandidates::next will loop until it finds one that works. What is helpful is combinations of constraints that are violated. Like a v1.0.0 is activated and b v1.0.0 requires a >= 1.0.1. So we can learn that a v1.0.0 and b v1.0.0 are incompatible, even though there not in direct violation of any one of the constraints individually. This hard won insite is stored in a conflicting_activations. (It is currently stored as a dep that failed and a list of package ids that were activated. Maybe we should take a page from pubgrub and make it just a list of package ids that are incompatible and stored in a way to make it easy to recover all the combinations required to get there.) So we have code that makes sure that we never enter a state that is a conflicting_activations that we have seen before.

So if we already have the infrastructure to ban a conflicting_activations and the infrastructure to merge two conflicting_activations, why do we need RemainingCandidates::next? Can't we "just" add all the base case conflicts to the store when we discover the new crate and then let the conflicting_activations machinery do its work? Yes we could, but it has two downsides.

  1. There are a lot of base cases. O(n^2) per semver range, n=number of semver compatible versions.
  2. We want to do as little work as possible in the happy path where the the first thing we try works.

Pre #6776, we only make the conflicting_activations for the direct reasons that triggered "backtracking". That means that we don't repeat work, and we use the method of exhaustion to generalize. There are many real world cases for witch that is more than sufficient. It is still something like O(i*v^c)
v = the number of versions
c = the numbers of things being conflicted
i = the number of dependents that are irrelevant but happens to get resolved ferst

But, a small amount of noting a pattern can make things exponentially better. We want to combine previously discovered conflict to make a more general one. There are a staggering number of conflicting_activations that we could make, but we only want to make ones that are likely to be useful.

The insight of #6776 was to sort the packages by how far back we can "backtrack" and only try to combine with the one that is newest. If we were able to prove that one of the others was not needed, we would still "backtrack" to the same place, so that generalization is not likely to be helpful. Then we can try to combine it with the constraint that the dependency that required that package needs to resolve to something. If we have already stored a conflicting_activations for everything that other dependency can resolve to then we can just say we conflict with that dependency. (Technically we don't have a way for a conflicting_activations to refer to the dependency itself so we just refer to its parent.)

The problem, we only store a conflict for things we have tried. We have to, as there are an almost infinite number of conflicts for things we haven't tried. One reason we may not have tried a version is that one of its dependencies is not going to work, so this PR adds a check for that.

The fuzzer found a more annoying reason. let's say we have to resolve ['a >= 1.1, <= 1.98'', 'a >= 1.2, <= 1.99', 'a = 1.0.0']
The third one will add a conflict with all versions of a that match >= 1.2, <= 1.98 but will never generalize as we will never try to resolve a = 1.0.0 with a v1.1.0 nor with a v1.99.0

There is probably some general way to combine things, but I don't see it. But there is something really simple about this case, it is really fast to see that a = 1.0.0 is not going to work with a v1.1.0 nor with a v1.99.0, even though we never technically tried it. It is one of the base cases that we didn't want to add to the store. We can say that if our dependency only covers one semver range, then things that are in that range but not compatible with our dependency, are blockers.

This hack is yet another place we enforce the semver constraint, and is a special case not a general rule, but ^ requirements are by far the most common in practice. Even when not in one of the pathological cases this should make most cases of resolution problems v times faster.

@ehuss
Copy link
Contributor

ehuss commented May 20, 2019

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned ehuss May 20, 2019
@bors
Copy link
Contributor

bors commented Jun 18, 2019

☔ The latest upstream changes (presumably #7011) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss ehuss added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 13, 2021
@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 13, 2021

This is not going to get the attention that is needed to merge. And at this point I'd rather see the effort go into PubGrub not into optimizing this resolver.

A lot of ink has gone into it, a more responsible maintainer would turn it into documentation before closing. But I am not that responsible.

@Eh2406 Eh2406 closed this Jan 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants