
Make resolution backtracking smarter #4834

Merged (4 commits, Feb 6, 2018)

Conversation

aidanhs
Member

@aidanhs aidanhs commented Dec 18, 2017

There's no need to try every candidate for every dependency when backtracking - instead, only try candidates if they could change the eventual failure that caused backtracking in the first place, i.e.

  1. if we've backtracked past the parent of the dep that failed
  2. the number of activations for the dep has changed (activations are only ever added during resolution I believe)

The two new tests before:

$ /usr/bin/time cargo test --test resolve -- --test-threads 1 --nocapture resolving_with_constrained_sibling_
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/resolve-19b2a13e5a19eed8
38.45user 2.16system 0:42.00elapsed 96%CPU (0avgtext+0avgdata 47672maxresident)k
0inputs+1664096outputs (0major+10921minor)pagefaults 0swaps

After:

$ /usr/bin/time cargo test --test resolve -- --test-threads 1 --nocapture resolving_with_constrained_sibling_
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/resolve-19b2a13e5a19eed8
[...]
0.36user 0.01system 0:00.49elapsed 76%CPU (0avgtext+0avgdata 47464maxresident)k
0inputs+32outputs (0major+11602minor)pagefaults 0swaps

You can observe the issue yourself with the following (it should fail, but hangs for a while instead - I didn't bother timing it and waiting for it to finish; with this PR it terminates almost instantly):

$ cargo new --bin x
     Created binary (application) `x` project
$ /bin/echo -e 'serde = "=1.0.9"\nrust-s3 = "0.8"' >> x/Cargo.toml
$ cd x && cargo build
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Resolving dependency graph...

@rust-highfive

r? @matklad

(rust_highfive has picked a reviewer for you, use r? to override)

@aidanhs
Member Author

aidanhs commented Dec 18, 2017

I'm not sure that my two new tests should actually be in tests, since in the context of this PR they exist as benchmarks. Though they do verify correct behavior I guess.

@alexcrichton
Member

Thanks so much for the PR! (and great seeing work in this space!)

This is historically a super tricky part of Cargo so I'm currently still trying to wrap my head around the changes here. To make sure I understand this I'm going to say it back to you and you can tell me where I'm totally wrong :)

Right now when cargo activates a package A to be part of the dependency graph, it then also queues up a bunch of pending candidates for the dependencies of A into remaining_deps. Let's say A depends on B.

Additionally when Cargo activates a dependency that has a bunch of candidates, it'll push candidates onto the "backtrack stack" in case the current candidate fails. Candidates are always filtered to be "not semver compat with other activated versions of the crate".

This means that if B fails to activate for whatever reason, we need to do some backtracking. This... though is where I get lost in how this change works. My naive reading of this is that when B fails to get selected then if A (the parent I think?) is either not activated or a different number of versions of B are activated then backtracking could be successful. If, however, A is activated and B's number of activated versions remains the same we assume that the resolution failure will always remain the same.

That last part though isn't quite clicking with me, I'm not sure how A (maybe A isn't the parent?) would affect the resolution of B, and just comparing the lengths seems like it could go awry, right? (as in, different sets of different versions maybe?)

Perhaps though you can help me clear up these thoughts! I imagine it's all much more fresh in your mind :)

@aidanhs
Member Author

aidanhs commented Dec 21, 2017

Briefly: your understanding is correct and there is a justification for the two points you raise. I wrote a long comment explaining it all, then in the process realised there's an additional factor I overlooked that means this PR is not correct as-is :(

I've saved the comment and will post it here when I either update this PR or give up on it. Either way, I'll make sure I submit the tests at some point since they're interesting edge cases.

@alexcrichton
Member

Heh ok sounds good to me!

@Eh2406
Contributor

Eh2406 commented Jan 19, 2018

@aidanhs Can you elaborate on where this is at? I was just thinking of trying this optimization, for my own edification. So I would love to know what additional factor I had not thought of. And I'd love to help push this to be done!

There's no need to try every candidate for every dependency - instead,
only try alternative candidates if they *could* change the eventual
failure that caused backtracking in the first place.
@aidanhs
Member Author

aidanhs commented Feb 3, 2018

I'm back with good news - I think this PR is actually fine! I am intending to run it against the whole of crates.io though; I'll report back when that's done.

My naive reading of this is that when B fails to get selected then if A (the parent I think?) is either not activated or a different number of versions of B are activated then backtracking could be successful. If, however, A is activated and B's number of activated versions remains the same we assume that the resolution failure will always remain the same.

Exactly. If you think the comment above the function could be improved to make this clearer, please let me know (I pushed a tweak to make the English a bit better).

That last part though isn't quite clicking with me, I'm not sure how A (maybe A isn't the parent?) would affect the resolution of B, and just comparing the lengths seems like it could go awry, right? (as in, different sets of different versions maybe?)

It is a bit subtle, so let's see if I can convince you (and if I can't, maybe it's wrong)!

Let's say we have a subgraph with A depending on both B and C. A and B were activated, but C failed and we're now backtracking. So, what could change this failure situation?

  1. Backtracking B will do nothing - backtracking a 'standalone' sibling (one with no dependencies) will never help.
  2. Backtracking past A to a point where we're considering an alternative version of A (A') could unstick us, because A' may have a completely different set of dependencies (that may not even include C) - we should try again.
  3. Backtracking past A to a dependee of A could unstick us for the same reason as A' could unstick - a completely different set of dependencies and we should try again.
  4. Backtracking past A to a candidate for some arbitrary dep X is possible (since resolution order is 'most constrained first' rather than on hierarchy). This new candidate for X may or may not unstick C and it's tricky to figure out, so we just pessimistically try again.

But let's say we're backtracking and haven't yet gone past A (or 2, 3, 4 would have us trying again by now). So, what other situation could unstick us? The requirements of A won't change before backtracking past it, so we need the version requirements to relax elsewhere to possibly permit one of A's candidates for C.

The only thing we take into account when finding remaining candidates is the list of previous activations for a dep:

impl RemainingCandidates {
    fn next(&mut self, prev_active: &[Summary]) -> Option<Candidate> {
        // Filter the set of candidates based on the previously activated
        // versions for this dependency. We can actually use a version if it

So:

  1. If the list of previous activations for C is 'different' to what it was at the point of the failed activation, try again as things may have relaxed. Note that at the point of entry into the backtracking function, any one state in the backtrack stack represents a series of successful resolutions, with each subsequent state representing progressively longer series of successful resolutions (unsuccessful resolutions are removed from the stack by the time backtracking finds a new candidate and returns). As the number of successful resolutions increases, so too does the number of activations (there is no way to cause 'deactivation', or mutate an activation) and therefore the lists of 'previous activations' grows monotonically through the backtrack stack - checking for a change in length is sufficient to test 'different'. I've added an assertion to make clearer that the length should always monotonically decrease as we pop states off the stack.

And that's it! Either the parent is backtracked or activations have changed - the only two ways dependency requirements may change and make a failed activation succeed.
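The two conditions can be sketched as a loop that pops backtrack frames until one of them could plausibly change the failure. This is a minimal illustration, not Cargo's actual internals - `BacktrackFrame`, `parent_still_active`, and `prev_active_len` are hypothetical names standing in for the real state:

```rust
struct BacktrackFrame {
    // Number of activated versions of the failing dep when this frame was pushed.
    prev_active_len: usize,
    // Whether the parent of the failing dep is still activated in this frame.
    parent_still_active: bool,
}

/// Pop frames until one could change the failure: either the parent of the
/// failed dep is no longer activated, or the set of activations for the dep
/// differs. Comparing lengths suffices because activations only ever grow
/// during resolution, so the list's length changes iff its contents do.
fn find_candidate(
    stack: &mut Vec<BacktrackFrame>,
    failed_prev_active_len: usize,
) -> Option<BacktrackFrame> {
    while let Some(frame) = stack.pop() {
        if !frame.parent_still_active
            || frame.prev_active_len != failed_prev_active_len
        {
            return Some(frame);
        }
    }
    None
}
```

Frames where the parent is still active and the activation count is unchanged are skipped entirely, which is where the speedup comes from.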


Some additional thoughts:

Note that (from my reading) dependency resolution is NP-complete so there are no easy solutions. This link mentions heuristics that aptitude uses (we use the heuristic of choosing most constrained packages first), but it's worth being wary - heuristics tend to just shift the worst cases to be less common.

I believe this PR is an actual optimisation, but I'm wary after my previous uncertainty, hence my running against crates.io. One additional route to consider is faster data structures, but given that this improves the terrible case I originally saw, I'm happy.


Here are my notes on what I thought was wrong:

Imagine a new subgraph:

    A
  / | \
 B  C  D
 |  |
 C  D

Let's say we resolved from left to right, up to the D (under A) at which point it failed. We backtrack, trying the candidates of D (under C), but still no joy - the selected versions of A and C (under A) have irreconcilable requirements for the version of D. However, alternatives to C (under A) may have different requirements for D that can reconcile with what A wants. These effects are transitive, as backtracking all the way back to B could affect choices of C, which then affects D.

(it turns out that this is fine - as soon as D (under C) is backtracked, activations for D change and we instantly try the next possible candidate. I added a test for this scenario)

@Eh2406
Contributor

Eh2406 commented Feb 4, 2018

Thank you! This is a very clear explanation, and concurs with the way I read the code.

Another explanation, based on conflict-driven SAT concepts:
We have found a contradiction: namely, there is no way to activate this requirement in this context.
We can backjump until the contradiction goes away. How could the contradiction go away? Well, if either of its parts goes away. So we backtrack until either the requirement has changed or the context has changed.

  • How could the requirement change? At a minimum, it can't if our parent is still on the same version.
  • How could the context change? Well, anything could change the context, but the only context that we look at (for now) is the previously activated versions of this crate. We only activate one thing at a time. So the context only changes if the number of activated versions of this crate have changed.

I.e. we can backjump until our parent is no longer active or until the number of activated versions of this crate has changed.

If you are running against all of crates.io then can I ask:

  • which crates take the longest to resolve before?
  • which crates take the longest after this change?
  • which crates did it make the most difference for?

Also, I think this would be a good place for property-based testing. Build random graphs and see if they resolve the same. Just a someday thought.

@alexcrichton
Member

Ok thanks so much for the explanations @aidanhs and @Eh2406, I'm now thoroughly convinced :)

@aidanhs whenever you're ready to land this just let me know - I'd be ready to land it whenever, and I'd be ok not waiting for all of crates.io. @Eh2406's explanation is also what really made this click for me, so if you want to link that comment from the code, that'd also be great!

One additional route to consider is faster data structures, but given that this improves the terrible case I originally saw, I'm happy.

Certainly! I'm sort of wary of this helping much though, as most of the time the bad cases I've seen are more because of O(??) behavior rather than "if I make this 2x faster it's reasonable". In that sense I see switching data structures as not changing the O(??) runtime but rather the constant factors in play. That being said, right now cloning the HashMap is linear wrt its size, so I could see how switching to better data structures would reduce the O still too :)
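The data-structure point can be illustrated with a persistent list: cloning one is an O(1) reference-count bump that shares the tail, whereas cloning a HashMap copies every entry. This is only a sketch of the idea, not Cargo code:

```rust
use std::rc::Rc;

// Illustrative only: a persistent cons-list "clones" by sharing its tail
// behind an Rc, instead of copying every element the way HashMap::clone does.
#[derive(Clone)]
enum List<T> {
    Nil,
    Cons(T, Rc<List<T>>),
}

impl<T> List<T> {
    // Returns a new list whose tail is the old list, shared, not copied.
    fn push(self, value: T) -> List<T> {
        List::Cons(value, Rc::new(self))
    }

    // Most recently pushed element, if any.
    fn head(&self) -> Option<&T> {
        match self {
            List::Nil => None,
            List::Cons(v, _) => Some(v),
        }
    }
}
```

A backtrack frame holding such a structure could snapshot the resolution state cheaply; whether that pays off in practice depends on access patterns, as the comment above notes.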

Also, I think this would be a good place for property-based testing. Build random graphs and see if they resolve the same. Just a someday thought.

Definitely! I feel like we've never really had great testing of the resolver...

@aidanhs
Member Author

aidanhs commented Feb 6, 2018

I've added the reference to @Eh2406's comment and I think this is ready to go.

I did do a run against the whole of crates.io in the end, sha1ing the generated Cargo.locks. It only took about 4 hours (thanks to @Eh2406's -Zno-index-update flag!). Here's the diff:

$ diff -U1 <(cat cargo-orig.log | awk '{print $1, $5}') <(cat cargo-new.log | awk '{print $1, $5}')
--- /dev/fd/63  2018-02-06 02:20:06.510547944 +0000
+++ /dev/fd/62  2018-02-06 02:20:06.510547944 +0000
@@ -2325,4 +2325,4 @@
 lux-0.0.1 419ab6a9d2321c1f06845f39465a0a0bd514e69a
-lux-0.0.2 !-15
-lux-0.0.3 !-15
+lux-0.0.2 05982881c44eafd34a020de449b9f8cc90d11d66
+lux-0.0.3 537d74b6b6eda28b7386a40c152a45d0ef22e5fb
 lux-0.0.4 982e3131cd256320cb0549f2b60469d111a6691e
@@ -34862,3 +34862,3 @@
 kinto_http-0.1.0 594a9550828a6ad5f72b4e66ae0d505eb6944a87
-kiss3d-0.1.0 016f421710765909c223af32284280e61cb2a049
+kiss3d-0.1.0 60ac678b08c13ac16a0c1e21bb670b39fd7839ae
 kiss3d-0.1.1 dc9ef8721c764a61c9e932436bbe454bb48b29a4
@@ -41643,5 +41643,5 @@
 mussh-0.1.0 47e0339a47990ed08f5f3aee4bfc1a643a2a679d
-mussh-0.1.1 !-15
+mussh-0.1.1 5ce29136e421240657c8d46a8518057b120acb97
 mussh-0.1.2 85a18e85b9c8eccc70b068cdcfb8f775e19beaa4
-mussh-1.0.0 !-15
+mussh-1.0.0 89a6af35ca2e446f5c8da58204d738eae1cc605b
 mussh-1.0.1 8f9591c7ba37264b52e3de656f8fdc95fef164ee
@@ -61088,3 +61088,3 @@
 slog-perf-0.2.0 17316d2c4fda54d49074b39eb84830eba5bdddca
-slog-retry-0.1.0 !-15
+slog-retry-0.1.0 e0a169f37938a6128376771f9724a5dd0ced77ae
 slog-scope-stdlog-0.2.0 164971fad1b6ec761cbe462aa58e6b8f8afacfd5

!X means the program exited with that (nonzero) code. A negative code means it was terminated due to that signal - I killed things that hadn't generated a lockfile after 60s had passed.

So that's 5 crates that previously wouldn't resolve within a minute, now resolving almost instantly. Of particular note is that this affects the one and only version of slog-retry! @vorner, out of curiosity had you noticed this slow generation of lockfiles for slog-retry?

I don't know what's up with kiss3d-0.1.0 and I can't reproduce the inconsistency now. Maybe I accidentally updated the index during the run by using cargo elsewhere?

I also analysed my results as requested by @Eh2406, based on the user cpu time consumed by cargo. I caution you to verify these yourself - the machine was otherwise idle while doing this, but I did not do repeated runs.

Longest originally (excluding timeout)
safe_core-0.22.3 took 22.18s
safe_network_common-0.4.0 took 20.06s
safe_network_common-0.4.1 took 19.94s
safe_vault-0.13.1 took 7.25s
safe_core-0.16.2 took 6.66s
mussh-0.1.0 took 3.99s
tokio-jsonrpc-0.6.0 took 1.93s
tokio-jsonrpc-0.7.0 took 1.92s
tokio-jsonrpc-0.7.1 took 1.91s
tokio-jsonrpc-0.7.2 took 1.9s

Longest now (excluding timeout)
safe_core-0.22.3 took 22.04s
safe_network_common-0.4.1 took 20.06s
safe_network_common-0.4.0 took 19.95s
safe_vault-0.13.1 took 7.19s
safe_core-0.16.2 took 6.71s
safe_core-0.16.0 took 0.55s
sccache-0.2.2 took 0.1s
scaleless_music-0.0.7 took 0.1s
safe_app-0.2.1 took 0.1s
rusoto_route53domains-0.29.0 took 0.1s

Biggest absolute difference
mussh-0.1.0 changed old->new by -3.90s
tokio-jsonrpc-0.7.0 changed old->new by -1.91s
tokio-jsonrpc-0.6.0 changed old->new by -1.89s
tokio-jsonrpc-0.7.1 changed old->new by -1.88s
tokio-jsonrpc-0.7.2 changed old->new by -1.88s
tokio-jsonrpc-0.5.1 changed old->new by -1.55s
elastic_responses-0.2.0 changed old->new by -1.13s
elastic_responses-0.2.1 changed old->new by -1.12s
elastic_responses-0.3.0 changed old->new by -1.11s
elastic_responses-0.4.0 changed old->new by -1.10s

In this collapsible section I've documented the collection/analysis process along with my scripts.

Collection and analysis process
  • crates were collected with cargo-local-serve
  • crates were extracted with find crate-archives/ -name '*.crate' -exec 'sh' '-c' 'mkdir -p out/$0 && tar -C out/$0 -xf $0 > out/$0' '{}' ';', i.e. an out directory is created with directories for foo-1.0.0.crate, and then underneath that is a foo-1.0.0 directory containing the source. If you tweak the command so yours ends up different, you need to change the benchmarking script. You'll get a bunch of warnings about files from the future and corrupted archives - I didn't bother investigating.
  • I ran cargo search serde to update my crates.io index
  • I built two versions of cargo. It should be clear how to edit the benchmarking script to point to whatever versions you want to test
  • I ran the benchmarking script like PYTHONUNBUFFERED=1 ./stamp ./bench_locking.py | tee bench.log && PYTHONUNBUFFERED=1 ./stamp ./bench_locking.py | tee bench.log (stamp is https://github.com/alexcrichton/stamp-rs). You need to run one toolchain at least once the first time to make sure all the git repos are updated to avoid unfairness for the first toolchain to run.
  • analyse was used to compare the time taken

The scripts I used are below:

bench_locking.py

#!/usr/bin/env python3
import glob
import os
import pprint
import signal
import subprocess

def go():
    cwd = os.getcwd()
    cratepaths = glob.glob('crate-archives/**/*.crate', recursive=True)
    cratepaths = [os.path.join(cwd, 'out', crate) for crate in cratepaths]
    cratepaths.sort()
    cargos = [
        ('cargo-new', os.path.join(cwd, '../cargo/target/release/cargo')),
        ('cargo-orig', os.path.join(cwd, '../cargo-orig/target/release/cargo'))
    ]
    for tag, cargo in cargos:
        print('\n')
        print('handling cargo: ' + cargo)
        numcrates = len(cratepaths)
        with open(tag + '.log', 'wb') as logfile:
            for i, cratepath in enumerate(cratepaths):
                if i % 100 == 0:
                    print('Considering crate {}/{}'.format(i, numcrates))
                timecrate(logfile, cargo, cratepath)

TIMEPATH = 'tmptime'
def timecrate(logfile, cargo, cratepath):
    cratename, ext = os.path.splitext(os.path.basename(cratepath))
    cargotoml = os.path.join(cratepath, cratename, 'Cargo.toml')
    cargolock = os.path.join(cratepath, cratename, 'Cargo.lock')
    assert ext == '.crate'
    if os.path.exists(cargolock):
        os.remove(cargolock)
    # really this should use psutil so we can send TERM signals to cargo directly -
    # I've seen strange issues with crust-0.24.0 staying running, which is very odd
    cmd = ['/usr/bin/time', '--quiet', '--format=%e %U %S', '--output', TIMEPATH, cargo, 'generate-lockfile', '--manifest-path', cargotoml, '-Zno-index-update']
    process = subprocess.Popen(cmd, preexec_fn=os.setsid)
    process_timeout = 60
    try:
        process.wait(timeout=process_timeout)
        time = open(TIMEPATH, 'rb').read().strip()
    except subprocess.TimeoutExpired:
        time = b'%a.0 !? !?' % (process_timeout,)
        os.killpg(os.getpgid(process.pid), signal.SIGTERM)
        process.wait(5)
    assert process.returncode is not None
    if process.returncode == 0:
        deets = subprocess.check_output(['sha1sum', cargolock]).split()[0]
    else:
        deets = b'!%a' % (process.returncode)
    line = b'%b %b %b' % (cratename.encode('utf-8'), time, deets)
    print('>>> ' + line.decode('utf-8'))
    logfile.write(line + b'\n')

go()

analyse.py

l1s = open('cargo-orig.log').readlines()
l2s = open('cargo-new.log').readlines()
befores = []
afters = []
compares = []
for l1, l2 in zip(l1s, l2s):
    name1, walltime1, usertime1, systime1, sha1 = l1.split(' ')
    name2, walltime2, usertime2, systime2, sha2 = l2.split(' ')
    assert name1 == name2
    name = name1
    if sha1 != sha2:
        continue
    if sha1.startswith('!'):
        continue
    usertime1 = float(usertime1)
    usertime2 = float(usertime2)
    befores.append((usertime1, name))
    afters.append((usertime2, name))
    compares.append((usertime2-usertime1, name))

befores.sort(reverse=True)
afters.sort(reverse=True)
compares.sort(reverse=True, key=lambda co: abs(co[0]))

topresults = 10
print('Longest originally (excluding timeout)')
for time, name in befores[:topresults]:
    print('{} took {}s'.format(name, time))
print()
print('Longest now (excluding timeout)')
for time, name in afters[:topresults]:
    print('{} took {}s'.format(name, time))
print()
print('Biggest absolute difference')
for delta, name in compares[:topresults]:
    print('{} changed old->new by {:.2f}s'.format(name, delta))

@Eh2406
Contributor

Eh2406 commented Feb 6, 2018

I am glad this looks good! The list of things that timed out may be really interesting for finding more performance wins.

@alexcrichton
Member

@bors: r+

Sweet thanks so much for the detailed analysis @aidanhs! Can't wait to get this in :)

@bors
Contributor

bors commented Feb 6, 2018

📌 Commit a85f9b6 has been approved by alexcrichton

@bors
Contributor

bors commented Feb 6, 2018

⌛ Testing commit a85f9b6 with merge b7c2cfb...

bors added a commit that referenced this pull request Feb 6, 2018
Make resolution backtracking smarter

@bors
Contributor

bors commented Feb 6, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing b7c2cfb to master...

@bors bors merged commit a85f9b6 into rust-lang:master Feb 6, 2018
@aidanhs
Member Author

aidanhs commented Feb 6, 2018

@Eh2406 these are the crates that appeared to time out:

$ grep '!-' cargo-new.log 
crust-0.24.0 60.0 !? !? !-15
elastic-0.3.0 60.0 !? !? !-15
elastic-0.4.0 60.0 !? !? !-15
elastic-0.5.0 60.0 !? !? !-15
gfx_text-0.7.0 60.0 !? !? !-15
gfx_text-0.8.0 60.0 !? !? !-15
safe_core-0.22.4 60.0 !? !? !-15
safe_vault-0.13.2 60.0 !? !? !-15
safe_vault-0.14.0 60.0 !? !? !-15
tk-http-0.1.0 60.0 !? !? !-15
tk-http-0.1.1 60.0 !? !? !-15
tk-http-0.1.2 60.0 !? !? !-15
tk-http-0.2.0 60.0 !? !? !-15
tk-http-0.2.1 60.0 !? !? !-15
twitter-stream-0.3.0 60.0 !? !? !-15

@aidanhs aidanhs deleted the aphs-better-backtrack branch February 6, 2018 18:28
@Eh2406
Contributor

Eh2406 commented Feb 6, 2018

Thank you! I will try to use these as a real world benchmark, along with Servo.

@Eh2406 Eh2406 mentioned this pull request Feb 13, 2018
bors added a commit that referenced this pull request Feb 14, 2018
Conflict tracking

This is an alternative implementation of #4834. This is slower but hopefully more flexible and clearer. The idea is to keep a list of `PackageId`s that have caused us to skip a `Candidate`. Then we can use the list when we are backtracking: if any item in our list is no longer activated, then we will have new `Candidate`s to try, so we should stop backtracking. Or, to say that another way: we can continue backtracking as long as all the items in our list are still activated.
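The stop-backtracking rule above reduces to a set-containment check. A minimal sketch, with hypothetical names (Cargo's real code operates on `PackageId`s and its activation context, not strings):

```rust
use std::collections::HashSet;

/// Backtracking may continue only while every package that contributed to
/// the conflict is still activated; once any of them is gone, new
/// candidates may exist and we should stop backjumping.
fn can_keep_backtracking(
    conflicting: &HashSet<&str>,
    activated: &HashSet<&str>,
) -> bool {
    conflicting.iter().all(|pkg| activated.contains(pkg))
}
```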

Next this new framework was used to make the error messages more focused. We only need to list the versions that conflict, as opposed to all previously activated versions.

Why is this more flexible?
1. It is not limited to conflicts within the same package. If `RemainingCandidates.next` skips something because of a links attribute, that is easy to record: just add the `PackageId` to the set `conflicting_prev_active`.
2. Arbitrary things can add conflicts to the backtracking. If we fail to activate because some dependency needs a feature that does not exist, that is easy to record: just add the `PackageId` to the set `conflicting_activations`.
3. All things that could cause us to fail will be in the error messages, as the error messages loop over the set.
4. With a simple extension, replacing the `HashSet` with a `HashMap<_, Reason>`, we can customize the error messages to show the nature of the conflict.

@alexcrichton, @aidanhs, Does the logic look right? Does this seem clearer to you?
bors added a commit that referenced this pull request Mar 14, 2018
Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly.

This work is inspired by @alexcrichton's [comment](#4066 (comment)) that a slow resolver can be caused by all versions of a dependency being yanked. Which stuck in my brain, as I did not understand why it would happen. If a dependency has no candidates then it will be the most constrained and will trigger backtracking in the next tick. Eventually I found a reproducible test case. If the bad dependency is deep in the tree of dependencies then we activate and backtrack `O(versions^depth)` times. Even though it is fast to identify the problem, that is a lot of work.

**The set up:**
1. Every time we backtrack cache the (dep, `conflicting_activations`).
2. Build on the work in #5000: fail to activate if any of its dependencies will just backtrack to this frame. I.e. for each dependency, check if any of its cached `conflicting_activations` are already all activated. If so, we can just skip to the next candidate. We also add that bad `conflicting_activations` to our set of `conflicting_activations`, so that we can...

**The pay off:**
 If we fail to find any candidates that we can activate in light of 2, then we cannot be activated in this context, so add our (dep, `conflicting_activations`) to the cache so that next time our parent will not bother trying us.

I hear you saying "but the error messages, what about the error messages?" So if we are at the end (`!has_another`) then we disable this optimization. After we mark our dep as being not activatable, we activate anyway. It won't resolve, but it will have the same error message as before this PR. If we have been activated for the error messages, then skip straight to the last candidate, as that is the only backtrack that will end with the user.

I added a test in the vein of #4834. With the old code the time to run was `O(BRANCHING_FACTOR ^ DEPTH)` and took ~3min with DEPTH = 10; BRANCHING_FACTOR = 5; with the new code it runs almost instantly with 200 and 100.
bors added a commit that referenced this pull request Mar 17, 2018
Faster resolver: clean code and the `backtrack_stack`

This is a small extension to #5168 and is inspired by #4834 (comment)

After #5168 these work (and don't on cargo from nightly.):
- `safe_core = "=0.22.4"`
- `safe_vault = "=0.13.2"`

But these don't work (and do on cargo from this PR.)
- `crust = "=0.24.0"`
- `elastic = "=0.3.0"`
- `elastic = "=0.4.0"`
- `elastic = "=0.5.0"`
- `safe_vault = "=0.14.0"`

It took some work to figure out why they are not working, and make a test case.

This PR removes the use of `conflicting_activations` before it is extended with the conflicts from next.
#5187 (comment)
However, `find_candidate(` is still needed, so it now gets the conflicts from next before being called.

It often happens that the candidate whose child will fail, leading to its failure, will have older siblings that have already set up `backtrack_frame`s. The candidate knows that its failure cannot be saved by its siblings, but sometimes we activate the child anyway for the error messages. Unfortunately the child does not know that its uncles can't save it, so it backtracks to one of them, leading to a combinatorial loop.

The solution is to clear the `backtrack_stack` if we are activating just for the error messages.

Edit original end of this message, no longer accurate.
#5168 means that when we find a permanent problem we will never **activate** its parent again. In practice there often is a lot of work and many `backtrack_frame`s between the problem and reactivating its parent. This PR removes `backtrack_frame`s where both the parent and the problem are present. This means that when we find a permanent problem we will never **backtrack** into it again.

An alternative is to scan all cached problems while backtracking, but this seemed more efficient.
bors added a commit that referenced this pull request Oct 1, 2019
Public dependency refactor and re-allow backjumping

There were **three** attempts at vanquishing exponential time spent in Public dependency resolution. All failures. All three started with some refactoring that seems worth saving. Specifically, the data structure `public_dependency` that is used to test for Public dependency conflicts is large, tricky, and modified in place. So let's make it a type with a name and move the interactions into methods.

Next each attempt needed to know how far back to jump to undo any given dependency edge. I am fairly confident that any full solution will need this functionality. I also think any solution will need a way to represent richer conflicts than the existing "is this pid active". So let's keep the `still_applies` structure from the last attempt.

Last each attempt needs to pick a way to represent a Public dependency conflict. The last attempt used three facts about a situation.

- `a1`: `PublicDependency(p)`, which can be read as: the package `p` can see the package `a1`
- `b`: `PublicDependency(p)`, which can be read as: the package `p` can see the package `b`
- `a2`: `PubliclyExports(b)`, which can be read as: the package `b` has the package `a2` in its public interface.

This representation is good enough to allow for backjumping, i.e. `find_candidate` can go back several frames until the `age` when the Public dependency conflict was introduced. This optimization, added for normal dependencies in #4834, saves the most time in practice, so having it for Public dependency conflicts is important for allowing real-world experimentation with the Public dependencies feature. We will have to alter/improve/replace this representation to unlock all of the important optimizations, but I don't know one that will work for all of them and this is a major step forward.

Can be read one commit at a time.
@ehuss ehuss added this to the 1.25.0 milestone Feb 6, 2022