BUG fuzzing found a bug in the resolver, we need a complete set of conflicts to do backjumping #5988

Merged
merged 22 commits into rust-lang:master from explore_the_bug on Sep 17, 2018

Conversation

@Eh2406 (Contributor) commented Sep 6, 2018

As mentioned in #5921 (comment), the new proptest found a live bug! This PR so far tracks my attempt to minimize the problematic input.

The problem turned out to be that we were backjumping on an incomplete set of conflicts.

@rust-highfive

r? @matklad

(rust_highfive has picked a reviewer for you, use r? to override)

@Eh2406 (Contributor, Author) commented Sep 7, 2018

Edit: I was wrong about what can fix the problem.

I came across @alexcrichton's prescient comment #5187 (comment). This is exactly the kind of bug he was worried about. Furthermore, it was found with the randomized/quickcheck testing I suggested would make me feel better.

@Eh2406 (Contributor, Author) commented Sep 7, 2018

Commenting out

if frame
    .context_backup
    .is_conflicting(Some(parent.package_id()), conflicting_activations)
{
    continue;
}
will also get the test to pass.

This is a brain dump of all the things I can think of that are relevant, as I don't know how they fit together.

This is the code that prunes the remaining_candidates stack so that we only retry candidates that can solve the problem that triggered backtracking. (It was originally added in #4834 and rewritten in #5037; maybe confirm by testing whether this bug reproduces all the way back then.) So it must be skipping over something that in fact could solve the problem.
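
Roughly, the pruning works like this: each backtrack frame snapshots what was activated when it was pushed, and a frame is backjumped over whenever the parent and every package in the conflict set are still activated in that snapshot, because retrying there cannot change anything the failure depended on. A minimal sketch of that idea, with made-up stand-in types rather than the real BacktrackFrame/Context (so treat it as an illustration, not the cargo source):

use std::collections::BTreeSet;

type PkgId = &'static str; // e.g. "c 1.1.0"

struct Frame {
    // Snapshot of what was activated when this frame was pushed.
    activated: BTreeSet<PkgId>,
    // The candidate we would retry if we backtracked to this frame.
    next_candidate: PkgId,
}

// True when the parent and every package in `conflicts` are still
// activated in the frame's snapshot. Note that an empty conflict set is
// vacuously "all still activated", so only the parent check matters then.
fn is_conflicting(frame: &Frame, parent: PkgId, conflicts: &BTreeSet<PkgId>) -> bool {
    frame.activated.contains(parent) && conflicts.iter().all(|c| frame.activated.contains(c))
}

// Pop frames, backjumping over any frame that is still fully conflicting,
// until one is found whose retry could actually affect the failure.
fn find_candidate(
    stack: &mut Vec<Frame>,
    parent: PkgId,
    conflicts: &BTreeSet<PkgId>,
) -> Option<PkgId> {
    while let Some(frame) = stack.pop() {
        if is_conflicting(&frame, parent, conflicts) {
            continue; // backjump past this frame
        }
        return Some(frame.next_candidate);
    }
    None
}

fn main() {
    let conflicts: BTreeSet<PkgId> = ["c 1.1.0"].into_iter().collect();
    let mut stack = vec![Frame {
        activated: ["root 1.0.0", "g 1.0.0", "e 1.1.0", "c 1.1.0"].into_iter().collect(),
        next_candidate: "f 1.0.0",
    }];
    // c 1.1.0 is still activated in the only frame, so it is skipped
    // and no candidate is found.
    assert_eq!(find_candidate(&mut stack, "g 1.0.0", &conflicts), None);
}

This is only sound if the conflict set passed in is complete, which is exactly what goes wrong below.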

I tried running the test with trace logging and I got (edited down to the relevant parts):

after yank
resolver: initial activation: root v1.0.0
resolver: root[0]>g trying 1.0.0
resolver: g[0]>b trying 1.0.0
resolver: g[1]>e trying 1.1.0
resolver: e[0]>c trying 1.1.0
resolver: b[0]>a trying 1.1.0
resolver: g[2]>f trying 1.1.0
resolver: f[0]>d trying 1.0.0
resolver: d[0]>c -- no candidates
resolver::conflict_cache: c adding a skip {PackageId { name: "c", version: "1.1.0"}: Semver}
resolver: g[2]>f trying 1.0.0
resolver: f[0]>to_yank -- no candidates
resolver::conflict_cache: to_yank adding a skip {}
-- this is where the log ends with the line uncommented
resolver: b[0]>a trying 1.0.0
resolver: g[2]>f trying 1.1.0
resolver: f[0]>d trying 1.0.0
resolver: f[0]>d skipping 1.0.0
resolver: f[0]>d -- no candidates
resolver::conflict_cache: d adding a skip {PackageId { name: "c", version: "1.1.0"}: Semver}
resolver: g[2]>f trying 1.0.0
resolver: g[2]>f skipping 1.0.0
resolver: g[2]>f -- no candidates
resolver: g[1]>e trying 1.0.0
resolver: g[2]>f trying 1.1.0
resolver: f[0]>d trying 1.0.0
resolver: d[0]>c trying 1.0.0
resolver: b[0]>a trying 1.1.0

So this is how I read that:

  • e trying 1.1.0 leads to c trying 1.1.0
  • f trying 1.1.0 leads to d trying 1.0.0, whose dep_req("c", "=1.0") does not work.
  • backtracking to f trying 1.0.0 leads to dep("to_yank"), which does not work, with conflicting_activations == {}, i.e. there is nothing that can make dep("to_yank") work.
  • So now we start pruning the remaining_candidates stack for something that will make us work or that removes our parent. What will make f 1.0.0 work? Nothing. Our parent is g, so we skip all the way back, skipping right past the thing that would make a nephew (a descendant of a sibling) work (see the sketch after this list).
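
To make that last step concrete, here is a tiny self-contained illustration of that reading. The package names mirror the trace above, but the still_conflicting helper and the sets are made up for illustration; this is not cargo's actual code:

use std::collections::BTreeSet;

// Are the parent and every package implicated in the conflict still
// activated in the frame's snapshot? If so, the frame gets skipped.
fn still_conflicting(
    activated: &BTreeSet<&str>,
    parent: &str,
    conflicts: &BTreeSet<&str>,
) -> bool {
    activated.contains(parent) && conflicts.iter().all(|c| activated.contains(c))
}

fn main() {
    // Snapshot taken when the frame holding e's remaining candidate
    // (e 1.0.0) was pushed: e 1.1.0 and c 1.1.0 were not yet activated.
    let frame: BTreeSet<&str> = ["root 1.0.0", "g 1.0.0", "b 1.0.0"].into_iter().collect();

    // Conflict set from the most recent failure only (f 1.0.0 -> to_yank): empty.
    let incomplete: BTreeSet<&str> = BTreeSet::new();

    // Union of the conflicts from all of f's candidates: f 1.1.0's failure
    // implicated c 1.1.0 (via d's dep_req("c", "=1.0")).
    let complete: BTreeSet<&str> = ["c 1.1.0"].into_iter().collect();

    // With the incomplete set the frame looks useless (parent g is still
    // activated and the empty set is vacuously satisfied), so it is skipped...
    assert!(still_conflicting(&frame, "g 1.0.0", &incomplete));

    // ...but with the complete set it is kept, because retrying e there
    // would back out c 1.1.0 and let d's requirement resolve.
    assert!(!still_conflicting(&frame, "g 1.0.0", &complete));
}

That is the sense in which we need a complete set of conflicts to do backjumping.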

Why don't we keep track of more of the conflicting_activations-like info from our siblings and their descendants? One, it is often faster to prove something is wrong by trying it than by rigorous argument, so we tend to YOLO and only get smarter on subsequent attempts (see the skipping later on in the log for an example of this working). Two, we aggressively do not add things to the remaining_candidates, meaning that when backtracking we don't have a good list of what we are backing out.

Why don't we just disable this optimization for now? Well, several of the tests hang if we leave it commented out.

@Eh2406 (Contributor, Author) commented Sep 12, 2018

I found an odd hack that gets this one to pass. I am going back to the proptest branch to find a new case to shrink. Maybe looking at another example will help me see a solution.

@Eh2406 (Contributor, Author) commented Sep 14, 2018

I shrank 2 more examples. I think dont_yet_know_the_problem_3 may be small enough to be worth digging into. I will enable logging and see what it is doing.

@Eh2406 (Contributor, Author) commented Sep 14, 2018

Expanding the odd hack got all 3 to pass! I think this is worth merging. Though the next thing on my todo list is merging this back into my proptest branch and seeing if it finds more bugs.

@Eh2406 changed the title from "WIP BUG fuzzing found a bug in the resolver, now how to fix it." to "BUG fuzzing found a bug in the resolver, we need a complete set of conflicts to do backjumping" on Sep 14, 2018
@Eh2406 (Contributor, Author) commented Sep 15, 2018

The merged version of the proptest branch ran smoothly, and the performance-related resolver tests are not strongly affected. So I think this is ready for review.

@alexcrichton (Member)

@bors: r+

Looks great to me, thanks so much for tracking this down!

@bors (Collaborator) commented Sep 17, 2018

📌 Commit 9b2295d has been approved by alexcrichton

@bors (Collaborator) commented Sep 17, 2018

⌛ Testing commit 9b2295d with merge f2b861dd23abb6ea7f25e7cd46ac4a508bbeec38...

@bors (Collaborator) commented Sep 17, 2018

💔 Test failed - status-appveyor

@Eh2406 (Contributor, Author) commented Sep 17, 2018

termcolor is not building with minimal-versions; why are we seeing this here? I did not touch the "toml"?

@ehuss (Contributor) commented Sep 17, 2018

I've been trying to figure out why things changed, but I don't see a good reason. The differences are:

cargo:
    home 0.3.2 downgraded to 0.3.0

home:
    scopeguard 0.3.1 -> 0.1.2
    winapi 0.3.4 -> 0.2.8
        +kernel32-sys 0.2.2
        +advapi32-sys 0.2.0

termcolor:
    wincolor: 0.1.5 -> 0.1.1

wincolor 0.1.1:
    winapi 0.3.4 -> 0.2.8
        +kernel32-sys 0.2.2

+advapi32-sys:
    +winapi 0.2.8
    +winapi-build 0.1.1

atty:
    winapi 0.2.7 -> 0.2.8

backtrace:
    winapi 0.2.7 -> 0.2.8

curl:
    winapi 0.2.7 -> 0.2.8

dbghelp-sys:
    winapi 0.2.7 -> 0.2.8

fs2:
    winapi 0.2.7 -> 0.2.8

kernel32-sys:
    winapi 0.2.7 -> 0.2.8

same-file:
    winapi 0.2.7 -> 0.2.8

userenv-sys:
    winapi 0.2.7 -> 0.2.8

The only thing that's changed recently is that termcolor released a 1.0.4 version 4 days ago, but tests were passing yesterday, and a newer version shouldn't affect things. I don't see why the home package suddenly started using an older version (or why it wasn't using that older version in the first place).

@Eh2406 (Contributor, Author) commented Sep 17, 2018

We sort dependencies from most constrained to least. So it is possible that the number of unused versions of a crate can change the resolution.
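
For illustration only (made-up names and candidate counts, not cargo's actual code), sorting by how many candidate versions satisfy each requirement shows how one extra published version can reorder which dependency gets resolved first, and with it the rest of the search:

fn main() {
    // (dependency, number of index versions that satisfy its requirement)
    let mut deps = vec![("x", 3), ("y", 2), ("z", 2)];

    // Most constrained first: fewest matching candidates, ties broken by name.
    deps.sort_by_key(|&(name, candidates)| (candidates, name));
    assert_eq!(deps, vec![("y", 2), ("z", 2), ("x", 3)]);

    // Publishing one more version that satisfies y's requirement bumps its
    // count, so y is no longer tried first and the resolver can walk the
    // search space in a different order, possibly landing on a different
    // resolution.
    let mut deps = vec![("x", 3), ("y", 3), ("z", 2)];
    deps.sort_by_key(|&(name, candidates)| (candidates, name));
    assert_eq!(deps, vec![("z", 2), ("x", 3), ("y", 3)]);
}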

@ehuss (Contributor) commented Sep 17, 2018

I bisected the index, and the publishing of curl 0.4.16 is what made the resolve change. I imagine you're right that it shifted some of the heuristics.

Well, it seems clear that termcolor 0.3.0's dependency on wincolor 0.1.1 doesn't satisfy the minimal dependencies. How do we typically fix that?

@Eh2406 (Contributor, Author) commented Sep 17, 2018

So I don't think we have been doing this for long enough to have a "typically" yet. For the case of trying to get it set up we have some advice in #5657 (comment), but this is the first time we have hit this form.

I'd try:

  1. Can we bump the patch version of one of our directly affected dependencies to fix the problem?
  2. One of the other direct dependencies?
  3. Are any of the affected dependencies interested in a PR to work with minimal-versions? (termcolor is a BurntSushi project so yes.)
  4. Adding a synthetic dependency to get things working?

I am working on a complicated patch to try #5921 (comment), but I can jump in in a little while.

@Eh2406 (Contributor, Author) commented Sep 17, 2018

I could not reproduce on the nightly I was using, but after a rustup update I could repro. So that is odd.

> cargo tree -i -p termcolor:0.3.0
    Updating crates.io index
termcolor v0.3.0
└── env_logger v0.5.4
    └── cargo v0.30.0 (file:///C:/Users/finkelman.SEMCOGDOM/Documents/MyProjects/cargo)

So I am trying bumping our env_logger to 0.5.11.

@alexcrichton (Member)

@bors: r+

@bors (Collaborator) commented Sep 17, 2018

📌 Commit 682b295 has been approved by alexcrichton

@bors (Collaborator) commented Sep 17, 2018

⌛ Testing commit 682b295 with merge 8201560...

bors added a commit that referenced this pull request Sep 17, 2018
BUG fuzzing found a bug in the resolver, we need a complete set of conflicts to do backjumping

As mentioned in #5921 (comment), the new proptest found a live bug! This PR so far tracks my attempt to minimize the problematic input.

The problem turned out to be that we were backjumping on an incomplete set of conflicts.

@bors (Collaborator) commented Sep 17, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 8201560 to master...

@bors merged commit 682b295 into rust-lang:master on Sep 17, 2018
@Eh2406 deleted the explore_the_bug branch on September 20, 2018
@ehuss added this to the 1.31.0 milestone on Feb 6, 2022