Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on Nightly in BTreeSet and BTreeMap #33197

Closed
potocpav opened this issue Apr 25, 2016 · 13 comments
Closed

Segfault on Nightly in BTreeSet and BTreeMap #33197

potocpav opened this issue Apr 25, 2016 · 13 comments
Labels
A-collections Area: std::collections. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@potocpav
Copy link

An iterator created by BTreeMap::range, BTreeMap::range_mut and BTreeSet::range can wander off of the underlying data and segfault. It is caused by supplying an upper bound that is lower than the lower bound and there is an element between them.

Code to reproduce: http://is.gd/HoNWLH

tested also locally on rustc 1.10.0-nightly (b5ba5923f 2016-04-21).

The iterator ends when the iterated element equals to the upper bound: https://doc.rust-lang.org/src/collections/up/src/libcollections/btree/map.rs.html#898-908. If the upper bound is lower than the lower bound, this condition is never met. This is not checked for while creating the iterator in https://doc.rust-lang.org/src/collections/up/src/libcollections/btree/map.rs.html#464-588.

@apasel422 apasel422 added A-collections Area: std::collections. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. labels Apr 25, 2016
@apasel422
Copy link
Contributor

CC @jooert, @gereeter.

@sfackler sfackler added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Apr 25, 2016
@bluss
Copy link
Member

bluss commented Apr 27, 2016

I was just looking at this briefly and I'm reminded the collection can not trust the user-supplied comparison functions (Ord) and thus the fix must be something more involved than comparing the input keys.

@potocpav
Copy link
Author

potocpav commented Apr 27, 2016

So it seems we must first compare start to end (if start>end, stop). Then go forward but check for the end of the BTree.

@gereeter
Copy link
Contributor

gereeter commented Apr 27, 2016

I see a few possible options here:

  • As @potocpav suggested, add an explicit check for the start > end case for well-behaved implementations of Ord, then while iterating, check that we don't run past the edge of the tree. It would probably be simplest to panic in the edge-of-tree case, but I suppose stopping would be possible as well. The primary issue I see with this solution is that it may slow down iteration to do those extra checks.
  • Check for the start > end case, but instead of doing it with a comparison, verify that the node references actually come in the correct order. This can be done by descending the tree for both bounds simultaneously, checking that when the two references diverge, start follows an edge to the left of end. The primary issue with this solution is the complexity and therefore potential bugginess. I think I'm in favor of this option, but I could be convinced to abandon it.

Ironically, the NodeRef API is actually perfectly safe here - there are no supposedly safe methods that allow you to run of the edge of the tree. However, to retain performance, I called unwrap_unchecked on the result of ascend, since that case was supposed to be impossible and one of the biggest reasons for the rewrite was for iteration performance. Turning those calls into unwrap should make the iterator instead panic upon hitting the edge.

@alexcrichton alexcrichton added P-medium Medium priority and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-nominated labels Apr 27, 2016
@Manishearth
Copy link
Member

add an explicit check for the start > end case for well-behaved implementations of Ord

Note that safety should never depend on well-behavedness of Ord. The additional check in that bullet point addresses this, but I'd like to point this out explicitly.

@apasel422
Copy link
Contributor

As a variant of @gereeter's second bullet point, the structural check can be done as a post-processing step in which we walk up the tree from both handles, keeping track of the path we take as we go (i.e. the index of the edges). Once we finish ascending, we compare the paths lexicographically, and return an empty Range if the comparison indicates that the bounds were specified in the wrong order.

This has the benefit of avoiding additional complexity in the descent code and iteration code, and is amenable to removal via specialization if we ever get an unsafe trait TrustedOrd.

@0x7CFE
Copy link

0x7CFE commented Jan 29, 2017

Reproduced on recent 1.16.0-nightly (df8debf6d 2017-01-25) with the similar setup: created a faulty Ord implementation which resulted in a silent segfault.

Just wanted to know, is there any progress regarding the issue? Looks like it's quite annoying bug because of it's cryptic nature. I spent quite a time trying to find why the program exits silently even without a panic message. Only debugger helped me to find the actual reason.

Recently there were some changes in the range API. Are they affecting the issue in some way?

@Manishearth
Copy link
Member

Could you share the Ord implementation testcase that caused this? Last time I checked the old testcases for this didn't work.

@0x7CFE
Copy link

0x7CFE commented Jan 30, 2017

Unfortunately I don't have minimal reproducing impl, but I'm working on it. Last observations were that if I replace in my code .range_mut((Excluded(&lower_bound), Included(&key))) with .range_mut((Unbounded, Included(&key))) all gets working.

I have rather complex keys (BitVectors actually) and I define Ord for two vectors as if they are bit representations of some huge numbers. Lower bound in my case is a vector that is 1 bit smaller than the lowest actual key I'm interested in. It is used to reduce the search space from below.

Currently I'm not sure, whether it's due to Ord or due to implementation of BTreeMap's range API that passes through the lower boundary and starts walking where it shouldn't.

@0x7CFE
Copy link

0x7CFE commented Feb 5, 2017

Indeed, in my case crash occurs when broken boundary logic yields the range where lower bound is actually greater that the upper one (in terms of Ord).

Please note that it's highly dependant on the actual in-memory structrure of the tree. In my case for this to happen btree should be filled to a certain point.

frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 14, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue rust-lang#33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 14, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue rust-lang#33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
bors added a commit that referenced this issue Feb 14, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue #33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 15, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue rust-lang#33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
bors added a commit that referenced this issue Feb 15, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue #33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
bors added a commit that referenced this issue Feb 15, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue #33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
@bvinc
Copy link
Contributor

bvinc commented Feb 19, 2017

This is fixed

@Manishearth
Copy link
Member

#39457

@Manishearth
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: std::collections. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants