Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dont segfault if btree range is not in order #39457

Merged
merged 1 commit into from
Feb 16, 2017
Merged

Conversation

bvinc
Copy link
Contributor

@bvinc bvinc commented Feb 2, 2017

This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case.

Open questions:

  1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5.
4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @BurntSushi (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.


min_edge = match (min_found, min) {
(false, Included(key)) => {
match search::search_linear(&min_node, key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the search_tree calls to search_linear? (FYI, I'm not familiar with the btree impl.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_tree searches the entire tree and returns a handle anywhere in the tree. I need to search a single node at a time and be able to compare 2 searches. You might think that I want search_node, but that returns a handle. I need to be able to compare 2 edges of the same node, which I can't. search_linear just gives me an index value into the node, which is exactly what I need.

@BurntSushi
Copy link
Member

cc @rust-lang/libs

Someone more familiar with the btree impl should review this. cc @gereeter

With respect to the panic, I think that's consistent behavior with other collections. Either way, I think this behavior should be documented.

@alexcrichton
Copy link
Member

also cc @arthurprs and @apasel422

@arthurprs
Copy link
Contributor

arthurprs commented Feb 4, 2017

This is very smart 👍

I wonder if it makes sense to have a fn nodes_diverge(n1, n2) in node.rs that take 2 handles and returns if n1.node == n2.node && n1.idx != n2.idx, I didn't really try it but I guess it can really simplify the range function (with less local vars like the previous version).

@alexcrichton
Copy link
Member

@bors: delegate=arthurprs

@bors
Copy link
Contributor

bors commented Feb 4, 2017

✌️ @arthurprs can now approve this pull request

@bvinc bvinc changed the title Panic if btree range is not in order to avoid segfault Dont segfault if btree range is not in order Feb 7, 2017
@bvinc
Copy link
Contributor Author

bvinc commented Feb 7, 2017

I've made some small cleanup changes, implemented range_mut as well, and changed the implementation so that it should never panic, but only return an empty iterator.

@arthurprs That would only get rid of one variable, right? Personally, it doesn't seem worth it, and it seems to obfuscate the way I only care about comparing them until the point that they diverge, after which I don't care anymore.

@bvinc
Copy link
Contributor Author

bvinc commented Feb 7, 2017

I'd really like to steer the discussion towards the strange cases of range(..) and what the behavior should be. I've read all of the rfcs and proposals and documentation that I can find, but no one seems to address these cases. If you know of where it's documented, let me know.

If it's not currently documented, who makes these decisions? It might seem like an extremely small detail, but changing behavior later would be a breaking change.

The way I see it, there are 8 interesting cases to think about, and 3 basic ways to handle them:

Weird cases

  1. range(Included(A), Included(A))
  2. range(Included(A), Excluded(A))
  3. range(Excluded(A), Included(A))
  4. range(Excluded(A), Excluded(A))

where B > A:
5. range(Included(B), Included(A))
6. range(Included(B), Excluded(A))
7. range(Excluded(B), Included(A))
8. range(Excluded(B), Excluded(A))

Option 1

Return an empty iterator on cases 2-3. Immediately panic on cases 4-8. During iteration of the tree, if the 2 tree searches "cross directions" in the middle of searching, we know for sure that Ord is ill-defined, and the function will panic.

Pros:

  • Will catch errors early

Cons:

  • Might cause panics in programs where an empty iterator would have been fine
  • Must oddly document case add an IL type checker #4 as a case that panics

Option 2

Return an empty iterator on cases 2-4. Immediately panic on cases 5-8. Case 4 would be a strange case where in order to prevent a seg fault, one side of the Range might not end up where you would expect in order to force an empty iterator. Eg: the range.back handle might be on the right side of A, even though A was excluded.

Pros:

  • Can easily document that it panics when min_key > max_key

Cons:

  • Might cause panics in programs where an empty iterator would have been fine
  • Case add an IL type checker #4 will be a weird special case

Option 3

Return an empty iterator on cases 2-8, never panic. Declare that the range returned will always have range.back equal or to the right of range.front. This will naturally cause 2-8 to all be empty iterators.

Pros:

  • Never panics
  • Matches the intuition that you're getting every key where key > min_key && key < max_key. If that's nothing, that's fine.

Cons:

  • Might not catch programmer errors early

Copy link
Contributor

@gereeter gereeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not getting around to looking at this earlier. This looks good to me, modulo unifying the mutable and immutable cases.

Personally, I think the most logical behavior is to panic on cases 4-8 and return an empty iterator on cases 2 and 3 (Option 1). This corresponds to panicking when the two ends cross, but not when they touch. I don't think documenting case 4 as panicking is that odd - intuitively, it is starting after A and ending before A. That just sounds wrong. In contrast, cases 2 and 3 are starting and ending at the same place.


if !diverged {
// Don't allow max_edge to go to the left of min_edge
if max_edge < min_edge { max_edge = min_edge; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do go this route of never panicking, this also could be an optimization - just search for max_edge starting from min_edge. This would never produced "crossed nodes" by construction, and would require fewer comparisons. I don't know whether this is worth it though. Additionally, it can be done later.

}
GoDown(bottom) => bottom,
}
loop {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is identical to the range case and fairly complex, so I think it would be useful to pull this logic out into a function that is generic over mutability (taking two aliasing root pointers and returning two leaf handles).

GoDown(bottom) => bottom,
}
loop {
let min_edge = match (min_found, range.start()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that there is an at-all clean way to do this, but I'm curious whether it would be more performant to check Unbounded vs. Included vs. Excluded before the loop. I expect it doesn't matter, though - it would just be removing some predictable branches, I think.

@@ -777,69 +773,65 @@ impl<K: Ord, V> BTreeMap<K, V> {
reason = "matches collection reform specification, waiting for dust to settle",
issue = "27787")]
pub fn range_mut<T: ?Sized, R>(&mut self, range: R) -> RangeMut<K, V>
where T: Ord, K: Borrow<T>, R: RangeArgument<T>
where T: Ord, K: Borrow<T>, R: RangeArgument<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this indentation change should probably not be here.

@aturon
Copy link
Member

aturon commented Feb 7, 2017

cc @rust-lang/libs about the behavioral options laid out in #39457 (comment).

Analogous APIs for slices, such as subslicing, tend to panic on nonsensical ranges, which I think corresponds most closely to Option 1. I think this is the right convention to follow, because such ranges are usually produced by a bug.

@arthurprs
Copy link
Contributor

I'm not fond of panicking in recoverable errors but Option 1 has precedents from the slice api, so that's probably the way to go.

@alexcrichton
Copy link
Member

I agree with @aturon that erring on the side of more panics here seems good. That should be the conservative change here and I think we've got room to return empty iterators in the future if we really need to?

@bvinc bvinc force-pushed the master branch 3 times, most recently from 01e4816 to e8c63d0 Compare February 8, 2017 07:57
@bvinc
Copy link
Contributor Author

bvinc commented Feb 8, 2017

I've changed the code to implement option #1 and added unit tests for those cases. I took the advice of @gereeter and pulled out the range search logic into a common function used by range and range_mut.

@arthurprs
Copy link
Contributor

@bvinc could you add some docstrings explaining the panic behavior?

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 13, 2017
@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Feb 14, 2017

📌 Commit fb91047 has been approved by alexcrichton

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 14, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue rust-lang#33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 14, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue rust-lang#33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
@bors
Copy link
Contributor

bors commented Feb 14, 2017

⌛ Testing commit fb91047 with merge ea451ec...

bors added a commit that referenced this pull request Feb 14, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue #33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
@bors
Copy link
Contributor

bors commented Feb 14, 2017

💔 Test failed - status-appveyor

@alexcrichton
Copy link
Member

alexcrichton commented Feb 14, 2017 via email

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 15, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue rust-lang#33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
@bors
Copy link
Contributor

bors commented Feb 15, 2017

⌛ Testing commit fb91047 with merge 7edcd67...

@bors
Copy link
Contributor

bors commented Feb 15, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

alexcrichton commented Feb 15, 2017 via email

@bors
Copy link
Contributor

bors commented Feb 15, 2017

⌛ Testing commit fb91047 with merge 4c4925f...

bors added a commit that referenced this pull request Feb 15, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue #33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
@bors
Copy link
Contributor

bors commented Feb 15, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

alexcrichton commented Feb 15, 2017 via email

@bors
Copy link
Contributor

bors commented Feb 15, 2017

⌛ Testing commit fb91047 with merge 0d54a22...

@bors
Copy link
Contributor

bors commented Feb 15, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

alexcrichton commented Feb 15, 2017 via email

@bors
Copy link
Contributor

bors commented Feb 15, 2017

⌛ Testing commit fb91047 with merge 4d6019d...

bors added a commit that referenced this pull request Feb 15, 2017
Dont segfault if btree range is not in order

This is a first attempt to fix issue #33197.  The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key.  I currently panic if that is not the case.

Open questions:

1.  Do we want to panic in this error case or do we want to return an empty iterator?  The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type.  Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2.  (Included(101), Excluded(100)) on a map that contains [1,2,3].  Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3.  (Excluded(5), Excluded(5)).  The keys are equal but BTree edges end up backwards if the map contains 5.
4.  (Included(5), Excluded(5)).  Should naturally produce an empty iterator, right?
@bors
Copy link
Contributor

bors commented Feb 16, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 4d6019d to master...

@bors bors merged commit fb91047 into rust-lang:master Feb 16, 2017
bors added a commit that referenced this pull request Feb 7, 2020
BtreeMap range_search spruced up

#39457 created a lower level entry point for `range_search` to operate on, but it's really not hard to move it up a level of abstraction, making it somewhat shorter and reusing existing unsafe code (`new_edge` is unsafe although it is currently not tagged as such).

Benchmark added. Comparison says there's no real difference:
```
>cargo benchcmp old3.txt new3.txt --threshold 5
 name                                           old3.txt ns/iter  new3.txt ns/iter  diff ns/iter   diff %  speedup
 btree::map::find_seq_100                       19                21                           2   10.53%   x 0.90
 btree::map::range_excluded_unbounded           3,117             2,838                     -279   -8.95%   x 1.10
 btree::map::range_included_unbounded           1,768             1,871                      103    5.83%   x 0.94
 btree::set::intersection_10k_neg_vs_10k_pos    35                37                           2    5.71%   x 0.95
 btree::set::intersection_staggered_100_vs_10k  2,488             2,314                     -174   -6.99%   x 1.08
 btree::set::is_subset_10k_vs_100               3                 2                           -1  -33.33%   x 1.50
```

r? @Mark-Simulacrum
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants