Skip to content

WIP safe and sane B-Tree rewrite #16906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Aug 31, 2014

Total minimal rewrite of BTree to not use copy/clone at all, and to generally avoid unnecessary mutations. I have not yet implemented the "miscellaneous" collection things like iterators, Ord, Eq, Hash, etc as I want to nail the core stuff down first.

There is one substantial drawback to my implementation: it doesn't allow configuration of B. The benefit of this is that the nodes are made of [T,..n]'s instead of Vecs, putting more data contiguously and reducing the amount of indirection. I have done this under @aturon's recommendation that it would be better to write a BTree implementation that would benefit from the future possibility of generic integer arguments being available (I am not saying they will be added).

One consequence of this is that I currently store the Keys/Values in Options, because it's not clear to me if there's a safe way to work with partially intialized arrays, or to tell Rust to "forget" some or all of the array and not try to run destructors on the elements. I would love if someone could show me how to do that.

In addition, right now I do a lot of indexing which could probably be replaced with iterators to avoid bounds checks. I'm also not leveraging copy instrinsics to bulk shift the elements around. This first draft is emphasizing correctness and safety over performance, but in a way that's easily upgradable. I hope to have the time to start tuning the implementation in the coming days,

The actual code is literally 25% just comments describing the fairly complex logic of a BTree. I've tried to chunk things out into "grokable" functions where possible so that everyone understands what's going on.

Finally, as a quick sanity check, I ran treemap's bench suite on btree (make check-collections -j8 PLEASE_BENCH=1):

test treemap::bench::find_rand_100                         ... bench:        74 ns/iter (+/- 5)
test treemap::bench::find_rand_10_000                      ... bench:       164 ns/iter (+/- 3)
test treemap::bench::find_seq_100                          ... bench:        76 ns/iter (+/- 2)
test treemap::bench::find_seq_10_000                       ... bench:       115 ns/iter (+/- 2)
test treemap::bench::insert_rand_100                       ... bench:       126 ns/iter (+/- 2)
test treemap::bench::insert_rand_10_000                    ... bench:      1025 ns/iter (+/- 16)
test treemap::bench::insert_seq_100                        ... bench:       507 ns/iter (+/- 19)
test treemap::bench::insert_seq_10_000                     ... bench:       834 ns/iter (+/- 15)

test btree::bench::find_rand_100                           ... bench:        86 ns/iter (+/- 4)
test btree::bench::find_rand_10_000                        ... bench:       165 ns/iter (+/- 2)
test btree::bench::find_seq_100                            ... bench:        86 ns/iter (+/- 2)
test btree::bench::find_seq_10_000                         ... bench:       117 ns/iter (+/- 1)
test btree::bench::insert_rand_100                         ... bench:       270 ns/iter (+/- 10)
test btree::bench::insert_rand_10_000                      ... bench:       598 ns/iter (+/- 18)
test btree::bench::insert_seq_100                          ... bench:       386 ns/iter (+/- 8)
test btree::bench::insert_seq_10_000                       ... bench:       459 ns/iter (+/- 4)

As you can see, my implementation is generally competitive with treemap, and is substantially faster during large insertion workloads. And as discussed above, I haven't even begun to tune it; my choice of B is largely arbitrary!

If anyone is a low-level optimizing expert, or just knows good BTree design/tuning, I'd love to get any feedback on this work. If anyone has any other ideas or questions, I'm more than happy to discuss this with you!

@Gankra
Copy link
Contributor Author

Gankra commented Aug 31, 2014

If you are wiling to review this, I suggest viewing the whole file here, rather than the diff. This is a true clean-room rewrite.

// This implementation is largely based on the one described in *Open Data Structures*, which
// can be freely downloaded at http://opendatastructures.org/, and whose contents are as of this
// writing (August 2014) freely licensed under the following Creative Commons Attribution
// License: [CC BY 2.5 CA](http://creativecommons.org/licenses/by/2.5/ca/).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this license compatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't take any source code from it, I just used the high level analysis and design. I note the license here simply for completeness, and to indicate to readers that the text is free/libre/open if they want to refer to it for more details.

If it's a problem though, I work with the primary author and can easily talk to them about it.

@huonw
Copy link
Member

huonw commented Sep 1, 2014

There is so much unwinding in this code; at a guess, the landing pads etc. generated by the unwraps and [] indexing would be a significant proportion of the final code (and generally a compile-time and runtime performance drain); I would be very interested in seeing this reduced as much as possible.

I noted several places where this could be remove easily (anything with if is_leaf { ... basically).

Another possibility would be a method on Node like

fn view_parts(&self) -> (&[Option<K>], &[&Node<K,V>], &[Option<V>])

where each slice is cut to exactly the right length. It would be implemented unsafely, to coerce from Option<Box<Node<K,V>>> to &Node<K,V> (with a debug_assert! that the appropriate number of Options are actually Some).

@Gankra
Copy link
Contributor Author

Gankra commented Sep 1, 2014

@huonw I think mostly we just need a keys iterator that yields the keys. I imagine something like this:

Keys{ it: self.keys.iter() }
...
fn next() {
    self.it.next().and_then(|x| x.as_ref())
}

(and something similar for mut_keys)

I could also probably pass around keys/values in their Options more, since I think there's some places where I unwrap them for a function boundary, and then just rewrap them at the caller.

I could also probably replaces some foo[i]'s with foo.unsafe_get's?

@Gankra
Copy link
Contributor Author

Gankra commented Sep 3, 2014

Okay, sorry I went radio silent on this for a bit. I just moved, so everything was a bit hectic. I can now begin working at cleaning this up and optimizing.

@Gankra
Copy link
Contributor Author

Gankra commented Sep 5, 2014

So yeah, I think I'm going to go forward with a RawNode design that would basically be mini RawTables as seen in #16628. The option dance, while being really nice for a correctness and safety check is a serious bloat on the whole thing. Although I haven't reflected too hard on detailed RawNode design, so it might all be rubbish to do.

As such this is going to get pretty aggressively refactored, and so I'm considering just closing this now, pending that work.

That said, the main search/insert/remove logic shouldn't change too much, as RawNodes should just be providing a safe interface to the basic iteration/indexing/shifting/splicing stuff. Further, this is still a decent base to work, as demonstrated by my benchmarks.

@Gankra
Copy link
Contributor Author

Gankra commented Sep 5, 2014

Regardless of how I proceed, I've applied many of the suggested fixes. I've also factored out search into a method, which significantly reduced code duplication, unwraps, and indexing while produced much more rustic code! Nice!

This also provides a hint of what RawNode might look like. I started toying with Found and Bound being structs in their own right with relevant interfaces (Bound only supports going to the next node and getting the index; Found gives full access). However I stopped once I settled on a larger refactor.

Node {
length: 0,
// FIXME(Gankro): this is gross, I guess you need a macro? [None, ..capacity] uses copy
keys: [None, None, None, None, None, None, None, None, None, None, None],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces

@huonw
Copy link
Member

huonw commented Sep 5, 2014

I opened #16998 which would remove all the unwraps (at least, we could use that infrastructure, somewhat); could you add a fixme?

@Gankra
Copy link
Contributor Author

Gankra commented Sep 7, 2014

Updated perf results (on a different machine than the first batch):

test treemap::bench::find_rand_100                         ... bench:        75 ns/iter (+/- 7)
test treemap::bench::find_rand_10_000                      ... bench:       194 ns/iter (+/- 8)
test treemap::bench::find_seq_100                          ... bench:        71 ns/iter (+/- 7)
test treemap::bench::find_seq_10_000                       ... bench:        98 ns/iter (+/- 2)

test treemap::bench::insert_rand_100                       ... bench:       174 ns/iter (+/- 16)
test treemap::bench::insert_rand_10_000                    ... bench:      1171 ns/iter (+/- 35)
test treemap::bench::insert_seq_100                        ... bench:       551 ns/iter (+/- 8)
test treemap::bench::insert_seq_10_000                     ... bench:       904 ns/iter (+/- 12)
test btree::bench::find_rand_100                           ... bench:        95 ns/iter (+/- 9)
test btree::bench::find_rand_10_000                        ... bench:       220 ns/iter (+/- 5)
test btree::bench::find_seq_100                            ... bench:        92 ns/iter (+/- 4)
test btree::bench::find_seq_10_000                         ... bench:       155 ns/iter (+/- 15)

test btree::bench::insert_rand_100                         ... bench:       376 ns/iter (+/- 41)
test btree::bench::insert_rand_10_000                      ... bench:       671 ns/iter (+/- 15)
test btree::bench::insert_seq_100                          ... bench:       440 ns/iter (+/- 5)
test btree::bench::insert_seq_10_000                       ... bench:       543 ns/iter (+/- 12)

Interesting that search seems to have regressed, maybe some #[inline]'s are in order.

@Gankra
Copy link
Contributor Author

Gankra commented Sep 7, 2014

@huonw Is the current state sufficient to include as a "okay start" (assuming I impl all the iterator/cmp/hash stuff too), or would you like to see some more refactors and unsafe_get's?

@Gankra
Copy link
Contributor Author

Gankra commented Sep 8, 2014

@huonw nvm, I'm going forward with RawNode because I had a flash of design insight that should make the code super nice.

@huonw
Copy link
Member

huonw commented Sep 8, 2014

I'm very happy to go with the as-safe-as-possible version with Option<T> etc, fwiw.

@Gankra
Copy link
Contributor Author

Gankra commented Sep 9, 2014

The current design is frustrating to me for a few reasons:

  • The Options are hacking around the fact that there we can't have partially initialized arrays
  • Leaves "have" children, when they shouldn't (and in fact, a leaf never "becomes" internal, so having them there is truly worthless).
  • We have to choose to either duplicate code for internal and leaf nodes, or be inefficient by uselessly shifting around None's in leaves.
  • Leaves and Internal nodes are not represented in the type system at all, you just have to know when to do what.

My proposed redesign has the following primary structure:

/// A Raw node handles the common code between Internal and Leaf nodes. Generally, it does all the
/// work assuming the Node is Internal, and for Leaf nodes the edge manipulation should just get
/// optimized away because of how empty types are treated.
struct RawNode<K, V, E> {
    length: uint,
    elems: *mut K,
    marker:   marker::CovariantType<(K, V, E)>,
}

pub struct InternalNode<K, V> {
    raw: RawNode<K, V, Node<K,V>>,
}

pub struct LeafNode<K, V> {
    raw: RawNode<K, V, ()>,
}

pub enum Node<K, V> {
    Internal(InternalNode<K, V>),
    Leaf(LeafNode<K, V>),
}

Where RawNode's "elems" is in fact a buffer allocated to fit all the maximum amount of keys, nodes, and edges a B-Tree node can hold by-value, mirroring the design of RawTable from the current HashMap design.

With this design, we get rid of all the Option dancing, and leaves don't allocate space for edges. Further, by implementing all the operations on RawNode, we get impls for Leaves and Internal nodes at once, while optimizing away the edge manipulation, and letting the distinction appear in the type system.

However, this is all of course very unsafe. So we expose a "safe" interface in the same way that RawTable does: by exposing a cursor-based interface for walking through the node's elements.

@Gankra
Copy link
Contributor Author

Gankra commented Sep 9, 2014

Hrm... I'm running into issues with being able to express that all of a RawNode's children should be the same type of Node, which makes handle_underflow super awkward.

@Gankra
Copy link
Contributor Author

Gankra commented Sep 9, 2014

@huonw I've experimentally refactored the array-based design to use Vec's and inline nodes rather than arrays and boxed nodes. The performance seems comparable and the implementation is much safer and saner. I gotta go atm, but I'll push up that design as a candidate replacement later today hopefully. In the mean time interested parties can check it out in isolation at: https://github.com/Gankro/rust-datastructures/tree/btree-vec/src/btree

Also: as an added bonus, Vec's permit us to have configurable B again! Whoo!

@Gankra
Copy link
Contributor Author

Gankra commented Sep 13, 2014

Closing while I finish this up out-of-tree.

@Gankra Gankra closed this Sep 13, 2014
@Gankra
Copy link
Contributor Author

Gankra commented Sep 15, 2014

Update: impl is done, just need to write (read: steal from TreeMap) more tests. Need to patch Vec's MoveItems first, though.

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants