-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slimmify BTree by replacing the three Vecs in Node with a manually alloc... #18028
Conversation
cc @gankro |
|
||
// At any given time, there will be `_len` keys, `_len` values, and (in an internal node) | ||
// `_len + 1` edges. In a leaf node, there will never be any edges. | ||
_len: uint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make sure that I used the function len()
as much as possible, as I knew that the field was likely to be modified over time. It basically made me make the code easier to change.
FWIW, there's another possible tricks along these lines: we could store |
1 | ||
} else { | ||
mem::min_align_of::<Node<K, V>>() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor style (maybe perf?) nit, these two branches could be collapsed into let (edges_size, edges_align) = ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume that if I collapse the edges, I should also collapse the keys and vals, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's as big of a deal, since they're set unconditionally. You certainly could for symmetry, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Interestingly, Github moved this comment to a different function from the one it was originally in.
@huonw I'm a bit worried about putting the length and capacity behind the allocation because it seems like it would be more inefficient to access/mutate them behind an extra layer of indirection. I can try it, but this seems like the same issue that made |
A bigger win, percentage-wise, would be just to use u8's or u16's for the len/capacity. I really don't think nodes bigger than 256 elems are that reasonable to expect. 16000 or whatever is definitely silly. Of course this will require a fair amount of (totally safe) upcasting, and may require some care to avoid overflows. |
@gereeter this differs to |
// Pop it | ||
let key = self.keys.pop().unwrap(); | ||
let val = self.vals.pop().unwrap(); | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code needs a guard for self.len() == 0
, at very least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
High level, this seems pretty good. Some of the implementation details are pretty dicey, though. This would benefit from some more documentation. @huonw What's the general policy on when to use |
@gereeter What's the status on this? |
6ef8c32
to
ab92ca1
Compare
@@ -506,130 +562,78 @@ mod stack { | |||
} | |||
} | |||
|
|||
pub fn with<T, F: for<'id> FnOnce(Pusher<'id, 'a, K, V>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs... etc
Basically at this point my only complaint is all the new rad tech needs to really be doc'd. Impl seems pretty solid, although BTree's gotten to the point that it's hard to keep it all in my head. |
Docs! Also, sorry about the rebase - I keep forgetting how annoying force pushing can be. |
Awesome! Beyond minor doc nits this LGTM. I definitely want a second opinion though. There's some crazy stuff in here. r=me with doc fixes and if @huonw @cgaebel or maybe even @nikomatsakis (dat InvariantLifetime usage) also signs off on it. |
Doc fixes are in. |
let map = self.map; | ||
map.length += 1; | ||
|
||
let mut stack = self.stack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move stack
(and map
) out of self? A move like this implies generating code for a memcpy
and memset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it because I was following the original code as closely as possible. However, it does seem to be unnecessary, so I can simplify it if you wish.
This looks good to me. Definitely an improvement over what was there before. |
Okay, fixed the doc fix typo and did @cgaebel's recommended simplification of insert. |
Okay I think this is good to go! r=me with squashing, unless @huonw wants to take another crack at it. Edit: huon does. Moar review! |
// Despite this, we store three separate pointers to the three "chunks" of the buffer because | ||
// the performance drops significantly if the locations of the vals and edges need to be | ||
// recalculated upon access. | ||
keys: *mut K, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two pointers will always be valid except when they have be zeroed to cancel the destructor, right?
(I would like see a commend to this effect, basically justifying the unsafe_no_drop_flag
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, keys
and vals
will only be null if the structure has only been dropped.
Oh, since this is still on-going, one thing I'd really like to see is a more robust fuzzing of BTreeMap in the tests now that it's built on all these raw components. I know for a fact there's at least one case in the old logic that test_basic didn't cover (and I forgot to update the tests when I fixed a bug related to it during development). |
} | ||
} | ||
|
||
// Vector functions (all unchecked) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does 'vector' mean in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that these functions are acting on Nodes
as if they were meaningless bunches of parallel vectors. These functions are direct translations of ones implemented on Vec
, like insert
, push
, and pop
. The comment is fairly meaningless, so I might just remove it.
Looks pretty reasonable to me, but the code is quite complicated and there's no way I can hold it all in my head especially in my current travel-addled state. |
@gereeter Yes, I'd like more uniform collection testing in the future. This would require either a pile of macros or collection traits, though. Anyway I don't want to block this on fuzzing. This has sat in the queue for too long! |
Should I squash, or is this waiting for some more review? |
YES! Squash iiiiiit 🎊 |
bbd737b
to
23e77ac
Compare
…held three separately allocated `Vec`s, with a manually allocated buffer. Additionally, restructure the node and stack interfaces to be safer and require fewer bounds checks. Before: test btree::map::bench::find_rand_100 ... bench: 35 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 88 ns/iter (+/- 3) test btree::map::bench::find_seq_100 ... bench: 36 ns/iter (+/- 1) test btree::map::bench::find_seq_10_000 ... bench: 62 ns/iter (+/- 0) test btree::map::bench::insert_rand_100 ... bench: 157 ns/iter (+/- 8) test btree::map::bench::insert_rand_10_000 ... bench: 413 ns/iter (+/- 8) test btree::map::bench::insert_seq_100 ... bench: 272 ns/iter (+/- 10) test btree::map::bench::insert_seq_10_000 ... bench: 369 ns/iter (+/- 19) test btree::map::bench::iter_1000 ... bench: 19049 ns/iter (+/- 740) test btree::map::bench::iter_100000 ... bench: 1916737 ns/iter (+/- 102250) test btree::map::bench::iter_20 ... bench: 424 ns/iter (+/- 40) After: test btree::map::bench::find_rand_100 ... bench: 9 ns/iter (+/- 1) test btree::map::bench::find_rand_10_000 ... bench: 8 ns/iter (+/- 0) test btree::map::bench::find_seq_100 ... bench: 7 ns/iter (+/- 0) test btree::map::bench::find_seq_10_000 ... bench: 8 ns/iter (+/- 0) test btree::map::bench::insert_rand_100 ... bench: 136 ns/iter (+/- 5) test btree::map::bench::insert_rand_10_000 ... bench: 380 ns/iter (+/- 34) test btree::map::bench::insert_seq_100 ... bench: 255 ns/iter (+/- 8) test btree::map::bench::insert_seq_10_000 ... bench: 364 ns/iter (+/- 10) test btree::map::bench::iter_1000 ... bench: 19112 ns/iter (+/- 837) test btree::map::bench::iter_100000 ... bench: 1911961 ns/iter (+/- 33069) test btree::map::bench::iter_20 ... bench: 453 ns/iter (+/- 37)
23e77ac
to
130fb08
Compare
Squashed! |
...ated buffer. Before: test btree::map::bench::find_rand_100 ... bench: 29 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 83 ns/iter (+/- 6) test btree::map::bench::find_seq_100 ... bench: 30 ns/iter (+/- 1) test btree::map::bench::find_seq_10_000 ... bench: 50 ns/iter (+/- 3) test btree::map::bench::insert_rand_100 ... bench: 186 ns/iter (+/- 30) test btree::map::bench::insert_rand_10_000 ... bench: 377 ns/iter (+/- 8) test btree::map::bench::insert_seq_100 ... bench: 299 ns/iter (+/- 10) test btree::map::bench::insert_seq_10_000 ... bench: 368 ns/iter (+/- 12) test btree::map::bench::iter_1000 ... bench: 20956 ns/iter (+/- 479) test btree::map::bench::iter_100000 ... bench: 2060899 ns/iter (+/- 44325) test btree::map::bench::iter_20 ... bench: 560 ns/iter (+/- 63) After: test btree::map::bench::find_rand_100 ... bench: 28 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 74 ns/iter (+/- 3) test btree::map::bench::find_seq_100 ... bench: 31 ns/iter (+/- 0) test btree::map::bench::find_seq_10_000 ... bench: 46 ns/iter (+/- 0) test btree::map::bench::insert_rand_100 ... bench: 141 ns/iter (+/- 1) test btree::map::bench::insert_rand_10_000 ... bench: 273 ns/iter (+/- 12) test btree::map::bench::insert_seq_100 ... bench: 255 ns/iter (+/- 17) test btree::map::bench::insert_seq_10_000 ... bench: 340 ns/iter (+/- 3) test btree::map::bench::iter_1000 ... bench: 21193 ns/iter (+/- 1958) test btree::map::bench::iter_100000 ... bench: 2203599 ns/iter (+/- 100491) test btree::map::bench::iter_20 ... bench: 614 ns/iter (+/- 110) This code could probably be a fair bit cleaner, but it works. Part of #18009.
fix: lifetime hint panic in non generic defs
...ated buffer.
Before:
After:
This code could probably be a fair bit cleaner, but it works.
Part of #18009.