Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slimmify BTree by replacing the three Vecs in Node with a manually alloc... #18028

Merged
merged 1 commit into from
Dec 12, 2014

Conversation

gereeter
Copy link
Contributor

...ated buffer.

Before:

test btree::map::bench::find_rand_100                      ... bench:        29 ns/iter (+/- 2)
test btree::map::bench::find_rand_10_000                   ... bench:        83 ns/iter (+/- 6)
test btree::map::bench::find_seq_100                       ... bench:        30 ns/iter (+/- 1)
test btree::map::bench::find_seq_10_000                    ... bench:        50 ns/iter (+/- 3)
test btree::map::bench::insert_rand_100                    ... bench:       186 ns/iter (+/- 30)
test btree::map::bench::insert_rand_10_000                 ... bench:       377 ns/iter (+/- 8)
test btree::map::bench::insert_seq_100                     ... bench:       299 ns/iter (+/- 10)
test btree::map::bench::insert_seq_10_000                  ... bench:       368 ns/iter (+/- 12)
test btree::map::bench::iter_1000                          ... bench:     20956 ns/iter (+/- 479)
test btree::map::bench::iter_100000                        ... bench:   2060899 ns/iter (+/- 44325)
test btree::map::bench::iter_20                            ... bench:       560 ns/iter (+/- 63)

After:

test btree::map::bench::find_rand_100                      ... bench:        28 ns/iter (+/- 2)
test btree::map::bench::find_rand_10_000                   ... bench:        74 ns/iter (+/- 3)
test btree::map::bench::find_seq_100                       ... bench:        31 ns/iter (+/- 0)
test btree::map::bench::find_seq_10_000                    ... bench:        46 ns/iter (+/- 0)
test btree::map::bench::insert_rand_100                    ... bench:       141 ns/iter (+/- 1)
test btree::map::bench::insert_rand_10_000                 ... bench:       273 ns/iter (+/- 12)
test btree::map::bench::insert_seq_100                     ... bench:       255 ns/iter (+/- 17)
test btree::map::bench::insert_seq_10_000                  ... bench:       340 ns/iter (+/- 3)
test btree::map::bench::iter_1000                          ... bench:     21193 ns/iter (+/- 1958)
test btree::map::bench::iter_100000                        ... bench:   2203599 ns/iter (+/- 100491)
test btree::map::bench::iter_20                            ... bench:       614 ns/iter (+/- 110)

This code could probably be a fair bit cleaner, but it works.

Part of #18009.

@rust-highfive
Copy link
Collaborator

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

@gereeter
Copy link
Contributor Author

cc @gankro


// At any given time, there will be `_len` keys, `_len` values, and (in an internal node)
// `_len + 1` edges. In a leaf node, there will never be any edges.
_len: uint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why _?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make sure that I used the function len() as much as possible, as I knew that the field was likely to be modified over time. It basically made me make the code easier to change.

@huonw
Copy link
Member

huonw commented Oct 14, 2014

FWIW, there's another possible tricks along these lines: we could store length and capacity inside the allocation, cutting off two inline words, that, allocate {uint, uint, keys..., values..., subnodes...} and so store 3 words instead of 5 (possibly reordered to be placed to make the alignments work more nicely).

1
} else {
mem::min_align_of::<Node<K, V>>()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style (maybe perf?) nit, these two branches could be collapsed into let (edges_size, edges_align) = ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume that if I collapse the edges, I should also collapse the keys and vals, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's as big of a deal, since they're set unconditionally. You certainly could for symmetry, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Interestingly, Github moved this comment to a different function from the one it was originally in.

@gereeter
Copy link
Contributor Author

@huonw I'm a bit worried about putting the length and capacity behind the allocation because it seems like it would be more inefficient to access/mutate them behind an extra layer of indirection. I can try it, but this seems like the same issue that made Vec faster than ~[].

@Gankra
Copy link
Contributor

Gankra commented Oct 14, 2014

A bigger win, percentage-wise, would be just to use u8's or u16's for the len/capacity. I really don't think nodes bigger than 256 elems are that reasonable to expect. 16000 or whatever is definitely silly. Of course this will require a fair amount of (totally safe) upcasting, and may require some care to avoid overflows.

@huonw
Copy link
Member

huonw commented Oct 14, 2014

@gereeter this differs to ~[] vs. Vec somewhat because these allocations are never reallocated. (This can definitely be a later experiment, it does not have to be part of this PR.)

// Pop it
let key = self.keys.pop().unwrap();
let val = self.vals.pop().unwrap();
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code needs a guard for self.len() == 0, at very least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Gankra
Copy link
Contributor

Gankra commented Oct 14, 2014

High level, this seems pretty good. Some of the implementation details are pretty dicey, though. This would benefit from some more documentation.

@huonw What's the general policy on when to use assert vs debug_assert?

@Gankra
Copy link
Contributor

Gankra commented Oct 25, 2014

@gereeter What's the status on this?

@@ -506,130 +562,78 @@ mod stack {
}
}

pub fn with<T, F: for<'id> FnOnce(Pusher<'id, 'a, K, V>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs... etc

@Gankra
Copy link
Contributor

Gankra commented Dec 10, 2014

Basically at this point my only complaint is all the new rad tech needs to really be doc'd. Impl seems pretty solid, although BTree's gotten to the point that it's hard to keep it all in my head.

@gereeter
Copy link
Contributor Author

Docs!

Also, sorry about the rebase - I keep forgetting how annoying force pushing can be.

@Gankra
Copy link
Contributor

Gankra commented Dec 10, 2014

Awesome! Beyond minor doc nits this LGTM. I definitely want a second opinion though. There's some crazy stuff in here. r=me with doc fixes and if @huonw @cgaebel or maybe even @nikomatsakis (dat InvariantLifetime usage) also signs off on it.

@gereeter
Copy link
Contributor Author

Doc fixes are in.

let map = self.map;
map.length += 1;

let mut stack = self.stack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move stack (and map) out of self? A move like this implies generating code for a memcpy and memset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it because I was following the original code as closely as possible. However, it does seem to be unnecessary, so I can simplify it if you wish.

@cgaebel
Copy link
Contributor

cgaebel commented Dec 10, 2014

This looks good to me. Definitely an improvement over what was there before.

@gereeter
Copy link
Contributor Author

Okay, fixed the doc fix typo and did @cgaebel's recommended simplification of insert.

@Gankra
Copy link
Contributor

Gankra commented Dec 11, 2014

Okay I think this is good to go! r=me with squashing, unless @huonw wants to take another crack at it.

Edit: huon does. Moar review!

// Despite this, we store three separate pointers to the three "chunks" of the buffer because
// the performance drops significantly if the locations of the vals and edges need to be
// recalculated upon access.
keys: *mut K,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two pointers will always be valid except when they have be zeroed to cancel the destructor, right?

(I would like see a commend to this effect, basically justifying the unsafe_no_drop_flag.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, keys and vals will only be null if the structure has only been dropped.

@Gankra
Copy link
Contributor

Gankra commented Dec 11, 2014

Oh, since this is still on-going, one thing I'd really like to see is a more robust fuzzing of BTreeMap in the tests now that it's built on all these raw components. I know for a fact there's at least one case in the old logic that test_basic didn't cover (and I forgot to update the tests when I fixed a bug related to it during development).

}
}

// Vector functions (all unchecked)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 'vector' mean in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that these functions are acting on Nodes as if they were meaningless bunches of parallel vectors. These functions are direct translations of ones implemented on Vec, like insert, push, and pop. The comment is fairly meaningless, so I might just remove it.

@huonw
Copy link
Member

huonw commented Dec 11, 2014

Looks pretty reasonable to me, but the code is quite complicated and there's no way I can hold it all in my head especially in my current travel-addled state.

@gereeter
Copy link
Contributor Author

@huonw I think I've fixed all the issues you brought up.
@gankro If you want fuzz testing, I think the ideal would be just doing a truly random sequence of operations on all the Map types at once, checking that they all return identical results.

@Gankra
Copy link
Contributor

Gankra commented Dec 11, 2014

@gereeter Yes, I'd like more uniform collection testing in the future. This would require either a pile of macros or collection traits, though. Anyway I don't want to block this on fuzzing. This has sat in the queue for too long!

@gereeter
Copy link
Contributor Author

Should I squash, or is this waiting for some more review?

@Gankra
Copy link
Contributor

Gankra commented Dec 12, 2014

YES!

Squash iiiiiit

🎊

@cgaebel
Copy link
Contributor

cgaebel commented Dec 12, 2014

…held three separately allocated `Vec`s, with a manually allocated buffer. Additionally, restructure the node and stack interfaces to be safer and require fewer bounds checks.

Before:
test btree::map::bench::find_rand_100                      ... bench:        35 ns/iter (+/- 2)
test btree::map::bench::find_rand_10_000                   ... bench:        88 ns/iter (+/- 3)
test btree::map::bench::find_seq_100                       ... bench:        36 ns/iter (+/- 1)
test btree::map::bench::find_seq_10_000                    ... bench:        62 ns/iter (+/- 0)
test btree::map::bench::insert_rand_100                    ... bench:       157 ns/iter (+/- 8)
test btree::map::bench::insert_rand_10_000                 ... bench:       413 ns/iter (+/- 8)
test btree::map::bench::insert_seq_100                     ... bench:       272 ns/iter (+/- 10)
test btree::map::bench::insert_seq_10_000                  ... bench:       369 ns/iter (+/- 19)
test btree::map::bench::iter_1000                          ... bench:     19049 ns/iter (+/- 740)
test btree::map::bench::iter_100000                        ... bench:   1916737 ns/iter (+/- 102250)
test btree::map::bench::iter_20                            ... bench:       424 ns/iter (+/- 40)

After:
test btree::map::bench::find_rand_100                      ... bench:         9 ns/iter (+/- 1)
test btree::map::bench::find_rand_10_000                   ... bench:         8 ns/iter (+/- 0)
test btree::map::bench::find_seq_100                       ... bench:         7 ns/iter (+/- 0)
test btree::map::bench::find_seq_10_000                    ... bench:         8 ns/iter (+/- 0)
test btree::map::bench::insert_rand_100                    ... bench:       136 ns/iter (+/- 5)
test btree::map::bench::insert_rand_10_000                 ... bench:       380 ns/iter (+/- 34)
test btree::map::bench::insert_seq_100                     ... bench:       255 ns/iter (+/- 8)
test btree::map::bench::insert_seq_10_000                  ... bench:       364 ns/iter (+/- 10)
test btree::map::bench::iter_1000                          ... bench:     19112 ns/iter (+/- 837)
test btree::map::bench::iter_100000                        ... bench:   1911961 ns/iter (+/- 33069)
test btree::map::bench::iter_20                            ... bench:       453 ns/iter (+/- 37)
@gereeter
Copy link
Contributor Author

Squashed!

bors added a commit that referenced this pull request Dec 12, 2014
...ated buffer.

Before:

    test btree::map::bench::find_rand_100                      ... bench:        29 ns/iter (+/- 2)
    test btree::map::bench::find_rand_10_000                   ... bench:        83 ns/iter (+/- 6)
    test btree::map::bench::find_seq_100                       ... bench:        30 ns/iter (+/- 1)
    test btree::map::bench::find_seq_10_000                    ... bench:        50 ns/iter (+/- 3)
    test btree::map::bench::insert_rand_100                    ... bench:       186 ns/iter (+/- 30)
    test btree::map::bench::insert_rand_10_000                 ... bench:       377 ns/iter (+/- 8)
    test btree::map::bench::insert_seq_100                     ... bench:       299 ns/iter (+/- 10)
    test btree::map::bench::insert_seq_10_000                  ... bench:       368 ns/iter (+/- 12)
    test btree::map::bench::iter_1000                          ... bench:     20956 ns/iter (+/- 479)
    test btree::map::bench::iter_100000                        ... bench:   2060899 ns/iter (+/- 44325)
    test btree::map::bench::iter_20                            ... bench:       560 ns/iter (+/- 63)

After:

    test btree::map::bench::find_rand_100                      ... bench:        28 ns/iter (+/- 2)
    test btree::map::bench::find_rand_10_000                   ... bench:        74 ns/iter (+/- 3)
    test btree::map::bench::find_seq_100                       ... bench:        31 ns/iter (+/- 0)
    test btree::map::bench::find_seq_10_000                    ... bench:        46 ns/iter (+/- 0)
    test btree::map::bench::insert_rand_100                    ... bench:       141 ns/iter (+/- 1)
    test btree::map::bench::insert_rand_10_000                 ... bench:       273 ns/iter (+/- 12)
    test btree::map::bench::insert_seq_100                     ... bench:       255 ns/iter (+/- 17)
    test btree::map::bench::insert_seq_10_000                  ... bench:       340 ns/iter (+/- 3)
    test btree::map::bench::iter_1000                          ... bench:     21193 ns/iter (+/- 1958)
    test btree::map::bench::iter_100000                        ... bench:   2203599 ns/iter (+/- 100491)
    test btree::map::bench::iter_20                            ... bench:       614 ns/iter (+/- 110)

This code could probably be a fair bit cleaner, but it works.

Part of #18009.
@bors bors merged commit 130fb08 into rust-lang:master Dec 12, 2014
@gereeter gereeter deleted the slimmer-btree-node branch December 17, 2015 01:30
lnicola pushed a commit to lnicola/rust that referenced this pull request Sep 25, 2024
fix: lifetime hint panic in non generic defs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants