Implement Clone::clone_from for VecDeque #65069

crgl · 2019-10-03T19:31:57Z

See #28481. For simple data types with the target much longer than the source, this implementation can be significantly slower than the default (probably due to the use of truncate). However, it should be substantially faster when cloning from nested data structures with similar shapes or when cloning from VecDeques with similar lengths, hopefully more common use cases for clone_from.

rust-highfive · 2019-10-03T19:32:07Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

Centril · 2019-10-03T20:59:44Z

r? @bluss

src/liballoc/collections/vec_deque.rs

crgl · 2019-10-05T21:10:46Z

Obviously this needs a better test now that it's taking more into account (I did of course make sure myself that it didn't panic in any of the cases, but I need to write a cleaner and more concise test for it to actually be in the library). I think this has some advantages: it's entirely safe, and it consistently outperforms the naive implementation. I do think truncate should be modified unless there's a compelling reason not to (at least, it seems like it should be able to take advantage of the mem::needs_drop::<T>() check the way Vec does), but I'm not sure that needs to be in this PR. Somewhat similarly, I think extend could be replaced by a more specialized implementation here, but my attempt at improvement went poorly. The only other thing I changed was adding overrides to the nth method because it seems like low-hanging performance fruit that affects this method. In the meantime, I'm happy to change any aspect of this, clean up commits, etc. Just let me know!

bluss · 2019-10-06T09:42:33Z

src/liballoc/collections/vec_deque.rs

+    fn nth(&mut self, n: usize) -> Option<Self::Item> {
+        if n >= count(self.tail, self.head, self.ring.len()) {
+            self.tail = self.head;
+            self.next()


I'd prefer to just put None here to simplify. What's the reason to go through next, does it have maintainability benefits (I don't really see it).

If we go through next - we can factor it out of the ~~loop~~ if, though.

Yes, I think there's no reason not to use None. Is it okay if I make the change and squash it into the original nth commit, or is that bad form?

Sure, that's fine. It's great to not squash in bigger updates, but now you've even mentioned what it is, so it's only more efficent to squash it in.

bluss · 2019-10-06T09:48:03Z

It looks good to me. Since it's a recent topic, how do we evaluate this in terms of code size (Rust and compiled code) and whether the improvements are worth the price (if there is a price)? @scottmcm and @alexcrichton

alexcrichton · 2019-10-07T23:22:50Z

The price of clone_from implementations can generally be quantified with build time for std and how it's a bit more confusing to read/verify. (and also a hidden cost for possible bugs to come up in this)

We don't have a strict policy per se on whether we take implementations or not. It's largely up to the reviewer.

bluss · 2019-10-08T17:20:13Z

src/liballoc/collections/vec_deque.rs

+                s_front.clone_from_slice(&o_front[..s_front_len]);
+                s_back[..overlap].clone_from_slice(&o_front[s_front_len..]);
+                s_back[overlap..].clone_from_slice(&o_back[..cap]);
+            },


I was wondering if we have to stem the bloat of branches here, to have just 1 branch, not 6.

Sorry for writing so much code, I ended up prototyping that like this - https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6ff3024edcb82d8f3c605963f0ba9dbf

Using an iterator to pair parts of the two slice tuples together.

It's not necessarily a better approach. I'd like to test compile both and look at the compile time/code size. The code in the PR would have a huge code size if the clone_from_slice calls were inlined, so that's the main thing I'd be afraid of. (Also the potential for the bounds checks to add up?)

I really like that iterator. There is a private helper struct for iteration in btree as well, so it's not particularly bizarre (I just checked because I hadn't considered it and it's very satisfying). I'm not sure about the compile time/code size differences, but I do think the whole approach is better if it's comparable in that respect.

Bounds checks from the clone_from_slice? Because those should be the same either way, I think. Otherwise I'm not sure.

The only other idea I have is to go with something closer to the original. I know it doesn't really take advantage of a lot of optimizations, but if both the matching approach and the iterator end up being considered more complicated than they're worth I think it would still be good to have access to clone_from implementations for the contained type.

Separately, the test in your main is much more concise than my test; it takes longer but I'm not sure how much that matters. Should I change my test to a similar nested loop (maybe just to 6 or 8 rather than 16)?

Yes “exhaustive” style tests are good, I haven't really checked that this test covers all cases, but to I think it's a good idea to test using that method.

W.r.t bounds checks, they are functionally equivalent but have different context in the code, there are fewer locations where they happen, where we are splitting in PairSlices, and once to check equal lengths in clone_from_slice. (With luck they could even compile out, based on the min calculation?).

Yes, and I wrote the test to be "exhaustive" but it's a lot more explicit, longer, and vulnerable to human error.

I see! Yes, that definitely seems easier for the compiler to work with.

Edit: Original is way slower, I made a mistake benchmarking. Taking advantage of contiguous memory gives you a huge boost, and I'm happy to change over the implementation if you give the go-ahead.

Yes, we can switch!

I've compared the two new impls for generated code, and the code in my playground link (+ reserve call which I added) seems to compile to simpler smaller code.

As a comparison, A has 13 calls to memcpy inside release compile of VecDeque::::clone_from and B has 3 calls to memcpy. A has 14 panic sites and B has 3 etc.

I haven't benchmarked or looked at compile time.

I changed everything over, tests work and benchmarks are at least as good.

Clone still outperforms clone_from when dst is much larger than src and T is copy, though; is there an argument against using the same optimization as vec's truncate for types that don't need to be dropped? If not, I would like to open a pull request for adding it after this one lands (since it might need testing but is a fairly small change). Something like this:

pub fn truncate(&mut self, len: usize) { let current_len = self.len(); if mem::needs_drop::<T>() { for _ in len..current_len { self.pop_back(); } } else if len < current_len { self.head = self.wrap_sub(self.head, current_len - len); } }

This change makes clone_from at least as fast as clone in all cases, and it just seems reasonable to take advantage of contiguous memory this way (unless there's some cost I'm not seeing)

Using either needs_drop or drop_in_place(*mut [T]) would be good in truncate, preferably the latter if possible. That's what Vec::truncate wants to change to using too, if possible, see #64432

I think drop_in_place would make sense and wouldn't be too hard. It seems like #64432 is going to get merged, and while VecDeque can't use exactly the same argument (clear uses drain for a VecDeque instead of truncate(0)), I think it makes sense to fix truncate and start using it in clear as well

bluss · 2019-10-09T18:29:42Z

src/liballoc/collections/vec_deque.rs

+        }
+
+        if iter.has_remainder() {
+            for remainder in iter.remainder() {


I forgot to put a reserve here before the extends. That said, with only 2 extend calls, it's not super necessary — if extend would reserve at all, but it doesn't(!). So extend is missing that first order optimization.

Does it make sense to change extend to use reserve (separately), or is it just standard practice to use reserve before extend? The only other thing is that conditionally reserving at the beginning of clone_from would prevent any work from being done in the OOM case, which could be a feature.

In general, though, it seems like reserving before extending makes sense since every iterator provides a lower bound through size_hint, some common cases are exact, and calling extend more than a few times should be rare.

Extend should in general issue the reserve itself, here it is known it will grow by at least the lower bound, so it should use that. (It can use other optimizations like TrustedLen later, like Vec does.)

src/liballoc/collections/vec_deque.rs

bluss · 2019-10-10T20:13:39Z

In a compile time stress test, PairSlices is a bit faster in compile time and produces smaller code the 6-branch clone_from_slice. So that looks good to me.

The stress test amounts to compiling 70 different monomorphizations of VecDeque::clone_from

bluss · 2019-10-10T20:15:08Z

Great! I think that's done. Can you rebase/squash together commits in the PR (if you want)? With that, this change is r=me

bluss · 2019-10-12T14:57:03Z

@bors r+ rollup

Thanks!

bors · 2019-10-12T14:57:04Z

📌 Commit d21eeb1 has been approved by bluss

Implement Clone::clone_from for VecDeque See rust-lang#28481. For simple data types with the target much longer than the source, this implementation can be significantly slower than the default (probably due to the use of truncate). However, it should be substantially faster when cloning from nested data structures with similar shapes or when cloning from VecDeques with similar lengths, hopefully more common use cases for clone_from.

@ghost

Rollup of 13 pull requests Successful merges: - #65039 (Document missing deny by default lints) - #65069 (Implement Clone::clone_from for VecDeque) - #65165 (Improve docs on some char boolean methods) - #65248 (Suggest `if let` on `let` refutable binding) - #65250 (resolve: fix error title regarding private constructors) - #65295 (Move diagnostics code out of the critical path) - #65320 (Report `CONST_ERR` lint in external macros) - #65327 (replace the hand-written binary search with the library one) - #65339 (do not reference LLVM for our concurrency memory model) - #65357 (syntax: simplify maybe_annotate_with_ascription) - #65358 (simplify maybe_stage_features) - #65359 (simplify integer_lit) - #65360 (mbe: reduce panictry! uses.) Failed merges: r? @ghost

Use ptr::drop_in_place for VecDeque::truncate and VecDeque::clear This commit allows `VecDeque::truncate` to take advantage of its (largely) contiguous memory layout and is consistent with the change in #64432 for `Vec`. As with the change to `Vec::truncate`, this changes both: - the drop order, from back-to-front to front-to-back - the behavior when dropping an element panics For consistency, it also changes the behavior when dropping an element panics for `VecDeque::clear`. These changes in behavior can be observed. This example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d0b1f2edc123437a2f704cbe8d93d828)) ```rust use std::collections::VecDeque; fn main() { struct Bomb(usize); impl Drop for Bomb { fn drop(&mut self) { panic!(format!("{}", self.0)); } } let mut v = VecDeque::from(vec![Bomb(0), Bomb(1)]); std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| { v.truncate(0); })); std::mem::forget(v); } ``` panics printing `1` today and succeeds. `v.clear()` panics printing `0` today and succeeds. With the change, `v.clear()`, `v.truncate(0)`, and dropping the `VecDeque` all panic printing `0` first and then abort with a double-panic printing `1`. The motivation for this was making `VecDeque::truncate` more efficient since it was used in the implementation of `VecDeque::clone_from` (#65069), but it also makes behavior more consistent within the `VecDeque` and with `Vec` if that change is accepted (this probably doesn't make sense to merge if not). This might need a crater run and an FCP as well.

Match `VecDeque::extend` to `Vec::extend_desugared` Currently, `VecDeque::extend` [does not reserve at all](rust-lang#65069 (comment)). This implementation still runs a check every iteration of the loop, but should reallocate at most once for the common cases where the `size_hint` lower bound is exact. Further optimizations in the future could improve this for some common cases, but given the complexity of the `Vec::extend` implementation it's not immediately clear that this would be worthwhile.

rust-highfive assigned alexcrichton Oct 3, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2019

rust-highfive assigned bluss and unassigned alexcrichton Oct 3, 2019

scottmcm reviewed Oct 4, 2019

View reviewed changes

src/liballoc/collections/vec_deque.rs Show resolved Hide resolved

bluss reviewed Oct 6, 2019

View reviewed changes

bluss reviewed Oct 8, 2019

View reviewed changes

bluss reviewed Oct 9, 2019

View reviewed changes

src/liballoc/collections/vec_deque.rs Show resolved Hide resolved

crgl added 3 commits October 10, 2019 22:49

Implement Clone::clone_from for VecDeque

91cf02c

Add tests for VecDeque clone_from

10671f1

Override nth for VecDeque Iter and IterMut

d21eeb1

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 12, 2019

Centril mentioned this pull request Oct 13, 2019

Rollup of 13 pull requests #65368

Merged

bors merged commit d21eeb1 into rust-lang:master Oct 13, 2019

crgl deleted the clone-from-vec-deque branch October 13, 2019 20:29

crgl mentioned this pull request Oct 29, 2019

Use ptr::drop_in_place for VecDeque::truncate and VecDeque::clear #65933

Merged

crgl mentioned this pull request Nov 12, 2019

Match VecDeque::extend to Vec::extend_desugared #66341

Merged

Implement Clone::clone_from for VecDeque #65069

Implement Clone::clone_from for VecDeque #65069

Uh oh!

Conversation

crgl commented Oct 3, 2019

Uh oh!

rust-highfive commented Oct 3, 2019

Uh oh!

Centril commented Oct 3, 2019

Uh oh!

Uh oh!

crgl commented Oct 5, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bluss Oct 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

crgl Oct 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bluss commented Oct 6, 2019

Uh oh!

alexcrichton commented Oct 7, 2019

Uh oh!

bluss Oct 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

crgl Oct 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bluss Oct 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bluss commented Oct 10, 2019

Uh oh!

bluss commented Oct 10, 2019

Uh oh!

bluss commented Oct 12, 2019

Uh oh!

bors commented Oct 12, 2019

Uh oh!

Uh oh!

bluss Oct 6, 2019 •

edited

Loading

crgl Oct 6, 2019 •

edited

Loading

bluss Oct 8, 2019 •

edited

Loading

crgl Oct 8, 2019 •

edited

Loading

bluss Oct 9, 2019 •

edited

Loading