Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Clone::clone_from for VecDeque #65069

Merged
merged 3 commits into from
Oct 13, 2019
Merged

Implement Clone::clone_from for VecDeque #65069

merged 3 commits into from
Oct 13, 2019

Conversation

crgl
Copy link
Contributor

@crgl crgl commented Oct 3, 2019

See #28481. For simple data types with the target much longer than the source, this implementation can be significantly slower than the default (probably due to the use of truncate). However, it should be substantially faster when cloning from nested data structures with similar shapes or when cloning from VecDeques with similar lengths, hopefully more common use cases for clone_from.

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2019
@Centril
Copy link
Contributor

Centril commented Oct 3, 2019

r? @bluss

@rust-highfive rust-highfive assigned bluss and unassigned alexcrichton Oct 3, 2019
@crgl
Copy link
Contributor Author

crgl commented Oct 5, 2019

Obviously this needs a better test now that it's taking more into account (I did of course make sure myself that it didn't panic in any of the cases, but I need to write a cleaner and more concise test for it to actually be in the library). I think this has some advantages: it's entirely safe, and it consistently outperforms the naive implementation. I do think truncate should be modified unless there's a compelling reason not to (at least, it seems like it should be able to take advantage of the mem::needs_drop::<T>() check the way Vec does), but I'm not sure that needs to be in this PR. Somewhat similarly, I think extend could be replaced by a more specialized implementation here, but my attempt at improvement went poorly. The only other thing I changed was adding overrides to the nth method because it seems like low-hanging performance fruit that affects this method. In the meantime, I'm happy to change any aspect of this, clean up commits, etc. Just let me know!

fn nth(&mut self, n: usize) -> Option<Self::Item> {
if n >= count(self.tail, self.head, self.ring.len()) {
self.tail = self.head;
self.next()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to just put None here to simplify. What's the reason to go through next, does it have maintainability benefits (I don't really see it).

Copy link
Member

@bluss bluss Oct 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go through next - we can factor it out of the loop if, though.

Copy link
Contributor Author

@crgl crgl Oct 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think there's no reason not to use None. Is it okay if I make the change and squash it into the original nth commit, or is that bad form?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine. It's great to not squash in bigger updates, but now you've even mentioned what it is, so it's only more efficent to squash it in.

@bluss
Copy link
Member

bluss commented Oct 6, 2019

It looks good to me. Since it's a recent topic, how do we evaluate this in terms of code size (Rust and compiled code) and whether the improvements are worth the price (if there is a price)? @scottmcm and @alexcrichton

@alexcrichton
Copy link
Member

The price of clone_from implementations can generally be quantified with build time for std and how it's a bit more confusing to read/verify. (and also a hidden cost for possible bugs to come up in this)

We don't have a strict policy per se on whether we take implementations or not. It's largely up to the reviewer.

s_front.clone_from_slice(&o_front[..s_front_len]);
s_back[..overlap].clone_from_slice(&o_front[s_front_len..]);
s_back[overlap..].clone_from_slice(&o_back[..cap]);
},
Copy link
Member

@bluss bluss Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we have to stem the bloat of branches here, to have just 1 branch, not 6.

Sorry for writing so much code, I ended up prototyping that like this - https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6ff3024edcb82d8f3c605963f0ba9dbf

Using an iterator to pair parts of the two slice tuples together.

It's not necessarily a better approach. I'd like to test compile both and look at the compile time/code size. The code in the PR would have a huge code size if the clone_from_slice calls were inlined, so that's the main thing I'd be afraid of. (Also the potential for the bounds checks to add up?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that iterator. There is a private helper struct for iteration in btree as well, so it's not particularly bizarre (I just checked because I hadn't considered it and it's very satisfying). I'm not sure about the compile time/code size differences, but I do think the whole approach is better if it's comparable in that respect.

Bounds checks from the clone_from_slice? Because those should be the same either way, I think. Otherwise I'm not sure.

The only other idea I have is to go with something closer to the original. I know it doesn't really take advantage of a lot of optimizations, but if both the matching approach and the iterator end up being considered more complicated than they're worth I think it would still be good to have access to clone_from implementations for the contained type.

Separately, the test in your main is much more concise than my test; it takes longer but I'm not sure how much that matters. Should I change my test to a similar nested loop (maybe just to 6 or 8 rather than 16)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes “exhaustive” style tests are good, I haven't really checked that this test covers all cases, but to I think it's a good idea to test using that method.

W.r.t bounds checks, they are functionally equivalent but have different context in the code, there are fewer locations where they happen, where we are splitting in PairSlices, and once to check equal lengths in clone_from_slice. (With luck they could even compile out, based on the min calculation?).

Copy link
Contributor Author

@crgl crgl Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I wrote the test to be "exhaustive" but it's a lot more explicit, longer, and vulnerable to human error.

I see! Yes, that definitely seems easier for the compiler to work with.

Edit: Original is way slower, I made a mistake benchmarking. Taking advantage of contiguous memory gives you a huge boost, and I'm happy to change over the implementation if you give the go-ahead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can switch!

I've compared the two new impls for generated code, and the code in my playground link (+ reserve call which I added) seems to compile to simpler smaller code.

As a comparison, A has 13 calls to memcpy inside release compile of VecDeque::::clone_from and B has 3 calls to memcpy. A has 14 panic sites and B has 3 etc.

I haven't benchmarked or looked at compile time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed everything over, tests work and benchmarks are at least as good.

Clone still outperforms clone_from when dst is much larger than src and T is copy, though; is there an argument against using the same optimization as vec's truncate for types that don't need to be dropped? If not, I would like to open a pull request for adding it after this one lands (since it might need testing but is a fairly small change). Something like this:

pub fn truncate(&mut self, len: usize) {
    let current_len = self.len();
    if mem::needs_drop::<T>() {
        for _ in len..current_len {
            self.pop_back();
        }
    } else if len < current_len {
        self.head = self.wrap_sub(self.head, current_len - len);
    }
}

This change makes clone_from at least as fast as clone in all cases, and it just seems reasonable to take advantage of contiguous memory this way (unless there's some cost I'm not seeing)

Copy link
Member

@bluss bluss Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using either needs_drop or drop_in_place(*mut [T]) would be good in truncate, preferably the latter if possible. That's what Vec::truncate wants to change to using too, if possible, see #64432

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think drop_in_place would make sense and wouldn't be too hard. It seems like #64432 is going to get merged, and while VecDeque can't use exactly the same argument (clear uses drain for a VecDeque instead of truncate(0)), I think it makes sense to fix truncate and start using it in clear as well

}

if iter.has_remainder() {
for remainder in iter.remainder() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to put a reserve here before the extends. That said, with only 2 extend calls, it's not super necessary — if extend would reserve at all, but it doesn't(!). So extend is missing that first order optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to change extend to use reserve (separately), or is it just standard practice to use reserve before extend? The only other thing is that conditionally reserving at the beginning of clone_from would prevent any work from being done in the OOM case, which could be a feature.

In general, though, it seems like reserving before extending makes sense since every iterator provides a lower bound through size_hint, some common cases are exact, and calling extend more than a few times should be rare.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extend should in general issue the reserve itself, here it is known it will grow by at least the lower bound, so it should use that. (It can use other optimizations like TrustedLen later, like Vec does.)

@bluss
Copy link
Member

bluss commented Oct 10, 2019

In a compile time stress test, PairSlices is a bit faster in compile time and produces smaller code the 6-branch clone_from_slice. So that looks good to me.

The stress test amounts to compiling 70 different monomorphizations of VecDeque::clone_from

@bluss
Copy link
Member

bluss commented Oct 10, 2019

Great! I think that's done. Can you rebase/squash together commits in the PR (if you want)? With that, this change is r=me

@bluss
Copy link
Member

bluss commented Oct 12, 2019

@bors r+ rollup

Thanks!

@bors
Copy link
Contributor

bors commented Oct 12, 2019

📌 Commit d21eeb1 has been approved by bluss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 12, 2019
Centril added a commit to Centril/rust that referenced this pull request Oct 13, 2019
Implement Clone::clone_from for VecDeque

See rust-lang#28481. For simple data types with the target much longer than the source, this implementation can be significantly slower than the default (probably due to the use of truncate). However, it should be substantially faster when cloning from nested data structures with similar shapes or when cloning from VecDeques with similar lengths, hopefully more common use cases for clone_from.
bors added a commit that referenced this pull request Oct 13, 2019
Rollup of 13 pull requests

Successful merges:

 - #65039 (Document missing deny by default lints)
 - #65069 (Implement Clone::clone_from for VecDeque)
 - #65165 (Improve docs on some char boolean methods)
 - #65248 (Suggest `if let` on `let` refutable binding)
 - #65250 (resolve: fix error title regarding private constructors)
 - #65295 (Move diagnostics code out of the critical path)
 - #65320 (Report `CONST_ERR` lint in external macros)
 - #65327 (replace the hand-written binary search with the library one)
 - #65339 (do not reference LLVM for our concurrency memory model)
 - #65357 (syntax: simplify maybe_annotate_with_ascription)
 - #65358 (simplify maybe_stage_features)
 - #65359 (simplify integer_lit)
 - #65360 (mbe: reduce panictry! uses.)

Failed merges:

r? @ghost
@bors bors merged commit d21eeb1 into rust-lang:master Oct 13, 2019
@crgl crgl deleted the clone-from-vec-deque branch October 13, 2019 20:29
bors added a commit that referenced this pull request Nov 11, 2019
Use ptr::drop_in_place for VecDeque::truncate and VecDeque::clear

This commit allows `VecDeque::truncate` to take advantage of its (largely) contiguous memory layout and is consistent with the change in #64432 for `Vec`. As with the change to `Vec::truncate`, this changes both:

- the drop order, from back-to-front to front-to-back
- the behavior when dropping an element panics

For consistency, it also changes the behavior when dropping an element panics for `VecDeque::clear`.

These changes in behavior can be observed. This example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d0b1f2edc123437a2f704cbe8d93d828))
```rust
use std::collections::VecDeque;

fn main() {
    struct Bomb(usize);
    impl Drop for Bomb {
        fn drop(&mut self) {
            panic!(format!("{}", self.0));
        }
    }
    let mut v = VecDeque::from(vec![Bomb(0), Bomb(1)]);
    std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        v.truncate(0);
    }));
    std::mem::forget(v);
}
```
panics printing `1` today and succeeds. `v.clear()` panics printing `0` today and succeeds. With the change, `v.clear()`, `v.truncate(0)`, and dropping the `VecDeque` all panic printing `0` first and then abort with a double-panic printing `1`.

The motivation for this was making `VecDeque::truncate` more efficient since it was used in the implementation of `VecDeque::clone_from` (#65069), but it also makes behavior more consistent within the `VecDeque` and with `Vec` if that change is accepted (this probably doesn't make sense to merge if not).

This might need a crater run and an FCP as well.
Centril added a commit to Centril/rust that referenced this pull request Dec 13, 2019
Match `VecDeque::extend` to `Vec::extend_desugared`

Currently, `VecDeque::extend` [does not reserve at all](rust-lang#65069 (comment)). This implementation still runs a check every iteration of the loop, but should reallocate at most once for the common cases where the `size_hint` lower bound is exact. Further optimizations in the future could improve this for some common cases, but given the complexity of the `Vec::extend` implementation it's not immediately clear that this would be worthwhile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants