Implement Clone::clone_from for VecDeque #65069
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)
r? @bluss
Obviously this needs a better test now that it's taking more into account (I did of course make sure myself that it didn't panic in any of the cases, but I need to write a cleaner and more concise test for it to actually be in the library). I think this has some advantages: it's entirely safe, and it consistently outperforms the naive implementation. I do think truncate should be modified unless there's a compelling reason not to; at least, it seems like it should be able to take advantage of the (largely) contiguous memory layout.
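For context, `clone_from` exists so that cloning into an existing value can reuse that value's allocations. A small standalone usage example (this only demonstrates the trait method on a nested structure, not this PR's implementation):

```rust
use std::collections::VecDeque;

fn main() {
    let src: VecDeque<Vec<u32>> = (0..4).map(|i| vec![i; 8]).collect();
    let mut dst: VecDeque<Vec<u32>> = (10..14).map(|i| vec![i; 8]).collect();

    // clone_from overwrites dst with the contents of src; unlike
    // `dst = src.clone()`, it is allowed to reuse dst's existing
    // allocations (the outer ring buffer and the inner Vecs here).
    dst.clone_from(&src);
    assert_eq!(dst, src);
}
```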
fn nth(&mut self, n: usize) -> Option<Self::Item> {
    if n >= count(self.tail, self.head, self.ring.len()) {
        self.tail = self.head;
        self.next()
I'd prefer to just put `None` here to simplify. What's the reason to go through `next`; does it have maintainability benefits? (I don't really see it.)
If we go through `next` we can factor it out of the loop `if`, though.
Yes, I think there's no reason not to use `None`. Is it okay if I make the change and squash it into the original `nth` commit, or is that bad form?
Sure, that's fine. It's good not to squash in bigger updates, but since you've already described what the change is, it's more efficient to squash it in.
It looks good to me. Since it's a recent topic, how do we evaluate this in terms of code size (Rust and compiled code) and whether the improvements are worth the price (if there is a price)? @scottmcm and @alexcrichton
The price of … We don't have a strict policy per se on whether we take implementations or not. It's largely up to the reviewer.
    s_front.clone_from_slice(&o_front[..s_front_len]);
    s_back[..overlap].clone_from_slice(&o_front[s_front_len..]);
    s_back[overlap..].clone_from_slice(&o_back[..cap]);
},
I was wondering if we have to stem the bloat of branches here, to have just 1 branch, not 6.
Sorry for writing so much code, I ended up prototyping that like this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6ff3024edcb82d8f3c605963f0ba9dbf
Using an iterator to pair up parts of the two slice tuples together.
It's not necessarily a better approach. I'd like to test-compile both and look at the compile time/code size. The code in the PR would have a huge code size if the `clone_from_slice` calls were inlined, so that's the main thing I'd be afraid of. (Also the potential for the bounds checks to add up?)
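Aside: a rough sketch of the "pair up the slice halves" idea, for readers who don't want to open the playground link. The field names and splitting logic below are illustrative assumptions, not the helper that eventually landed; in particular, the real helper also exposes the leftover part of the source so `clone_from` can `extend` with it.

```rust
/// Yields same-length chunk pairs from the (front, back) slices of a
/// destination deque and a source deque, so each pair can be handled with a
/// single clone_from_slice call instead of matching on every layout case.
struct PairSlices<'a, 'b, T> {
    dst_front: &'a mut [T],
    dst_back: &'a mut [T],
    src_front: &'b [T],
    src_back: &'b [T],
}

impl<'a, 'b, T> Iterator for PairSlices<'a, 'b, T> {
    type Item = (&'a mut [T], &'b [T]);

    fn next(&mut self) -> Option<Self::Item> {
        // When a front half is exhausted, continue with the back half.
        if self.dst_front.is_empty() {
            self.dst_front = std::mem::take(&mut self.dst_back);
        }
        if self.src_front.is_empty() {
            self.src_front = self.src_back;
            self.src_back = &[];
        }
        // Pair up as many elements as both current halves can provide.
        let n = self.dst_front.len().min(self.src_front.len());
        if n == 0 {
            return None;
        }
        let dst_taken = std::mem::take(&mut self.dst_front);
        let (dst, dst_rest) = dst_taken.split_at_mut(n);
        self.dst_front = dst_rest;
        let (src, src_rest) = self.src_front.split_at(n);
        self.src_front = src_rest;
        Some((dst, src))
    }
}
```

With something like this, clone_from collapses to one loop of `dst.clone_from_slice(src)` over the pairs, a `truncate` when the destination is longer, and an `extend` with the rest of the source when it is shorter.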
I really like that iterator. There is a private helper struct for iteration in btree as well, so it's not particularly bizarre (I just checked because I hadn't considered it, and it's very satisfying). I'm not sure about the compile time/code size differences, but I do think the whole approach is better if it's comparable in that respect.
Bounds checks from the `clone_from_slice`? Those should be the same either way, I think. Otherwise I'm not sure.
The only other idea I have is to go with something closer to the original. I know it doesn't really take advantage of a lot of optimizations, but if both the matching approach and the iterator end up being considered more complicated than they're worth, I think it would still be good to have access to `clone_from` implementations for the contained type.
Separately, the test in your `main` is much more concise than my test; it takes longer, but I'm not sure how much that matters. Should I change my test to a similar nested loop (maybe just to 6 or 8 rather than 16)?
Yes, “exhaustive”-style tests are good. I haven't really checked that this test covers all cases, but I think it's a good idea to test using that method.
W.r.t. bounds checks, they are functionally equivalent but have different context in the code: there are fewer locations where they happen, when we are splitting in PairSlices, and once to check equal lengths in `clone_from_slice`. (With luck they could even compile out, based on the `min` calculation?)
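A sketch of what such a nested-loop test could look like (the bounds, element type, and the rotation trick here are made up for illustration; the actual test in the PR is different):

```rust
use std::collections::VecDeque;

#[test]
fn clone_from_many_shapes() {
    // Cover every combination of source length, destination length, and a
    // few rotations of the destination, so the internal layout of dst varies
    // between contiguous and wrapped.
    for src_len in 0..8usize {
        for dst_len in 0..8usize {
            for rotation in 0..4usize {
                let src: VecDeque<usize> = (0..src_len).collect();
                let mut dst: VecDeque<usize> = (0..dst_len).collect();
                // Shift the ring buffer's start position.
                for _ in 0..rotation {
                    if let Some(x) = dst.pop_front() {
                        dst.push_back(x);
                    }
                }
                dst.clone_from(&src);
                assert_eq!(dst, src);
            }
        }
    }
}
```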
Yes, and I wrote the test to be "exhaustive" but it's a lot more explicit, longer, and vulnerable to human error.
I see! Yes, that definitely seems easier for the compiler to work with.
Edit: the original is way slower; I made a mistake benchmarking. Taking advantage of contiguous memory gives you a huge boost, and I'm happy to change over the implementation if you give the go-ahead.
Yes, we can switch!
I've compared the two new impls for generated code, and the code in my playground link (plus a reserve call which I added) seems to compile to simpler, smaller code.
As a comparison, A has 13 calls to memcpy inside a release compile of `VecDeque::clone_from` and B has 3 calls to memcpy; A has 14 panic sites and B has 3, etc.
I haven't benchmarked or looked at compile time.
I changed everything over; tests pass and benchmarks are at least as good.
Clone still outperforms clone_from when dst is much larger than src and `T` is `Copy`, though; is there an argument against using the same optimization as `Vec`'s truncate for types that don't need to be dropped? If not, I would like to open a pull request adding it after this one lands (since it might need testing, but it's a fairly small change). Something like this:
pub fn truncate(&mut self, len: usize) {
    let current_len = self.len();
    if mem::needs_drop::<T>() {
        for _ in len..current_len {
            self.pop_back();
        }
    } else if len < current_len {
        self.head = self.wrap_sub(self.head, current_len - len);
    }
}
This change makes clone_from at least as fast as clone in all cases, and it just seems reasonable to take advantage of contiguous memory this way (unless there's some cost I'm not seeing).
Using either `needs_drop` or `drop_in_place(*mut [T])` would be good in truncate, preferably the latter if possible. That's what `Vec::truncate` wants to change to using too, if possible; see #64432.
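For anyone unfamiliar with it, `ptr::drop_in_place(*mut [T])` drops an entire contiguous range in one call. A minimal standalone illustration of the technique on a `Vec` tail, in the spirit of #64432 (this is not the eventual `VecDeque::truncate` code, which has to handle a possibly wrapped range, i.e. up to two such slices):

```rust
use std::ptr;

fn main() {
    let mut v = vec![String::from("a"), String::from("b"), String::from("c")];
    let new_len = 1;
    let old_len = v.len();
    unsafe {
        // Shrink the length first so a panicking destructor can never lead
        // to a double drop, then drop the cut-off tail as one *mut [String].
        v.set_len(new_len);
        let tail: *mut [String] =
            ptr::slice_from_raw_parts_mut(v.as_mut_ptr().add(new_len), old_len - new_len);
        ptr::drop_in_place(tail);
    }
    assert_eq!(v, ["a"]);
}
```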
I think `drop_in_place` would make sense and wouldn't be too hard. It seems like #64432 is going to get merged, and while VecDeque can't use exactly the same argument (`clear` uses `drain` for a VecDeque instead of `truncate(0)`), I think it makes sense to fix `truncate` and start using it in `clear` as well.
}

if iter.has_remainder() {
    for remainder in iter.remainder() {
I forgot to put a reserve here before the extends. That said, with only 2 extend calls it's not super necessary, assuming `extend` would reserve at all; but it doesn't(!). So extend is missing that first-order optimization.
Does it make sense to change extend to use reserve (separately), or is it just standard practice to use reserve before extend? The only other thing is that conditionally reserving at the beginning of clone_from would prevent any work from being done in the OOM case, which could be a feature.
In general, though, it seems like reserving before extending makes sense, since every iterator provides a lower bound through `size_hint`, some common cases are exact, and calling `extend` more than a few times should be rare.
Extend should in general issue the reserve itself; here it is known it will grow by at least the lower bound, so it should use that. (It can use other optimizations like `TrustedLen` later, like Vec does.)
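A sketch of that pattern from outside the library (the free function and its names are illustrative; the real change goes inside VecDeque's `Extend` impl and may differ in detail):

```rust
use std::collections::VecDeque;

fn extend_with_reserve<T, I>(deque: &mut VecDeque<T>, iter: I)
where
    I: IntoIterator<Item = T>,
{
    let iter = iter.into_iter();
    // Every iterator gives a lower bound via size_hint; for slices, ranges,
    // Vec, etc. it is exact, so this usually reallocates at most once
    // instead of growing repeatedly while pushing.
    let (lower, _) = iter.size_hint();
    deque.reserve(lower);
    for item in iter {
        deque.push_back(item);
    }
}

fn main() {
    let mut d: VecDeque<u32> = VecDeque::new();
    extend_with_reserve(&mut d, 0..10);
    assert_eq!(d.len(), 10);
    assert!(d.capacity() >= 10);
}
```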
In a compile-time stress test, PairSlices is a bit faster to compile and produces smaller code than the 6-branch version. The stress test amounts to compiling 70 different monomorphizations of `clone_from`.
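For readers curious what such a stress test might look like, one way to force many monomorphizations is a macro that stamps out `clone_from` wrappers for many element types. This is only a guess at the shape of the harness, not the one actually used:

```rust
use std::collections::VecDeque;

macro_rules! instantiate_clone_from {
    ($($name:ident => $ty:ty),* $(,)?) => {
        $(
            // inline(never) keeps each instantiation as a separate symbol,
            // so per-type compile time and code size can be measured.
            #[inline(never)]
            pub fn $name(dst: &mut VecDeque<$ty>, src: &VecDeque<$ty>) {
                dst.clone_from(src);
            }
        )*
    };
}

// Each distinct element type forces its own monomorphization of clone_from.
instantiate_clone_from! {
    m_u8 => u8,
    m_u64 => u64,
    m_string => String,
    m_vec => Vec<u8>,
    m_pair => (u32, String),
    // ...extend the list to as many types as desired...
}
```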
Great! I think that's done. Can you rebase/squash together commits in the PR (if you want)? With that, this change is r=me
@bors r+ rollup Thanks!
📌 Commit d21eeb1 has been approved by bluss
Implement Clone::clone_from for VecDeque
See rust-lang#28481. For simple data types with the target much longer than the source, this implementation can be significantly slower than the default (probably due to the use of truncate). However, it should be substantially faster when cloning from nested data structures with similar shapes or when cloning from VecDeques with similar lengths, hopefully more common use cases for clone_from.
Rollup of 13 pull requests
Successful merges:
- #65039 (Document missing deny by default lints)
- #65069 (Implement Clone::clone_from for VecDeque)
- #65165 (Improve docs on some char boolean methods)
- #65248 (Suggest `if let` on `let` refutable binding)
- #65250 (resolve: fix error title regarding private constructors)
- #65295 (Move diagnostics code out of the critical path)
- #65320 (Report `CONST_ERR` lint in external macros)
- #65327 (replace the hand-written binary search with the library one)
- #65339 (do not reference LLVM for our concurrency memory model)
- #65357 (syntax: simplify maybe_annotate_with_ascription)
- #65358 (simplify maybe_stage_features)
- #65359 (simplify integer_lit)
- #65360 (mbe: reduce panictry! uses.)

Failed merges:

r? @ghost
Use ptr::drop_in_place for VecDeque::truncate and VecDeque::clear
This commit allows `VecDeque::truncate` to take advantage of its (largely) contiguous memory layout and is consistent with the change in #64432 for `Vec`. As with the change to `Vec::truncate`, this changes both:
- the drop order, from back-to-front to front-to-back
- the behavior when dropping an element panics

For consistency, it also changes the behavior when dropping an element panics for `VecDeque::clear`. These changes in behavior can be observed. This example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d0b1f2edc123437a2f704cbe8d93d828))
```rust
use std::collections::VecDeque;

fn main() {
    struct Bomb(usize);

    impl Drop for Bomb {
        fn drop(&mut self) {
            panic!(format!("{}", self.0));
        }
    }

    let mut v = VecDeque::from(vec![Bomb(0), Bomb(1)]);
    std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        v.truncate(0);
    }));
    std::mem::forget(v);
}
```
panics printing `1` today and succeeds. `v.clear()` panics printing `0` today and succeeds. With the change, `v.clear()`, `v.truncate(0)`, and dropping the `VecDeque` all panic printing `0` first and then abort with a double-panic printing `1`.

The motivation for this was making `VecDeque::truncate` more efficient since it was used in the implementation of `VecDeque::clone_from` (#65069), but it also makes behavior more consistent within the `VecDeque` and with `Vec` if that change is accepted (this probably doesn't make sense to merge if not). This might need a crater run and an FCP as well.
Match `VecDeque::extend` to `Vec::extend_desugared`
Currently, `VecDeque::extend` [does not reserve at all](rust-lang#65069 (comment)). This implementation still runs a check every iteration of the loop, but should reallocate at most once for the common cases where the `size_hint` lower bound is exact. Further optimizations in the future could improve this for some common cases, but given the complexity of the `Vec::extend` implementation it's not immediately clear that this would be worthwhile.