diff --git a/content/issue-1/vecdeque-resize-optimization.md b/content/issue-1/vecdeque-resize-optimization.md new file mode 100644 index 0000000..d38b2d9 --- /dev/null +++ b/content/issue-1/vecdeque-resize-optimization.md @@ -0,0 +1,268 @@ +> This is the first article on **pr-demystifying** topics. Each article labeled **pr-demystifying** will attempt to demystify the details behind the PR. + +While browsing the rust-lang repository's list of pull requests last month, I came across PR [#104435], titled `VecDeque::resize should re-use the buffer in the passed-in element`. This PR caught my attention because it seemed interesting and I wanted to understand more about it. I began to wonder why it was necessary to optimize `VecDeque::resize()` and how the old version might be lacking. I also wanted to know how the author had optimized the new version. After delving into the code in the PR, I was able to gain a deeper understanding of these issues. + +## VecDeque::resize() + +Firstly, let's get familiar with the [VecDeque::resize()](https://doc.rust-lang.org/std/collections/struct.VecDeque.html#method.resize). + +> `pub fn resize(&mut self, new_len: usize, value: T)` +> +> Modifies the deque in-place so that `len()` is equal to `new_len`, either by removing excess elements from the back or by appending clones of `value` to the back. + +```rust +use std::collections::VecDeque; + +let mut buf = VecDeque::new(); +buf.push_back(5); +buf.push_back(10); +buf.push_back(15); +assert_eq!(buf, [5, 10, 15]); + +buf.resize(2, 0); +assert_eq!(buf, [5, 10]); + +buf.resize(5, 20); +assert_eq!(buf, [5, 10, 20, 20, 20]); +``` + +The `VecDeque::resize()` API is simple to use. It takes two arguments: the new length to which the `VecDeque` should be resized, and a value to use for any new elements that are added to the `VecDeque` when it expands. + +## The problem + +However, if we don't look at the implementation details of this function, we might not realize that there is room for optimization. As the PR's author [@scottmcm] pointed out, the old version did not reuse the value that was passed in as the default, resulting in unnecessary cloning of values. + +```rust +use std::collections::VecDeque; + +let mut buf = VecDeque::new(); +buf.resize(5, String::from("rust")); +``` + + For example, the string "rust" was cloned five times, even though only four were needed, because the `VecDeque::resize()` function used `VecDeque::resize_with()` underneath, which passed a closure to the `repeat_with().take()`. + +```rust +pub fn resize(&mut self, new_len: usize, value: T) { + self.resize_with(new_len, || value.clone()); +} + +pub fn resize_with(&mut self, new_len: usize, generator: impl FnMut() -> T) { + let len = self.len(); + + if new_len > len { + self.extend(repeat_with(generator).take(new_len - len)) + } else { + self.truncate(new_len); + } +} +``` + +This closure was called repeatedly until it reached the `take` limit, causing unnecessary cloning. + +```rust +pub fn repeat_with A>(repeater: F) -> RepeatWith { + RepeatWith { repeater } +} + +/// An iterator that repeats elements of type `A` endlessly by +/// applying the provided closure `F: FnMut() -> A`. +#[derive(Copy, Clone, Debug)] +pub struct RepeatWith { + repeater: F, +} + +impl A> Iterator for RepeatWith { + type Item = A; + + #[inline] + fn next(&mut self) -> Option { + Some((self.repeater)()) + } + + #[inline] + fn size_hint(&self) -> (usize, Option) { + (usize::MAX, None) + } +} +``` + +Now that we have identified the problem, let's move on to how the author fixed it. + +## iter::repeat_n + +The most significant change made in PR [#104435] was the replacement of `repeat_with().take()` with `repeat_n()`. + +```diff +pub fn resize(&mut self, new_len: usize, value: T) { + if new_len > self.len() { + let extra = new_len - self.len(); +- self.extend(repeat_with(generator).take(new_len - len)) ++ self.extend(repeat_n(value, extra)) + } else { + self.truncate(new_len); + } +} +``` + +We can learn more about `repeat_n` from [ACP: Uplift iter::repeat_n from itertools](https://github.com/rust-lang/libs-team/issues/120). The author proposes to uplift [itertools::repeat_n()](https://docs.rs/itertools/latest/itertools/fn.repeat_n.html) into the standard library, just like how [iter::repeat_with()]() has obviated [itertools::repeat_call()](https://docs.rs/itertools/latest/itertools/fn.repeat_call.html). + +How does `repeat_n()` avoid the unnecessary cloning? Let’s dive into the code: + +```rust +pub fn repeat_n(element: T, count: usize) -> RepeatN { + let mut element = ManuallyDrop::new(element); + + if count == 0 { + // SAFETY: we definitely haven't dropped it yet, since we only just got + // passed it in, and because the count is zero the instance we're about + // to create won't drop it, so to avoid leaking we need to now. + unsafe { ManuallyDrop::drop(&mut element) }; + } + + RepeatN { element, count } +} + +pub struct RepeatN { + count: usize, + // Invariant: has been dropped iff count == 0. + element: ManuallyDrop, +} + +impl Iterator for RepeatN { + type Item = A; + + #[inline] + fn next(&mut self) -> Option { + if self.count == 0 { + return None; + } + + self.count -= 1; + Some(if self.count == 0 { + // SAFETY: the check above ensured that the count used to be non-zero, + // so element hasn't been dropped yet, and we just lowered the count to + // zero so it won't be dropped later, and thus it's okay to take it here. + unsafe { ManuallyDrop::take(&mut self.element) } + } else { + A::clone(&mut self.element) + }) + } +} +``` + +Not too much code, we can easily grasp it. The `RepeatN` struct returned by `repeat_n()` is the point. To save a cloning, `RepeatN` declares its `element` as the `ManuallyDrop` type. + +## ManuallyDrop\ + +[ManuallyDrop\](https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html) is a zero-cost wrapper to inhibit compiler from automatically calling **T**’s destructor unless you call `ManuallyDrop::drop()` method. + +The two most important APIs that can help us avoid unnecessary cloning are: + +- `ManuallyDrop::take()` + +```rust +pub unsafe fn take(slot: &mut ManuallyDrop) -> T { + // SAFETY: we are reading from a reference, which is guaranteed + // to be valid for reads. + unsafe { ptr::read(&slot.value) } +} +``` + +- The `DerefMut` implementation + +In the `Iterator` implementation of `RepeatN`, the `next()` method clones the element in each iteration until the `count` reaches 0. In the final iteration, the `ManuallyDrop::take()` function is used to reuse the buffer and avoid an extra clone. + +Wait, but why does `A::clone(&mut self.element)` will get an instance of `A`? The type of `&mut self.element` is `&mut ManuallyDrop`, not `&mut A`. Well, the use of `ManuallyDrop` may seem obscure at first, but it becomes clearer when we consider the `Deref` and `DerefMut` traits that it implements. These traits allow for a type like `&mut ManuallyDrop` to be treated as if it were a type like `&mut A`. This is known as [Deref coercion](https://doc.rust-lang.org/std/ops/trait.DerefMut.html#more-on-deref-coercion). As an example, consider the following code: + +```rust +fn main() { + let mut a = String::from("A"); + test(&mut a); +} + +fn test(s: &str) { + println!("{s}"); +} +``` + +Here, we are able to pass a `&mut String` value to the `test()` function, even though the function's argument type is `&str`. This is because `String` implements both [Deref](https://doc.rust-lang.org/src/alloc/string.rs.html#2455) and [DerefMut](https://doc.rust-lang.org/src/alloc/string.rs.html#2465), allowing it to be treated as if it were a `&str` value. + +Similarly, `ManuallyDrop` also implements `DerefMut`, allowing it to be treated as if it were an `&mut A` value. + +```rust +impl const DerefMut for ManuallyDrop { + #[inline(always)] + fn deref_mut(&mut self) -> &mut T { + &mut self.value + } +} +``` + +It's important to note that the `Deref` trait also has a implementation for `&mut T`: + +```rust +impl const Deref for &mut T { + type Target = T; + + fn deref(&self) -> &T { + *self + } +} +``` + +So `A::clone(&mut self.element)` works because `&mut ManuallyDrop` can convert to `&mut A` due to **Deref coercion**, then `&mut A` can convert to `&A` also due to **Deref coercion**. + +> Thanks to [@scottmcm]'s kindful reminder, we can change `A::clone(&mut self.element)` to `A::clone(&self.element)`. So I submitted a PR([#106564]) to fix it. + +## Alternative? + +During the review of this article, [@XeCycle] suggested that it might be possible to achieve the same result without using *unsafe* code by using `Option<(NonZeroUsize, T)>` instead. [@XeCycle] provided a [playground example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fd82d81d0f485acd769c3b0aaeeb5a3b) to demonstrate this idea. + +However, [@scottmcm] also commented on this issue, explaining that `ManuallyDrop` is a more appropriate choice for this case because it is designed for *niche-filling*. As an example, consider the following code: + +```rust +use core::mem::MaybeUninit; +use core::mem::ManuallyDrop; +use core::num::NonZeroUsize; +use core::alloc::Layout; + +fn main() { + dbg!(Layout::new::>>()); + dbg!(Layout::new::)>>()); + dbg!(Layout::new::)>>()); +} +``` + +The output of this code is: + +``` +[src/main.rs:6] Layout::new::>>() = Layout { + size: 40, + align: 8 (1 << 3), +} +[src/main.rs:7] Layout::new::)>>() = Layout { + size: 40, + align: 8 (1 << 3), +} +[src/main.rs:8] Layout::new::)>>() = Layout { + size: 32, + align: 8 (1 << 3), +} +``` + +As we can see, using `Option<(usize, ManuallyDrop)>` results in a smaller size than the other options. Using `ManuallyDrop` allows `Option>` to have the same size as `RepeatN`, which is not the case with the other options. + +> For more information on this topic, you can check out the article PR [#104435]. + +## Conclusion + +As a Rust developer, I am often most concerned with the changes listed in the stable [release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md). However, this does not mean that I should not be interested in the individual pull requests (PRs) that are being merged into the project. There are hundreds of PRs merged each week, and each one has a story and an author behind it. That's why I propose the creation of a topic called [#pr-demystifying], where we can share articles about interesting or educational PRs in the Rust community. The PR [#104435], for example, may not be a major optimization, but it allowed me to learn a lot. I would like to thank the author [@scottmcm] for their work on this PR. I hope that this article and others like it will be helpful to others in the community. + +> Thanks to [@scottmcm], [@XeCycle], [@zhanghandong] for proof reading this post! + +[#pr-demystifying]: /topic/pr-demystifying +[#104435]: https://github.com/rust-lang/rust/pull/104435 +[#106564]: https://github.com/rust-lang/rust/pull/106564 +[@scottmcm]: https://github.com/scottmcm +[@XeCycle]: https://github.com/XeCycle +[@zhanghandong]: https://github.com/zhanghandong \ No newline at end of file diff --git a/content/issue-1/zine.toml b/content/issue-1/zine.toml index 3a04b6c..a77b3d0 100644 --- a/content/issue-1/zine.toml +++ b/content/issue-1/zine.toml @@ -12,3 +12,12 @@ pub_date = "2022-12-10" publish = true featured = true i18n.zh = { path = "/announcing-zh", file = "announcing-zh.md", title = "宣布 Rust 杂志", author = ["folyd", "rust-magazine"] } + +[[article]] +file = "vecdeque-resize-optimization.md" +title = "VecDeque::resize() optimization" +author = "folyd" +topic = ["pr-demystifying", "optimization"] +pub_date = "2022-12-24" +publish = false +featured = true \ No newline at end of file diff --git a/zine.toml b/zine.toml index e038308..0a816f1 100644 --- a/zine.toml +++ b/zine.toml @@ -37,3 +37,5 @@ highlight_theme = "ayu-light" [topics] announcement = {} game = { name = "Game development", description = "Topic of rust game development." } +pr-demystifying = { name = "PR Demystifying", description = "Topic of PR demystifying in Rust community."} +optimization = {}