Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve VecDeque implementation #99805

Closed
joboet opened this issue Jul 27, 2022 · 25 comments
Closed

Improve VecDeque implementation #99805

joboet opened this issue Jul 27, 2022 · 25 comments
Labels
T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@joboet
Copy link
Member

joboet commented Jul 27, 2022

VecDeque currently allocates one extra empty element because it needs to discern between an empty and full queue. Besides wasting memory, this means VecDeque::new cannot currently be const.

Solution

The most elegant solution would be to reimplement VecDeque based off the (third) technique described by Juho Snellman in this blog post. In this implementation, the buffer indexes are not clamped to the buffer size, but instead use the whole range of usize. Only when accessing an element are they masked to fit. The length of the buffer is defined by the wrapping distance between the two indexes. By limiting the capacity to values smaller than isize::MAX (which Rust's memory model dictates anyway), an empty queue (with head == tail) is strictly different from a full queue (where head - tail == capacity).

In the described implementation, the capacity is always be a power of two so that wrapping arithmetic can be used. This is great for performance and simplifies the implementation, but may result in significant unused memory when a large but precise capacity is required. Therefore, Eli Dupree proposed a variation where the indexes are kept smaller than 2 * capacity using modular arithmetic, which would relax the above requirement but incur higher runtime cost since the more expensive modular arithmetic needs to be used.

Links and related work

@rustbot label +T-libs +T-libs-api

@rustbot rustbot added T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jul 27, 2022
@5225225
Copy link
Contributor

5225225 commented Jul 27, 2022

By limiting the capacity to values smaller than isize::MAX (which Rust's memory model dictates anyway), an empty queue (with head == tail) is strictly different from a full queue (where head - tail == capacity).

Note: That limit is for isize::MAX bytes. You can handle an array of usize::MAX ZSTs just fine. And indeed, Vec handles it fine.

But it seems like VecDeque::<()>::with_capacity(isize::MAX + 1) fails already, so that's not a breaking change.

@joboet
Copy link
Member Author

joboet commented Jul 27, 2022

By limiting the capacity to values smaller than isize::MAX (which Rust's memory model dictates anyway), an empty queue (with head == tail) is strictly different from a full queue (where head - tail == capacity).

Note: That limit is for isize::MAX bytes. You can handle an array of usize::MAX ZSTs just fine. And indeed, Vec handles it fine.

But it seems like VecDeque::<()>::with_capacity(isize::MAX + 1) fails already, so that's not a breaking change.

Since the size is known at compile, ZSTs can be checked for without any performance hit. Also, since their address and order is irrelevant, it should be possible to use just one of the indices as counter for the number of elements, which should make operations faster.

@leonardo-m
Copy link

Let's keep in mind the sequence of implementation bugs that have being found in VecDeque. It's one of those situations where having a formal proof of code is useful.

@workingjubilee
Copy link
Member

VecDeque currently allocates one extra empty element because it needs to discern between an empty and full queue. Besides wasting memory, this means VecDeque::new cannot currently be const.

It is intended that const capabilities expand to allow expressing this. Redesigning VecDeque around this limitation, if it would affect correctness, doesn't seem necessary.

@joboet
Copy link
Member Author

joboet commented Jul 27, 2022

VecDeque currently allocates one extra empty element because it needs to discern between an empty and full queue. Besides wasting memory, this means VecDeque::new cannot currently be const.

It is intended that const capabilities expand to allow expressing this. Redesigning VecDeque around this limitation, if it would affect correctness, doesn't seem necessary.

The implementation in the blog post does not have any known errors, so correctness is not affected. On the other hand, even if VecDeque::new could be made const with the current design, the issue of over-allocating would still persist. Additionally, with the proposed variation, conversions from Vec would be essentially free.

Let's keep in mind the sequence of implementation bugs that have being found in VecDeque. It's one of those situations where having a formal proof of code is useful.

Definitely, but I do not think that this should block the usage of an objectively better implementation.

@the8472
Copy link
Member

the8472 commented Jul 27, 2022

Therefore, Eli Dupree proposed a variation where the indexes are kept smaller than 2 * capacity using modular arithmetic, which would relax the above requirement but incur higher runtime cost since the more expensive modular arithmetic needs to be used.

We could even check if the capacity is a power of two and then do masking and otherwise do modular arithmetic.

But imo that should be done separately from allowing non-allocating VecDeque::new since that should be much simpler.

@mqudsi
Copy link
Contributor

mqudsi commented Jul 29, 2022

Just a point: the solution proposed by Eli Dupree was actually originally proposed by dizzy57 in the comments on the original jsnell article on 2016-12-14.

@the8472 I don't think it would make sense to check the capacity and then use one of two methods depending on if it's a power-of-two or not: that's additional branches, more code complexity, etc all to avoid some integer math.

@the8472
Copy link
Member

the8472 commented Jul 29, 2022

I would hope those branches branches are well-predicted and cheaper than divisions.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

I wrote this reddit post yesterday, and got some replies saying I should maybe comment on here so here goes nothing:
I've written a Deque implementation which should be basically a drop in replacement for alloc::collections::VecDeque, which allows for empty buffers (so Deque::new can be const fn, and Deque::from(Vec) is a free conversion), and also allows for non-power of 2 capacities, which means Deque::shrink_to_fit can actually shrink to fit.

I've done some differential fuzzing, comparing my implementation against the std VecDeque, and I'm reasonably certain my implementation is correct.

I've also run the VecDeque benchmarks on my Deque (courtesy of https://www.reddit.com/user/jrf63/ who ported the benchmarks), and it seems most operations are similar in speed, if not a bit faster for my Deque. One exception is the extend_chained_trustedlen benchmark, where for whatever reason, VecDeque is almost 10x faster than my Deque, despite their Extend-impls being very similar. I have yet to figure out where that slowdown comes from.

Other than that, is there any chance an implementation similar to / based on my Deque could land in alloc? On reddit, there seemed to be a few people who were quite excited about a possible free Vec -> Deque conversion in std.

@the8472
Copy link
Member

the8472 commented Oct 12, 2022

VecDeque documentation doesn't promise that it'll size its capacity to a power-of-two so that kind of rewrite should be ok in principle. Other collection impls have been replaced piecemeal or wholesale before (e.g. HashMap). And free conversions are enticing, even if VecDeque is a bit more niche than Vec itself.

Since it'll likely touch a lot of code, including some unsafe code, you should be prepared for a long review time though and try to keep the diff as small as reasonable (without doing contortions).

Does your deque also work for ZSTs?

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

It does work for ZSTs, just like Vec it doesn't allocate for ZSTs, its capacity is always usize::MAX and some of the functions that copy stuff around (make_contiguous, rotate_{right,left} and so on) also skip the copying for ZSTs to minimize the required work.

@the8472
Copy link
Member

the8472 commented Oct 12, 2022

One exception is the extend_chained_trustedlen benchmark

That's due to a specialization that eliminates repeated reallocation and bounds-checks.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

And as for the diff size, I'm not sure a lot of the VecDeque code is salvageable, so I would've preferred to just replace it outright, as most of the algorithms and functions would need to be replaced anyways.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

One exception is the extend_chained_trustedlen benchmark

That's due to a specialization that eliminates repeated reallocation and bounds-checks.

I'm aware of the specialization and have implemented it in a very similar manner myself.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

If I do change VecDeque::new to be const, should I add a #[rustc_const_unstable] attribute? If so, whats the process for adding a feature for it? I never worked on the stdlib itself before, so I'm not very knowledgeable on how to do things like that.

@joboet
Copy link
Member Author

joboet commented Oct 12, 2022

If I do change VecDeque::new to be const, should I add a #[rustc_const_unstable] attribute? If so, whats the process for adding a feature for it? I never worked on the stdlib itself before, so I'm not very knowledgeable on how to do things like that.

Yes, that is the correct attribute. I would recommend checking out the std-dev-guide, it might answer a few questions 😉 (I only discovered it a few days ago, and it's pretty good). For making a function const, you should especially look at the "API Change Proposal" section. But you can make the function const in a separate step as well, if you prefer.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

Alrighty, with any luck I should have a working version ready later today, what exactly is the process to merge it? Do i just fork the repo, push my changes to the fork and then create a PR, or are there any additional required steps?

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

And as for the diff size, I'm not sure a lot of the VecDeque code is salvageable, so I would've preferred to just replace it outright, as most of the algorithms and functions would need to be replaced anyways.

I changed my mind about this btw, and am now just modifying the existing VecDeque instead.

@joboet
Copy link
Member Author

joboet commented Oct 12, 2022

Alrighty, with any luck I should have a working version ready later today, what exactly is the process to merge it? Do i just fork the repo, push my changes to the fork and then create a PR, or are there any additional required steps?

Yes, exactly. Perhaps include "r? @scottmcm" in the PR description to assign @scottmcm, I saw on Reddit that they were interested in this.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

Perfect, thank you.

@scottmcm
Copy link
Member

Yup, you can assign it to me.

One procedural thing: I would suggest ensuring your PR is only an implementation change, and doesn't — in that first PR — add any new API guarantees.

Because changing the implementation could always be reverted, and thus we can just do it. But new promises, like "VecVecDeque is always O(1)" is something that would need a libs-api FCP because it's something that we can't just revert, which takes extra oversight and waiting periods. So let's get in the implementation first, then do follow-up PRs to add the new guarantees to be able to talk about them separately.

@Sp00ph
Copy link
Member

Sp00ph commented Oct 12, 2022

Sure thing. The way I've written it now, it says that in its current implementation (the new one), it's a cheap conversion, but this isn't guaranteed and shouldn't be relied upon. Hope it's okay that way.

@scottmcm
Copy link
Member

Yeah, something non-committal is fine.

Like the current one says

This avoids reallocating where possible, but the conditions for that are strict, and subject to change, and so shouldn’t be relied upon unless the Vec<T> came from From<VecDeque<T>> and hasn’t been reallocated.

https://doc.rust-lang.org/std/collections/struct.VecDeque.html#impl-From%3CVec%3CT%2C%20A%3E%3E-for-VecDeque%3CT%2C%20A%3E

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 28, 2022
Update VecDeque implementation to use head+len instead of head+tail

(See rust-lang#99805)

This changes `alloc::collections::VecDeque`'s internal representation from using head and tail indices to using a head index and a length field. It has a few advantages over the current design:
* It allows the buffer to be of length 0, which means the `VecDeque::new` new longer has to allocate and could be changed to a `const fn`
* It allows the `VecDeque` to fill the buffer completely, unlike the old implementation, which always had to leave a free space
* It removes the restriction for the size to be a power of two, allowing it to properly `shrink_to_fit`, unlike the old `VecDeque`
* The above points also combine to allow the `Vec<T> -> VecDeque<T>` conversion to be very cheap and guaranteed O(1). I mention this in the `From<Vec<T>>` impl, but it's not a strong guarantee just yet, as that would likely need some form of API change proposal.

All the tests seem to pass for the new `VecDeque`, with some slight adjustments.

r? `@scottmcm`
@scottmcm
Copy link
Member

cc the ACP to constify and add conversion guarantees: rust-lang/libs-team#138

@joboet
Copy link
Member Author

joboet commented Dec 23, 2022

As #102991 and #105127 have been merged, I'm closing this as resolved. Thank you @Sp00ph for your work on this!

@joboet joboet closed this as completed Dec 23, 2022
Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023
Update VecDeque implementation to use head+len instead of head+tail

(See rust-lang#99805)

This changes `alloc::collections::VecDeque`'s internal representation from using head and tail indices to using a head index and a length field. It has a few advantages over the current design:
* It allows the buffer to be of length 0, which means the `VecDeque::new` new longer has to allocate and could be changed to a `const fn`
* It allows the `VecDeque` to fill the buffer completely, unlike the old implementation, which always had to leave a free space
* It removes the restriction for the size to be a power of two, allowing it to properly `shrink_to_fit`, unlike the old `VecDeque`
* The above points also combine to allow the `Vec<T> -> VecDeque<T>` conversion to be very cheap and guaranteed O(1). I mention this in the `From<Vec<T>>` impl, but it's not a strong guarantee just yet, as that would likely need some form of API change proposal.

All the tests seem to pass for the new `VecDeque`, with some slight adjustments.

r? `@scottmcm`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants