Improve VecDeque implementation #99805
Note: That limit is for `isize::MAX` bytes. You can handle an array of … But it seems like …
Since the size is known at compile time, ZSTs can be checked for without any performance hit. Also, since their address and order are irrelevant, it should be possible to use just one of the indices as a counter for the number of elements, which should make operations faster.
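A minimal sketch of both ideas. The ZST check is a compile-time constant, and a single counter stands in for the indices and the buffer. `ZstDeque` is a hypothetical type, and its `Copy + Default` bound is an assumption made only so that values can be handed back without `unsafe`.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// `size_of` is a `const fn`, so after monomorphization this is a constant
/// and any `if is_zst::<T>()` branch can be constant-folded away.
const fn is_zst<T>() -> bool {
    size_of::<T>() == 0
}

// Checked entirely at compile time (const panics are stable since Rust 1.57).
const _: () = assert!(is_zst::<()>());
const _: () = assert!(!is_zst::<u64>());

/// Hypothetical deque for zero-sized `Copy + Default` types: every value is
/// indistinguishable, so one counter replaces head/tail indices and storage.
struct ZstDeque<T> {
    len: usize,
    _marker: PhantomData<T>,
}

impl<T: Copy + Default> ZstDeque<T> {
    fn new() -> Self {
        assert!(is_zst::<T>(), "ZstDeque is only meant for zero-sized types");
        ZstDeque { len: 0, _marker: PhantomData }
    }

    fn push_back(&mut self, _value: T) {
        // Nothing to store: only the count changes.
        self.len = self.len.checked_add(1).expect("length overflow");
    }

    fn pop_front(&mut self) -> Option<T> {
        if self.len == 0 {
            None
        } else {
            self.len -= 1;
            // All values of a ZST are identical, so any instance will do.
            Some(T::default())
        }
    }

    fn len(&self) -> usize {
        self.len
    }
}

fn main() {
    let mut q: ZstDeque<()> = ZstDeque::new();
    for _ in 0..3 {
        q.push_back(());
    }
    assert_eq!(q.pop_front(), Some(()));
    println!("len = {}", q.len());
}
```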
Let's keep in mind the sequence of implementation bugs that have been found in VecDeque. It's one of those situations where having a formal proof of the code is useful.
It is intended that `const` capabilities expand to allow expressing this. Redesigning VecDeque around this limitation, if it would affect correctness, doesn't seem necessary.
The implementation in the blog post does not have any known errors, so correctness is not affected. On the other hand, even if …
Definitely, but I do not think that this should block the usage of an objectively better implementation.
We could even check if the capacity is a power of two and then do masking, and otherwise do modular arithmetic. But imo that should be done separately from allowing non-allocating …
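To make the trade-off concrete, here is a sketch of two ways to map a logical position to a physical slot. The helper names `index_mask_or_mod` and `index_conditional_sub` are hypothetical. The sketch assumes `head` and the logical offset are both below `capacity`, which a ring buffer maintains.

```rust
/// Physical slot for logical position `offset` in a ring buffer whose front
/// element lives at `head`. Masks when `capacity` is a power of two and falls
/// back to the (slower) remainder otherwise.
fn index_mask_or_mod(head: usize, offset: usize, capacity: usize) -> usize {
    debug_assert!(head < capacity && offset < capacity);
    let logical = head + offset; // < 2 * capacity, no overflow for real allocations
    if capacity.is_power_of_two() {
        // For a power of two, `x & (capacity - 1) == x % capacity`.
        logical & (capacity - 1)
    } else {
        logical % capacity
    }
}

/// Branch-and-subtract alternative: because `logical < 2 * capacity`,
/// one conditional subtraction is enough and no division is needed.
fn index_conditional_sub(head: usize, offset: usize, capacity: usize) -> usize {
    debug_assert!(head < capacity && offset < capacity);
    let logical = head + offset;
    if logical >= capacity { logical - capacity } else { logical }
}

fn main() {
    for capacity in [1usize, 5, 8, 12] {
        for head in 0..capacity {
            for offset in 0..capacity {
                let expected = (head + offset) % capacity;
                assert_eq!(index_mask_or_mod(head, offset, capacity), expected);
                assert_eq!(index_conditional_sub(head, offset, capacity), expected);
            }
        }
    }
    println!("all index strategies agree");
}
```

Under that assumption the sum is always below `2 * capacity`. The second helper therefore avoids both the power-of-two check and the division. Which variant is actually fastest should be benchmarked rather than assumed.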
Just a point: the solution proposed by Eli Dupree was actually first proposed by dizzy57 in the comments on the original jsnell article on 2016-12-14. @the8472 I don't think it would make sense to check the capacity and then use one of two methods depending on whether it's a power of two or not: that's additional branches, more code complexity, etc., all to avoid some integer math.
I would hope those branches are well-predicted and cheaper than divisions.
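For reference, here is a sketch of the two index schemes under discussion. It tracks indices only and stores no elements.

* `WrappingIndices` follows the Snellman technique from the issue description: indices are never clamped, they wrap around `usize`, they are masked on access, and the capacity must be a power of two.
* `ModularIndices` follows the `2 * capacity` variation attributed to Eli Dupree and dizzy57.

Both type names and all method names are hypothetical.

```rust
/// Snellman-style indices (sketch): `read` and `write` are never clamped; they
/// count up and wrap around the whole `usize` range. The usize range is a
/// multiple of any power-of-two capacity, so masking stays consistent across
/// that wraparound; hence the power-of-two requirement.
struct WrappingIndices { read: usize, write: usize, capacity: usize }

impl WrappingIndices {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        WrappingIndices { read: 0, write: 0, capacity }
    }
    /// Length is the wrapping distance between the indices.
    fn len(&self) -> usize { self.write.wrapping_sub(self.read) }
    /// Indices are masked only when a slot is actually accessed.
    fn mask(&self, idx: usize) -> usize { idx & (self.capacity - 1) }
    /// Physical slot for the next back element, or `None` when full.
    fn push(&mut self) -> Option<usize> {
        if self.len() == self.capacity { return None; }
        let slot = self.mask(self.write);
        self.write = self.write.wrapping_add(1);
        Some(slot)
    }
    /// Physical slot of the front element, or `None` when empty (`read == write`).
    fn pop(&mut self) -> Option<usize> {
        if self.read == self.write { return None; }
        let slot = self.mask(self.read);
        self.read = self.read.wrapping_add(1);
        Some(slot)
    }
}

/// Variation with indices kept in `0..2 * capacity` (sketch): any capacity
/// works, but every step needs `%` instead of a mask.
struct ModularIndices { read: usize, write: usize, capacity: usize }

impl ModularIndices {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        ModularIndices { read: 0, write: 0, capacity }
    }
    /// Distance from `read` to `write`, modulo `2 * capacity`.
    fn len(&self) -> usize {
        let period = 2 * self.capacity;
        (self.write + period - self.read) % period
    }
    fn push(&mut self) -> Option<usize> {
        if self.len() == self.capacity { return None; }
        let slot = self.write % self.capacity;
        self.write = (self.write + 1) % (2 * self.capacity);
        Some(slot)
    }
    fn pop(&mut self) -> Option<usize> {
        if self.read == self.write { return None; }
        let slot = self.read % self.capacity;
        self.read = (self.read + 1) % (2 * self.capacity);
        Some(slot)
    }
}

fn main() {
    // Power-of-two capacity: all 4 slots are usable, no spare slot needed.
    let mut w = WrappingIndices::new(4);
    let slots: Vec<usize> = (0..5).filter_map(|_| w.push()).collect();
    assert_eq!(slots, vec![0, 1, 2, 3]); // the 5th push is rejected
    // Capacity 3 works with the modular variant.
    let mut m = ModularIndices::new(3);
    for _ in 0..3 { assert!(m.push().is_some()); }
    assert_eq!(m.push(), None);
    assert_eq!(m.pop(), Some(0));
    assert_eq!(m.push(), Some(0)); // wraps around to physical slot 0
    println!("len = {}", m.len());
}
```

In both schemes, "empty" (`read == write`) and "full" (length equals `capacity`) are distinct states. Neither scheme needs the spare slot that the current `VecDeque` reserves.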
I wrote this Reddit post yesterday and got some replies saying I should maybe comment here, so here goes nothing: I've done some differential fuzzing, comparing my implementation against the std … I've also run the … Other than that, is there any chance an implementation similar to / based on my …
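Differential fuzzing here means running the same random sequence of operations against a candidate deque and against `std::collections::VecDeque`, and failing on the first difference. Below is a dependency-free sketch of that loop. `NaiveDeque` is a hypothetical stand-in for the implementation under test. The tiny xorshift generator stands in for a real fuzzing engine.

```rust
use std::collections::VecDeque;

/// Stand-in for the deque under test (hypothetical; a real harness would plug
/// in the new implementation here). Deliberately naive: `push_front` is O(n).
#[derive(Default)]
struct NaiveDeque {
    items: Vec<u32>,
}

impl NaiveDeque {
    fn push_back(&mut self, x: u32) { self.items.push(x); }
    fn push_front(&mut self, x: u32) { self.items.insert(0, x); }
    fn pop_back(&mut self) -> Option<u32> { self.items.pop() }
    fn pop_front(&mut self) -> Option<u32> {
        if self.items.is_empty() { None } else { Some(self.items.remove(0)) }
    }
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Runs `steps` random operations on both deques and panics on the first
/// divergence. Returns the final length on success.
fn differential_run(seed: u64, steps: usize) -> usize {
    let mut rng = XorShift(seed.max(1)); // xorshift state must not be 0
    let mut reference: VecDeque<u32> = VecDeque::new();
    let mut candidate = NaiveDeque::default();
    for step in 0..steps {
        let r = rng.next();
        let value = (r >> 8) as u32;
        match r % 4 {
            0 => { reference.push_back(value); candidate.push_back(value); }
            1 => { reference.push_front(value); candidate.push_front(value); }
            2 => assert_eq!(reference.pop_back(), candidate.pop_back(), "pop_back diverged at step {step}"),
            _ => assert_eq!(reference.pop_front(), candidate.pop_front(), "pop_front diverged at step {step}"),
        }
        // Compare the full contents after every operation, front to back.
        assert!(reference.iter().eq(candidate.items.iter()), "contents diverged at step {step}");
    }
    reference.len()
}

fn main() {
    for seed in 1..=20 {
        differential_run(seed, 1_000);
    }
    println!("no divergence found");
}
```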
VecDeque documentation doesn't promise that it'll size its capacity to a power of two, so that kind of rewrite should be ok in principle. Other collection impls have been replaced piecemeal or wholesale before (e.g. HashMap). And free conversions are enticing, even if VecDeque is a bit more niche than Vec itself. Since it'll likely touch a lot of code, including some unsafe code, you should be prepared for a long review time, though, and try to keep the diff as small as reasonable (without doing contortions). Does your deque also work for ZSTs?
It does work for ZSTs: just like `Vec`, it doesn't allocate for ZSTs, its capacity is always `usize::MAX`, and some of the functions that copy elements around (`make_contiguous`, `rotate_{right,left}` and so on) skip the copying for ZSTs to minimize the required work.
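The `Vec` behavior referenced above can be checked directly. For a zero-sized element type, `Vec` performs no allocation and reports a capacity of `usize::MAX`, no matter how many elements are pushed. The helper name below is hypothetical.

```rust
use std::mem::size_of;

/// Pushes `n` unit values into a fresh `Vec<()>` and reports `(len, capacity)`.
fn zst_vec_len_and_capacity(n: usize) -> (usize, usize) {
    let mut v: Vec<()> = Vec::new();
    for _ in 0..n {
        v.push(()); // only the length changes; no memory is allocated
    }
    (v.len(), v.capacity())
}

fn main() {
    // `()` is a zero-sized type.
    assert_eq!(size_of::<()>(), 0);
    // For ZSTs, `Vec` reports a capacity of `usize::MAX` before and after pushing.
    assert_eq!(zst_vec_len_and_capacity(0), (0, usize::MAX));
    assert_eq!(zst_vec_len_and_capacity(1_000_000), (1_000_000, usize::MAX));
    println!("ok");
}
```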
That's due to a specialization that eliminates repeated reallocation and bounds checks.
And as for the diff size, I'm not sure a lot of the VecDeque code is salvageable, so I would've preferred to just replace it outright, as most of the algorithms and functions would need to be replaced anyway.
I'm aware of the specialization and have implemented it in a very similar manner myself.
If I do change …
Yes, that is the correct attribute. I would recommend checking out the std-dev-guide; it might answer a few questions 😉 (I only discovered it a few days ago, and it's pretty good). For making a function …
Alrighty, with any luck I should have a working version ready later today. What exactly is the process to merge it? Do I just fork the repo, push my changes to the fork and then create a PR, or are there any additional required steps?
I changed my mind about this, btw, and am now just modifying the existing VecDeque instead.
Yes, exactly. Perhaps include "r? @scottmcm" in the PR description to assign @scottmcm; I saw on Reddit that they were interested in this.
Perfect, thank you.
Yup, you can assign it to me. One procedural thing: I would suggest ensuring your PR is only an implementation change and doesn't (in that first PR) add any new API guarantees. An implementation change can always be reverted, so we can just do it. But new promises, like "…
Sure thing. The way I've written it now, it says that in its current implementation (the new one) it's a cheap conversion, but that this isn't guaranteed and shouldn't be relied upon. Hope it's okay that way.
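Here is a small check of what "cheap conversion" means in practice. A head + len deque can take over a `Vec`'s buffer as-is, with the head at 0 and the length equal to the vector's length. The contents stay contiguous, which the assertions check. Whether the allocation was actually reused is the implementation detail that, per the comment, should not be relied upon, so the sketch only prints it.

```rust
use std::collections::VecDeque;

fn main() {
    let v: Vec<u32> = (1..=5).collect();
    let ptr_before = v.as_ptr();

    // A head + len deque can adopt the Vec's buffer directly:
    // head = 0, len = v.len(), capacity = v.capacity().
    let d: VecDeque<u32> = VecDeque::from(v);

    // The elements are still contiguous, so they all sit in the first slice.
    let (front, back) = d.as_slices();
    assert_eq!(front, &[1, 2, 3, 4, 5]);
    assert!(back.is_empty());

    // Whether the allocation was reused is an implementation detail; this only reports it.
    println!("buffer reused: {}", front.as_ptr() == ptr_before);

    // The reverse conversion also exists (`From<VecDeque<T>> for Vec<T>`).
    let round_trip: Vec<u32> = d.into();
    assert_eq!(round_trip, vec![1, 2, 3, 4, 5]);
}
```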
Yeah, something non-committal is fine. Like the current one says …
Update VecDeque implementation to use head+len instead of head+tail (See rust-lang#99805)
This changes `alloc::collections::VecDeque`'s internal representation from using head and tail indices to using a head index and a length field. It has a few advantages over the current design:
* It allows the buffer to be of length 0, which means `VecDeque::new` no longer has to allocate and could be changed to a `const fn`
* It allows the `VecDeque` to fill the buffer completely, unlike the old implementation, which always had to leave a free slot
* It removes the restriction for the capacity to be a power of two, allowing it to properly `shrink_to_fit`, unlike the old `VecDeque`
* The above points also combine to allow the `Vec<T> -> VecDeque<T>` conversion to be very cheap (O(1)). I mention this in the `From<Vec<T>>` impl, but it's not a strong guarantee just yet, as that would likely need some form of API change proposal.
All the tests seem to pass for the new `VecDeque`, with some slight adjustments.
r? `@scottmcm`
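Below is a minimal safe sketch of the head + length representation described in this PR. It uses a hypothetical `HeadLenDeque` that stores `Option<T>` slots instead of raw memory; the real change works on a raw buffer with unsafe code. The sketch shows three of the listed properties: a zero-capacity deque needs no allocation, every slot can be filled, and the capacity need not be a power of two. The growth policy is an arbitrary assumption.

```rust
/// Hypothetical head + len ring buffer (sketch, not the std code).
struct HeadLenDeque<T> {
    buf: Vec<Option<T>>, // physical storage; buf.len() is the capacity
    head: usize,         // physical index of the front element
    len: usize,          // number of live elements
}

impl<T> HeadLenDeque<T> {
    /// No allocation: `len == 0` alone marks the deque as empty, so no spare slot is needed.
    const fn new() -> Self {
        HeadLenDeque { buf: Vec::new(), head: 0, len: 0 }
    }

    /// Any capacity works; it does not have to be a power of two.
    fn with_capacity(cap: usize) -> Self {
        let mut buf = Vec::with_capacity(cap);
        buf.resize_with(cap, || None);
        HeadLenDeque { buf, head: 0, len: 0 }
    }

    fn capacity(&self) -> usize { self.buf.len() }
    fn len(&self) -> usize { self.len }

    /// Physical slot of logical index `i` (`head + i < 2 * capacity`).
    fn slot(&self, i: usize) -> usize {
        let idx = self.head + i;
        if idx >= self.capacity() { idx - self.capacity() } else { idx }
    }

    /// Grows to double the capacity (at least 4), unrolling the ring so head == 0.
    fn grow(&mut self) {
        let new_cap = (self.capacity() * 2).max(4);
        let mut new_buf: Vec<Option<T>> = Vec::with_capacity(new_cap);
        for i in 0..self.len {
            let s = self.slot(i);
            new_buf.push(self.buf[s].take());
        }
        new_buf.resize_with(new_cap, || None);
        self.buf = new_buf;
        self.head = 0;
    }

    fn push_back(&mut self, value: T) {
        if self.len == self.capacity() { self.grow(); }
        let s = self.slot(self.len);
        self.buf[s] = Some(value);
        self.len += 1;
    }

    fn push_front(&mut self, value: T) {
        if self.len == self.capacity() { self.grow(); }
        // Step head back by one, wrapping to the end of the buffer.
        self.head = if self.head == 0 { self.capacity() - 1 } else { self.head - 1 };
        self.buf[self.head] = Some(value);
        self.len += 1;
    }

    fn pop_front(&mut self) -> Option<T> {
        if self.len == 0 { return None; }
        let value = self.buf[self.head].take();
        self.head = if self.head + 1 == self.capacity() { 0 } else { self.head + 1 };
        self.len -= 1;
        value
    }

    fn pop_back(&mut self) -> Option<T> {
        if self.len == 0 { return None; }
        self.len -= 1;
        let s = self.slot(self.len);
        self.buf[s].take()
    }
}

fn main() {
    let empty: HeadLenDeque<u32> = HeadLenDeque::new();
    assert_eq!(empty.capacity(), 0); // nothing allocated

    let mut d = HeadLenDeque::with_capacity(3); // not a power of two
    d.push_back(1);
    d.push_back(2);
    d.push_front(0);
    // All 3 slots are in use: len == capacity, no slot left free.
    assert_eq!((d.len(), d.capacity()), (3, 3));
    assert_eq!(d.pop_front(), Some(0));
    assert_eq!(d.pop_back(), Some(2));
    println!("len = {}, capacity = {}", d.len(), d.capacity());
}
```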
cc the ACP to constify and add conversion guarantees: rust-lang/libs-team#138
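Constifying `VecDeque::new` means it can be used in a `static` initializer without lazy initialization. A minimal sketch, assuming a toolchain where `VecDeque::new` is a `const fn` (it is on current stable Rust) and where `Mutex::new` is `const`. The `QUEUE`, `enqueue`, and `dequeue` names are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Compiles only if both `Mutex::new` and `VecDeque::new` are `const fn`.
static QUEUE: Mutex<VecDeque<u32>> = Mutex::new(VecDeque::new());

fn enqueue(x: u32) {
    QUEUE.lock().unwrap().push_back(x);
}

fn dequeue() -> Option<u32> {
    QUEUE.lock().unwrap().pop_front()
}

fn main() {
    enqueue(1);
    enqueue(2);
    assert_eq!(dequeue(), Some(1));
    println!("next = {:?}", dequeue());
}
```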
`VecDeque` currently allocates one extra empty element because it needs to distinguish between an empty and a full queue. Besides wasting memory, this means `VecDeque::new` cannot currently be `const`.
Solution
The most elegant solution would be to reimplement `VecDeque` based on the (third) technique described by Juho Snellman in this blog post. In this implementation, the buffer indexes are not clamped to the buffer size, but instead use the whole range of `usize`. Only when accessing an element are they masked to fit. The length of the queue is defined by the wrapping distance between the two indexes. By limiting the capacity to values smaller than `isize::MAX` (which Rust's memory model dictates anyway), an empty queue (with `head == tail`) is strictly different from a full queue (where the wrapping difference `head - tail` equals `capacity`).
In the described implementation, the capacity is always a power of two so that wrapping arithmetic can be used. This is great for performance and simplifies the implementation, but may result in significant unused memory when a large but precise capacity is required. Therefore, Eli Dupree proposed a variation where the indexes are kept smaller than `2 * capacity` using modular arithmetic. This would relax the above requirement but incur a higher runtime cost, since the more expensive modular arithmetic needs to be used.
Links and related work
@rustbot label +T-libs +T-libs-api