Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Vec::recycle #2802

Closed
wants to merge 8 commits into from
Closed

RFC: Vec::recycle #2802

wants to merge 8 commits into from

Conversation

golddranks
Copy link

@golddranks golddranks commented Nov 3, 2019

Add method Vec::recycle that allows safe reuse of the backing allocation of the Vec, helping avoiding allocations and de-allocations in tight loops.

let mut objects: Vec<Object<'static>> = Vec::new();      // Any lifetime will do here
while let Some(byte_chunk) = stream.next() {             // `byte_chunk` lifetime starts
    // Consumes `objects`, creating a new `Vec` `temp`.
    let mut temp: Vec<Object<'_>> = objects.recycle();   // `temp` lifetime starts

    // Zero-copy parsing; `Object`s has references to `byte_chunk`
    deserialize(byte_chunk, &mut temp)?;
    process(&temp)?;

    // "Returns" `object` back to where it was.
    objects = temp.recycle();                            // `temp` lifetime ends
}                                                        // `byte_chunk` lifetime ends

Rendered

@RustyYato
Copy link

RustyYato commented Nov 3, 2019

seems similar to #2756

You can use my vec-utils crate to do this,

use vec_utils::VecExt;

let mut objects: Vec<Object> = Vec::new();                  // Any lifetime will do here

while let Some(byte_chunk) = stream.next() {                // `byte_chunk` lifetime starts
    // Consumes `objects`, creating a new `Vec` `temp`.
    let mut temp: Vec<Object> = objects.drop_and_reuse();   // `temp` lifetime starts
    // let mut temp = objects.drop_and_reuse::<Object>();   // or if you prefer

    // Zero-copy parsing; `Object`s has references to `byte_chunk`
    deserialize(byte_chunk, &mut temp)?;
    process(&temp)?;

    // "Returns" `object` back to where it was.
    objects = temp.drop_and_reuse();                        // `temp` lifetime ends
}                                                           // `byte_chunk` lifetime ends

@golddranks
Copy link
Author

golddranks commented Nov 3, 2019

seems similar to #2756

The difference with transmuting vectors is that this pattern doesn't transmute or reinterpret any arbitrary values, and the RCFs argues it's a safe pattern.

You can use my vec-utils crate to do this,

I didn't know about vec-utils, thanks for linking it. I'll add it to the prior work tomorrow. I also prepared the API as a crate as a companion to the RFC:
https://crates.io/crates/recycle_vec

(Ninja edits for precise quotes)

@RustyYato
Copy link

One difference between our apis is that I just cteate a new Vec if layouts don't match. This allows it to be seemlessly used in generic contexts.

@golddranks
Copy link
Author

golddranks commented Nov 3, 2019

That seems to be a good pattern. There is possibly room for two APIs for both behaviours. On the other hand, if we some day get compile time asserts, I'd like for it to fail in cases where I thought it wouldn't allocate but the layout doesn't match and it would be forced to.

@SimonSapin
Copy link
Contributor

Is this the same as Vec::clear + Vec::transmute (which itself is Vec::into_raw_parts + as + Vec::from_raw_parts)?

@RustyYato
Copy link

@SimonSapin yes, this is the same as that

@golddranks
Copy link
Author

I updated the implementation to reflect the changes I did here: rust-lang/rust#66069 This includes using Vec::into_raw_parts and Vec::clear, and rephrasing the summary.

Some things I mentioned as unresolved questions are becoming clearer:

  • There isn't currently planned or intended support for const items to refer generic type parameters in scope. This is needed for the compile time size and alignment assertion, so even if compile time assertions stabilized, this API couldn't utilize them.
  • Already mentioned in the RFC, but my concern about compatibility with speculative future allocator APIs seems to be unfounded, because the standard library already contains APIs that reinterpret an owned type and thus deallocate it as a different type than it was allocated as.

@Lokathor
Copy link
Contributor

Lokathor commented Nov 6, 2019

This is Vec::clear then Vec::transmute with no bound because you can't witness the previous type in a bad state? that seems neat, but probably also not RFC required, since it's just one method.

@golddranks
Copy link
Author

@Lokathor: The rule whether an RFC is required isn't all clear to me, so I decided to write one if not just to spell out all the considerations. There is also an open PR.

@Lokathor
Copy link
Contributor

Lokathor commented Nov 7, 2019

An RFC doesn't hurt!

Usually they're reserved for major changes to the standard library, changes to how cargo/rustc build, and changes to the language itself (eg: adding an attribute).

I'm always happy to have folks be aware of allocations though!

@@ -205,6 +205,13 @@ directly subtyping relations between two generic types in where clauses. [1]
Thus, it was deemed to be sufficient to limit verification of the memory layout to the checking
of size and alignment of the stored type.

There is an alternative to this API, provided by https://crates.io/crates/vec_utils,
that instead of panicking on mismatching size or alignment, just allocates a new `Vec`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vec-utils does not allocate a new Vec, it just returns Vec::new() if the layouts are incompatible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed, Vec::new() doesn't allocate.

@the8472
Copy link
Member

the8472 commented Nov 9, 2019

There isn't currently planned or intended support for const items to refer generic type parameters in scope. This is needed for the compile time size and alignment assertion, so even if compile time assertions stabilized, this API couldn't utilize them.

Syntactically and semantically this is already possible. Alas, rust-lang/rust#62645 means it doesn't actually work.

@Valloric
Copy link

Just wanted to drop a note that I think Vec::recycle is a fantastic idea. It removes yet another thing people would use unsafe for and it's 100x simpler than dealing with custom allocators (and covers a non-trivial amount of use-cases there). @golddranks Well done!

@jonas-schievink jonas-schievink added A-collections Proposals about collection APIs T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Nov 22, 2019
@golddranks
Copy link
Author

Just to drop a note; I am going to update this RFC to incorporate the current state of discussion the next weekend.

@golddranks
Copy link
Author

An update about checking the invariants: it seems that recent advances in const features have unlocked the type size/alignment checking at compile time: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=cd871dc7385e7a1d8a40216cb27f778b

Now, I wish we had a rough understanding what's the timeline for the features const_fn, const_if_match and const_panic to stabilise.

@the8472
Copy link
Member

the8472 commented Dec 8, 2019

Being able to constrain a where clause by a const bool is still an open issue.

@RustyYato
Copy link

RustyYato commented Dec 8, 2019

@the8472 where clauses are not necessary!

std::mem::transmute does not need to be an intrinsic anymore! We can do this in userland!

As a disclaimer: I do not recommend that we do this! This will lead to C++ level of bad error reporting. We need where clauses to get good error reporting, or some compiler intrinsics/extensions which will allow us to create good error reports. For example, const std::fmt, this would allow for good error reports. But I still think that where clauses are the best way to report errors, because they make these layout constraints explicit.

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=a85938d6c3e6823ab0c99384389e6f36

#![feature(const_generics)]
#![feature(const_fn)]
#![feature(const_if_match)]
#![feature(const_panic)]

#![allow(unused)]

pub mod layout_checker {
    use std::marker::PhantomData;
    
    pub struct LayoutToken<T, U, const WORK: ()>(PhantomData<(T, U)>);
    
    pub const fn check<T, U>() -> LayoutToken<
        T, U,
        {
            if core::mem::size_of::<T>() != core::mem::size_of::<U>()
            || core::mem::align_of::<T>() != core::mem::align_of::<U>() {
                panic!("The types have different layouts");
            }
        }
    > {
        LayoutToken(PhantomData)
    }
}

unsafe fn transmute<T, U>(value: T) -> U {
    // this will panic at compile time post-monomorphization if layouts are incompatible
    layout_checker::check::<T, U>();
    
    std::mem::transmute_copy(&std::mem::ManuallyDrop::new(value) as &T)
}

fn main() {
    let bar = unsafe { transmute::<_, u32>(10_i32) };
    
    // let bar = unsafe { transmute::<_, u8>(10_i32) }; // fails to compile!
}

@the8472
Copy link
Member

the8472 commented Dec 8, 2019

yeah, I supposed that's sufficient. I was thinking of having it spelled it in the recycle signature itself

impl Vec<T> {
  fn recycle<U>(self) -> Vec<U>
    where Is<{size_of::<T>() == size_of::<U>() && align_of::<T>() == align_of::<U>()}>: True
  { ... }
}

@RustyYato
Copy link

@the8472 Yes, that would be much nicer

@SimonSapin
Copy link
Contributor

std::mem::transmute does not need to be an intrinsic anymore! We can do this in userland!

That’s not quite the same. std::mem::transmute’s “magic” happens as soon as a concrete function is instantiated. It doesn’t need to be called:

fn _static_assert_size() {
    let _ = std::mem::transmute::<Foo, [u8; 24]>;
}

@Lokathor
Copy link
Contributor

Lokathor commented Dec 9, 2019

from what ive seen it doesn't work with generics though.

@RustyYato
Copy link

RustyYato commented Dec 9, 2019

@SimonSapin, I completely forgot about that behaviour!

@Lokathor that's because it asserts it's layout concerns before monomorphozation, but to work with generics, it should assert that afterwards.

@the8472
Copy link
Member

the8472 commented Jan 24, 2020

I think recycle(self: Vec<T>) -> Vec<U> combined with panics is the wrong approach due to size/alignment changes becoming breaking changes. I don't think any other safe API in the standard library is sensitive to size or alignment changes.

I think recycle(self: Vec<T>) -> Result<Vec<U>, Vec<T>> would be a better approach since it would leave the choice of how to handle the situation to the caller and makes it explicit that this operation can fail, e.g. due to changing types pulled in from dependencies.

@CryZe
Copy link

CryZe commented Jan 24, 2020

I think recycle(self: Vec<T>) -> Result<Vec<U>, Vec<T>> would be a better approach

Yeah I like this because then you could do my_vec.recycle().unwrap_or_default() and it would work regardless of the type.

@golddranks
Copy link
Author

@the8472 I agree that panics aren't good. Here's my current thinking about the situation:

  1. Static guarantees would be ideal. While Result would be better than panic in the sense that it gives more control to the developer, it isn't ideal either, as it detects error conditions dynamically, while they would be in principle achievable statically. Currently, static errors aren't possible on stable, though. Result also feels like the wrong tool for the job. It's meant for handling expected runtime errors that are handleable. Indeed, this is handleable by allocating a new Vec, but it doesn't feel like a "runtime error".
  2. The recent improvements in const generics have shown that there is hope to stabilize const generic based asserts within the next few years.
  3. My relatively approving stance towards panics is based on the fact that one could strengthen this API backwards compatibly by implementing const generic assertion warnings once that becomes possible. I'm relatively sure that once the machinery is there, custom warnings are going to be a thing. That would allow backward compatible static "almost-guarantees" about the usage. I believe this point wasn't made before in this thread.
  4. @yoshuawuyts helpfully pointed out (private conversation) that implementations in stdlib aren't necessarily limited by "what's on stable". It would be, in principle, possible to implement this API in std with static asserts even if they haven't stabilized yet. Of course, I bet that would need an agreement from the lang team too, as it would somewhat set in stone some future capabilities of Rust. I think this possibility was also voiced for the first time in this thread.

I'm sorry I haven't had the time to reflect the discussion to the RCF text. However, I refrain doing so right now as there are new discussion points.

@the8472
Copy link
Member

the8472 commented Jan 24, 2020

At least the performance aspect of compile-time vs. runtime branching is orthogonal to using Result. size_of and align_of are const and thus recycle could directly compile to one of its branches.

Warnings also are an orthogonal concern, if the caller wants warnings they could use something like vec.recycle().unwrap_or_else(|_| { static_warn("U and V don't match!"); Vec::new()})

Recycle is an optimization, imo neither panics nor compile errors are the right price to pay for optimizations and difficult to silence warnings from stdlib aren't great either.

Leaving most of this to the caller and mentioning those approaches in the documation seems the right approach to me. After all it's mostly a method to avoid people reaching for unsafe, it should do no more than that. The rest is not necessary to remove unsafety.

@blankname
Copy link

Would it make sense to have both a recycle and try_recycle?

recycle would have compile-time guarantees and would be used when someone knows the types involved and wants to be notified at compile-time when they change in a way that makes them incompatible to recycling.

If I'm using recycling and updating a dependency causes recycling to no longer be used, I'd much rather find out about it right when I compile instead of when I hit the particular code path.

try_recycle would return a result and be useful for more generic contexts as a non-essential optimization.

try_recycle could also be implemented first, while we wait for const stuff to bake.

@the8472 Does static_warn currently exist or is it hypothetical?

@the8472
Copy link
Member

the8472 commented Jan 25, 2020

If I'm using recycling and updating a dependency causes recycling to no longer be used, I'd much rather find out about it right when I compile instead of when I hit the particular code path.

That may be fine if you are the author of a bin crate. But you couldn't uphold your semver responsibilities this way if you're a lib crate author unless you pinned the involved dependencies to a single version.

Does static_warn currently exist or is it hypothetical?

Not in stable. In nightly you could build something like that with proc macro diagnostics.

Would it make sense to have both a recycle and try_recycle?

Once you have try_recycle you can build the other one from it. And imo it's not such a common use case where a convenience wrapper would be worth it in the stdlib.

@blankname
Copy link

@the8472
Forgive me if I'm being obtuse, but in your example wouldn't the static_warn message still only appear at run-time?

The Err case would occur when recycle was actually called, not during type checking.

If recycle was const and being used in a const context you'd get the warning, but that would be rare (the optimization is less likely to be needed when constructing compile-time values).

I'm not very familiar with proc macro diagnostics, so my analysis may be incorrect.

@the8472
Copy link
Member

the8472 commented Jan 25, 2020

Ah yeah, you're right. I thought that could could also be used in a const eval context so one could have a static warn with a condition (not in the unwrap lambda) similar to a static assert, but that doesn't actually work.

But I think the standard library has no internal capabilities for emitting compile time warnings either. So the only other options than returning a result are either panics or compile time errors. And both can be done on the caller side.

@golddranks
Copy link
Author

This is a relevant development: "Allow if and match in constants" is going to be stabilised in 1.45: rust-lang/rust#49146

@burdges
Copy link

burdges commented May 18, 2020

You could impose a weaker bound than equal sizes really. Alignments could shrink too, right?

@golddranks
Copy link
Author

golddranks commented May 18, 2020

@burdges I played with that thought too but I wonder what the allocator API guarantees have to say about returning a piece of memory with different layout than what it was originally allocated with.

@SimonSapin
Copy link
Contributor

Alignments could shrink too, right?

For slices, yes. For Vec however, deallocating or reallocating requires passing the same alignment as the initial allocation.

@KodrAus KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020
@the8472
Copy link
Member

the8472 commented Oct 2, 2020

Note that with recent iterator optimizations you can get recycle behavior automatically and not only in the transmute (preserve all elements) or recycle (discard all elements) cases but any other cases that process some of the elements.

https://rust.godbolt.org/z/K4jMdc

@programmerjake
Copy link
Member

programmerjake commented Jun 8, 2021

This could use the traits added by safe-transmute to bound what can be recycled, allowing changing the run-time panic to a compile-time error.

https://github.com/rust-lang/project-safe-transmute/blob/master/rfcs/0000-ext-layout-traits.md

@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/libs-api meeting.

Our rough consensus:

  • We're generally in favor of having methods that help people reduce or eliminate unsafe code.
  • There's already an implementation in the vec-utils crate, and there's no major advantage to implementing this in the standard library, other than more widespread availability.
  • We'd be much more in favor of this if it could do more compile-time checking, and it does seem likely that either const generics or a special-case of safe-transmute would enable that. We're concerned that with runtime-only checking, this may lead to unexpected panics that only occur on some platforms or some allocators.
  • Based on that, we'd like to decline this method for now, but we'd love to see a new PR once it's possible to add more compile-time safety.

@golddranks
Copy link
Author

golddranks commented Jun 24, 2021

Thanks for the update. I'm delighted to find I agree with the Libs API team consensus. I've been sitting on this RFC, waiting for further const features to stabilize that would enable more compile-time checking, so it's been in a kind of a limbo. Do you think I should close this now and re-open when it's implementable? In that case, what's the current preferred process, sending an RFC PR or just an implementation PR against rust-lang/rust? (For what it's worth, I originally opened this as an RFC specifically considering that the implementation required unsafe.)

@m-ou-se
Copy link
Member

m-ou-se commented Jun 30, 2021

Do you think I should close this now and re-open when it's implementable?

Sure. Or maybe just open a new RFC PR instead at that point, linking to this one.

or just an implementation PR against rust-lang/rust?

That's fine too. Just be aware that we might still ask for an RFC if the additions turns out to be controversial from an API perspective.

@m-ou-se
Copy link
Member

m-ou-se commented Jun 30, 2021

Note that .collect() is specialized for Vec::into_iter() (including map, filter, etc.) to re-use the original vector if the alignment matches. So, you can already do:

fn recycle(mut v: Vec<A>) -> Vec<B> {
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

This will be a 'no-op' if the layout matches. Otherwise, it'll just deallocate the vector and return a new empty one instead.

@golddranks
Copy link
Author

Thanks for the comments, closing this for now! I'm gonna follow up when the time is ripe.

@esemeniuc
Copy link

Would love to see this tried again now that rust is more mature

@golddranks
Copy link
Author

golddranks commented Mar 16, 2024

I just released a new version of the recycle_vec crate. https://github.com/golddranks/recycle_vec The new version manages to check the size and aligment requirements with a static check, using a trick proposed here: golddranks/recycle_vec#2

As @m-ou-se remarked, the pattern seems to be already to be possible with collect. However, there might be some value in having this method for static checks and wider reachability of the functionality, as @joshtriplett said here.

I'll send another PR and see how it's going to be received.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.