Tracking issue for vec_into_raw_parts #65816

Open
2 tasks
shepmaster opened this issue Oct 25, 2019 · 61 comments
Labels
A-collections Area: `std::collection` A-raw-pointers Area: raw pointers, MaybeUninit, NonNull B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@shepmaster
Member

shepmaster commented Oct 25, 2019

#65705 adds:

impl String {
    pub fn into_raw_parts(self) -> (*mut u8, usize, usize) {}
}
impl<T> Vec<T> {
    pub fn into_raw_parts(self) -> (*mut T, usize, usize) {}
}

Things to evaluate before stabilization

@shepmaster shepmaster added A-collections Area: `std::collection` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Oct 25, 2019
@jonas-schievink jonas-schievink added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Oct 25, 2019
@SimonSapin
Contributor

Should the returned pointer be a NonNull instead? Box::into_raw doesn’t do that because it was stabilized before NonNull was.

Centril added a commit to Centril/rust that referenced this issue Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
JohnTitor added a commit to JohnTitor/rust that referenced this issue Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
@shepmaster
Member Author

Should these functions be associated functions like Box::into_raw? The Box case is required because we want to avoid colliding with methods exposed by Deref / DerefMut. This isn't a concern for String or Vec<T>, but it may be nice to be consistent across this family of similar functions.

@matklad
Member

matklad commented Oct 28, 2019

Here's an example where this might have helped with avoiding UB: confio/go-rust-demo#1 (comment). There, the vector was destructured manually and passed to mem::forget; however, len was used in place of both length and capacity. If Vec::into_raw_parts had been a thing, it might have prevented this issue, since with cap in your hands you are more likely to do the right thing with it.
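
A minimal sketch of the failure mode described above, assuming a hypothetical FFI-style `Buffer` struct. The names `Buffer`, `buffer_from_vec`, and `buffer_into_vec` are illustrative and not taken from the linked repository. Carrying the capacity alongside the length lets the vector be rebuilt with the same values it was allocated with:

```rust
use std::mem::ManuallyDrop;

/// Hypothetical FFI-friendly view of a `Vec<u8>`; not part of std.
#[repr(C)]
pub struct Buffer {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Hands ownership of the allocation to the caller without freeing it.
pub fn buffer_from_vec(v: Vec<u8>) -> Buffer {
    let mut v = ManuallyDrop::new(v);
    // Read the length and capacity before taking the pointer.
    let len = v.len();
    let cap = v.capacity();
    Buffer { ptr: v.as_mut_ptr(), len, cap }
}

/// Rebuilds the vector so that it can be dropped normally.
///
/// # Safety
/// `buf` must come from `buffer_from_vec` and must not be used again.
pub unsafe fn buffer_into_vec(buf: Buffer) -> Vec<u8> {
    // Passing `buf.len` as the capacity here would be the bug described
    // above whenever `len != cap`.
    unsafe { Vec::from_raw_parts(buf.ptr, buf.len, buf.cap) }
}
```

With only `ptr` and `len` stored, the last call would have no correct value to pass as the capacity unless the vector happened to be exactly full.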

danielhenrymantilla added a commit to danielhenrymantilla/rust that referenced this issue Feb 4, 2020
Updated tracking issue number

Added safeguards for transmute_vec potentially being factored out elsewhere

Clarified comment about avoiding mem::forget

Removed unneeded unstable guard

Added back a stability annotation for CI

Minor documentation improvements

Thanks to @Centril's code review

Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>

Improved layout checks, type annotations and removed inaccurate comment

Removed unnecessary check on array layout

Adapt the stability annotation to the new 1.41 milestone

Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>

Simplify the implementation.

Use `Vec::into_raw_parts` instead of a manual implementation of
`Vec::transmute`.

If `Vec::into_raw_parts` uses `NonNull` instead, then the code here
will need to be adjusted to take it into account (issue rust-lang#65816)

Reduce the whitespace of safety comments
@Lokathor
Contributor

If we view this as being the opposite of Vec::from_raw_parts the answers become fairly obvious:

  • The pointer should be a plain raw pointer, because that's the type that from_raw_parts takes.
  • The method should be associated so that both visually it matches the style used for from_raw_parts (in a sense), and also because this is a "weird" method that editors don't need to be suggesting with auto-complete.

@SimonSapin
Contributor

I think it’s not at all obvious that deliberately making this method "weird" is a good thing.

@Lokathor
Contributor

The method is weird either way. "This uses the vec by self, but won't free the memory" is sufficiently out there.

@SimonSapin
Contributor

I don’t see how that makes it weird. Vec::into_boxed_slice is the same. It’s pretty normal for conversions to take self by value.

@Lokathor
Contributor

But into_boxed_slice doesn't leak the memory

@ghost

ghost commented Feb 17, 2020

Well, the documentation does a great job of telling you how to avoid a leak, and leaking it is not unsafe anyway.

@Lokathor
Contributor

I don't disagree with any of that, but also my opinion is unchanged.

It's not a big deal to have this as an associated function vs. a method, but shepmaster is right that the associated function version makes this meta-family of functions feel consistent across the alloc crate.

But again it's a totally unimportant difference either way, and we should just stabilize either form and get on with the day.

@ghost

ghost commented Feb 17, 2020

In my opinion, functions that start with into_ should be methods, to match the definition of other methods and the Into trait, unless they can collide with methods on the Deref::Target; the rest can be left as associated functions if that is conventional.

@SimonSapin
Contributor

@rust-lang/libs Any thoughts on the unresolved questions in the issue description? I’d be inclined to say:

  • This method should return a NonNull pointer: the conversion to a raw pointer is easy and safe, while the reverse is either unsafe or has an unnecessary branch.
  • It should not be changed to an associated function: Deref::Target is not an arbitrary type, so unlike Box there is no risk of shadowing another method.
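
The asymmetry in the first bullet can be sketched with the stable `std::ptr::NonNull` API. The helper names below are illustrative only:

```rust
use std::ptr::NonNull;

/// `NonNull<T>` to `*mut T`: safe and infallible.
pub fn to_raw<T>(p: NonNull<T>) -> *mut T {
    p.as_ptr()
}

/// `*mut T` to `NonNull<T>`, checked: safe, but branches on null.
pub fn to_non_null_checked<T>(p: *mut T) -> Option<NonNull<T>> {
    NonNull::new(p)
}

/// `*mut T` to `NonNull<T>`, unchecked: no branch, but `unsafe`.
///
/// # Safety
/// `p` must not be null.
pub unsafe fn to_non_null_unchecked<T>(p: *mut T) -> NonNull<T> {
    unsafe { NonNull::new_unchecked(p) }
}
```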

@BurntSushi
Member

@SimonSapin SGTM. The missing parallelism between definitions of into_raw_parts and from_raw_parts is a little unfortunate, but I think NonNull is the right choice for the reasons you stated.

@SimonSapin
Contributor

The tuple order is the same as the arguments to Vec::from_raw_parts, but the memory order is both unspecified and not actually that one:

pub struct Vec<T, #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global> {
    buf: RawVec<T, A>,
    len: usize,
}

pub(crate) struct RawVec<T, A: Allocator = Global> {
    ptr: Unique<T>,
    cap: usize,
    alloc: A,
}

@shepmaster
Member Author

the three values are given in the order

specifically, they are given in the order that Vec::from_raw_parts expects them.

@Stargateur
Contributor

Stargateur commented Aug 8, 2022

I agree, but that's a circular argument: Vec::from_raw_parts, while not using a tuple, has the same problem I raised. You could confuse the two, and the compiler has no way to help you.

Here's an example where this might have helped with avoiding UB: confio/go-rust-demo#1 (comment). There, the vector was destructured manually and passed to mem::forget; however, len was used in place of both length and capacity. If Vec::into_raw_parts had been a thing, it might have prevented this issue, since with cap in your hands you are more likely to do the right thing with it.

For example, while this method could have helped there, I can also see cases where people will easily swap cap and len. I think named fields make you "more likely to do the right thing".

I don't much like tuples in public APIs.

I’m not commenting to disagree but to offer a mnemonic: the three values are given in the order the things they represent are placed in memory. The first is pointer to the beginning of the region. The second represents the end of the valid slice which is somewhere in the middle between beginning and end of the allocated memory. The third represents the end of allocated memory which comes last.

That's a nice mnemonic. Fun fact: the docs of Vec say "Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet."...

To advocate my point further, this struct could also be used for String::into_raw_parts, and would remove the need for Vec::into_raw_parts_with_alloc (we could add alloc later, no?):

struct VecRawParts<T, A = Global> {
  pub ptr: *mut T,
  pub len: usize,
  pub cap: usize,
  pub alloc: A,
}
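
A minimal sketch of how such a struct could work on stable Rust today, leaving out the allocator field because `allocator_api` is unstable. The `from_vec` and `into_vec` names are assumptions for illustration, not a proposed std API:

```rust
use std::mem::ManuallyDrop;

/// Hypothetical named-field alternative to the `(ptr, len, cap)` tuple.
pub struct VecRawParts<T> {
    pub ptr: *mut T,
    pub len: usize,
    pub cap: usize,
}

impl<T> VecRawParts<T> {
    /// Decomposes `vec` without running its destructor.
    pub fn from_vec(vec: Vec<T>) -> Self {
        let mut vec = ManuallyDrop::new(vec);
        let len = vec.len();
        let cap = vec.capacity();
        // Take the pointer last so `vec` is not touched afterwards.
        let ptr = vec.as_mut_ptr();
        Self { ptr, len, cap }
    }

    /// Reassembles the vector.
    ///
    /// # Safety
    /// The fields must satisfy the requirements of `Vec::from_raw_parts`.
    pub unsafe fn into_vec(self) -> Vec<T> {
        unsafe { Vec::from_raw_parts(self.ptr, self.len, self.cap) }
    }
}
```

Because the fields are named, swapping `len` and `cap` at a call site requires writing the wrong field name rather than just the wrong tuple position.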

@lopopolo
Contributor

lopopolo commented Aug 8, 2022

@Stargateur this is precisely what the raw-parts crate does which is linked earlier in the discussion.

https://crates.io/crates/raw-parts

@Stargateur
Contributor

Stargateur commented Aug 8, 2022

@Stargateur this is precisely what the raw-parts crate does which is linked earlier in the discussion.

crates.io/crates/raw-parts

I actually missed it, but that doesn't change my point: I don't want to pull in a dependency for something that trivial, and it requires knowing that this crate exists. I think we need this in std, not elsewhere. We can already write an "into_raw_parts" using capacity(), len() and as_mut_ptr(), so the purpose of into_raw_parts() is to offer a user-friendly way to avoid mistakes. That means the current into_raw_parts() is almost useless, since your crate is actually better IMO.

@Lokathor
Contributor

Lokathor commented Aug 8, 2022

I don't think that special types are required, but I do think that changing the first line of the docs would help immensely:

Decomposes a Vec<T> into its raw components: (ptr, len, capacity)

@cmazakas

cmazakas commented Aug 9, 2022

If all adding a special type does is prevent confusing len with capacity, I don't think it's worth it. It'd set a bad precedent for the stdlib to opt into these safety types that don't really offer all that much with regards to actual safety. Confusing len and capacity at a call-site will eventually happen but not nearly as often as developers simply getting the values wrong entirely.

@Stargateur
Contributor

Stargateur commented Aug 9, 2022

It'd set a bad precedent for the stdlib to opt into these safety types that don't really offer all that much with regards to actual safety.

How is that a bad precedent? I think there are zero methods in std that return a tuple of 3 elements. Besides, Rust is about structured programming; using a struct is THE way to go in Rust. If you want, you can check https://doc.rust-lang.org/std/iter/index.html#structs, which has about 30 structs. If you want a struct made for both performance and a nice user API, there is https://doc.rust-lang.org/std/collections/hash_map/enum.Entry.html or https://doc.rust-lang.org/std/fs/struct.OpenOptions.html.

If that's not acceptable, I would be happier with a function called into_ptr that consumes the vec but only returns the ptr. This means the user would need to write:

let len = vec.len();
let cap = vec.capacity();
let alloc = vec.allocator();
let ptr = vec.into_ptr();

But at least that removes the problem of the tuple, and removes the problem of a duplicate API for the allocator feature.

@poliorcetics
Contributor

If all adding a special type does is prevent confusing len with capacity

Confusing capacity for len can easily lead to a slice::from_raw_parts(ptr, capacity_as_len), which is completely wrong and very, very easy to miss in reviews unless you are looking at the docs of into_raw_parts all the time.
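
To illustrate the hazard, here is a small sketch (the function names are hypothetical) of building a slice from decomposed parts. Only the length may be passed to `slice::from_raw_parts`; the capacity can cover uninitialized memory:

```rust
use std::mem::ManuallyDrop;
use std::slice;

/// Stable decomposition used to produce the arguments for `sum_parts`.
pub fn decompose(v: Vec<u32>) -> (*mut u32, usize, usize) {
    let mut v = ManuallyDrop::new(v);
    let (len, cap) = (v.len(), v.capacity());
    (v.as_mut_ptr(), len, cap)
}

/// Sums the initialized elements of a decomposed `Vec<u32>`, then
/// rebuilds the vector so the allocation is freed.
///
/// # Safety
/// `(ptr, len, cap)` must come unchanged from `decompose`.
pub unsafe fn sum_parts(ptr: *mut u32, len: usize, cap: usize) -> u32 {
    // Correct: exactly `len` elements are initialized. Passing `cap`
    // here would compile just as well, but would read uninitialized
    // memory (UB) whenever `cap > len`.
    let initialized: &[u32] = unsafe { slice::from_raw_parts(ptr, len) };
    let total: u32 = initialized.iter().sum();
    // Rebuild and drop the vector to release the allocation.
    drop(unsafe { Vec::from_raw_parts(ptr, len, cap) });
    total
}
```

Both `len` and `cap` are plain `usize` values, so the type system cannot tell them apart at the `slice::from_raw_parts` call.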

@LegionMammal978
Contributor

I think there are zero methods in std that return a tuple of 3 elements.

Not quite; the slice API has a few of them:

To be fair, none of those methods have the same length vs. capacity confusion.

Besides, Rust is about structured programming; using a struct is THE way to go in Rust. If you want, you can check https://doc.rust-lang.org/std/iter/index.html#structs, which has about 30 structs. If you want a struct made for both performance and a nice user API, there is https://doc.rust-lang.org/std/collections/hash_map/enum.Entry.html or https://doc.rust-lang.org/std/fs/struct.OpenOptions.html.

Most of these are opaque types designed only to carry trait impls such as Iterator or Error. The remainder of transparent types have special functions: std::alloc::Layout has lots of methods for manipulating repr(C) layouts, std::fs::OpenOptions has several OS-dependent methods, std::cmp::Reverse and std::num::Wrapping modify the inner type's impls, etc. What kind of special API would the raw parts have to justify placing them in their own type?

@Stargateur
Contributor

Stargateur commented Aug 9, 2022

@LegionMammal978 To be fair, it's really just 2, since variations don't count; align_to and select_nth_unstable are acceptable because their results are naturally ordered, like split_at, which returns (prefix, suffix). So I would say my point stands: it's pretty RARE, and for a good reason, since a tuple doesn't express anything. Tuples are nice for closures or specific use cases, but a public API should avoid returning them.

Most of these are opaque types designed only to carry trait impls such as Iterator or Error. The remainder of transparent types have special functions: std::alloc::Layout has lots of methods for manipulating repr(C) layouts, std::fs::OpenOptions has several OS-dependent methods, std::cmp::Reverse and std::num::Wrapping modify the inner type's impls, etc.

My first point was that adding a struct is common. I don't understand the rest of your point; it looks off-topic.

What kind of special API would the raw parts have to justify placing them in their own type?

The raw-parts crate clearly shows the advantage: having a "raw builder" API for Vec helps users avoid mistakes via from_vec, and better still, it allows adding a nice into_vec that also removes the potential mistake with the from_raw_parts argument order. (That said, raw-parts' choice to make into_vec an associated function is weird to me.)

This means a user can get the struct, mutate only what is needed, and use it to reconstruct the vec. That is a BIG plus: when a feature is meant to be used with unsafe, you want every small bit of help you can get.

@anayw2001

Hi everyone, just wanted to ask what the status on this issue was, and if the agreed-upon suggestion of this thread was to use the raw-parts crate for stable rust toolchains.

@christopinka

upon suggestion of this thread was to use the raw-parts crate for stable rust toolchains.

Seems to work. Don't know what the tradeoffs are.

@LegionMammal978
Contributor

LegionMammal978 commented May 9, 2023

In stable Rust, you can safely decompose a Vec without any extra crates by using ManuallyDrop:

use std::mem::ManuallyDrop;

pub fn into_raw_parts<T>(vec: Vec<T>) -> (*mut T, usize, usize) {
    let mut vec = ManuallyDrop::new(vec);
    let length = vec.len();
    let capacity = vec.capacity();
    (vec.as_mut_ptr(), length, capacity)
}

The main downside (aside from verbosity) is that you have to be very careful: after calling vec.as_mut_ptr(), you cannot touch vec in any way, even to pass it to mem::forget(). As a corollary, you must call vec.len() and vec.capacity() before calling vec.as_mut_ptr(). (This is a safe function, so breaking this rule will not immediately cause UB, but any later unsafe code which accesses the pointer or calls Vec::from_raw_parts() will invariably cause UB. Note that this may change to no longer be UB in a future version of the language.)

Removing the need for this incantation is the purpose of the proposed Vec::into_raw_parts() API.
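
As a usage sketch, the parts returned by the helper above can be handed back to `Vec::from_raw_parts`, here reinterpreting `i32` elements as `u32` (the two types have the same size and alignment). The helper is repeated so the example is self-contained; `reinterpret` is an illustrative name:

```rust
use std::mem::ManuallyDrop;

pub fn into_raw_parts<T>(vec: Vec<T>) -> (*mut T, usize, usize) {
    let mut vec = ManuallyDrop::new(vec);
    let length = vec.len();
    let capacity = vec.capacity();
    (vec.as_mut_ptr(), length, capacity)
}

/// Reinterprets the elements of `v` as `u32` without copying.
pub fn reinterpret(v: Vec<i32>) -> Vec<u32> {
    let (ptr, len, cap) = into_raw_parts(v);
    // SAFETY: `i32` and `u32` have identical size and alignment, and the
    // pointer, length, and capacity come unchanged from a live `Vec`.
    unsafe { Vec::from_raw_parts(ptr as *mut u32, len, cap) }
}
```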

@cmazakas

cmazakas commented May 9, 2023

I'm sorry, what? UB? Can't touch? Discussing such things as UB or not-UB is out of scope for the issue.

@mina86
Contributor

mina86 commented May 9, 2023

after calling vec.as_mut_ptr(), you cannot touch vec in any way,

as_mut_ptr returns a pointer which doesn’t count as an exclusive borrow so you can continue using vec through a shared reference. It’s only using vec through exclusive reference that may lead to pointer being invalidated (and subsequent undefined behaviour when trying to dereference it).

@LegionMammal978
Contributor

LegionMammal978 commented May 9, 2023

I'm sorry, what? UB? Can't touch? Discussing such things as UB or not-UB is out of scope for the issue.

That into_raw_parts() function is what this feature intends to replace. Vec::into_raw_parts() is equivalent to mem::forget() in safe code; its only utility lies in unsafe code which accesses the pointer or calls Vec::from_raw_parts(), and into_raw_parts() must be written very carefully for that unsafe code to work properly.

after calling vec.as_mut_ptr(), you cannot touch vec in any way,

as_mut_ptr returns a pointer which doesn’t count as an exclusive borrow so you can continue using vec through a shared reference. It’s only using vec through exclusive reference that may lead to pointer being invalidated (and subsequent undefined behaviour when trying to dereference it).

The signature of Vec::as_mut_ptr() is pub fn as_mut_ptr(&mut self) -> *mut T. Note the &mut self, which is the problem here, since the *mut T pointer is derived from that &mut self reference and will be invalidated once &mut self is invalidated, if we decide to use noalias Vec in the standard library. Looking a bit closer, it might not be a problem in the particular case of calling len() or capacity(), but it's still very fragile, and would cause issues if we were to call as_mut_ptr() again, or mem::forget(). See #54470, #94421, and rust-lang/unsafe-code-guidelines#326 for more information.

@mina86
Contributor

mina86 commented May 9, 2023

I don’t quite see how what you wrote invalidates what I wrote. So long as you don’t modify vector after taking the pointer, you can do pretty much whatever you want with the vector and the pointer remains valid.

And to put it bluntly, if the following:

let mut vec = vec![1];
let ptr = vec.as_mut_ptr();
let len = vec.len();
core::mem::forget(vec);
println!("{}", unsafe { *ptr });

is undefined behaviour, then that’s a language defect that needs to be addressed regardless of into_raw_parts.

@cmazakas

cmazakas commented May 9, 2023

Yes, I agree. The simple solution is for the stdlib to prevent the returned pointer from carrying noalias.

However, this is outside the scope of the issue because into_raw_parts() and from_raw_parts will continue to work correctly, regardless of what the stdlib decides to do.

Fwiw, Legion is likely alluding to https://llvm.org/docs/LangRef.html#noalias when applied to function return types. Yielding a noalias raw pointer from as_mut_ptr() means that all accesses must be done through the returned pointer or one derived from it.

But again, this isn't what's currently happening today. And it also doesn't really matter for this particular API because the usage is largely the same.

@LegionMammal978
Contributor

LegionMammal978 commented May 9, 2023

I don’t quite see how what you wrote invalidates what I wrote. So long as you don’t modify vector after taking the pointer, you can do pretty much whatever you want with the vector and the pointer remains valid.

This is false regardless of Vec's aliasing guarantees. For instance, accessing the pointer, even just to read it, will invalidate any exclusive references to its contents, even if those references are never written to (as can be seen with Miri on the Playground):

fn main() {
    let mut vec = vec![1];
    let ptr = vec.as_mut_ptr();
    let slice = vec.as_mut_slice();
    println!("{}", unsafe { *ptr });
    let _slice = slice; // UB
}

Forming and reborrowing references is always a potentially dangerous operation under Stacked Borrows, when they are mixed with raw pointers.

And to put it bluntly, if the following: [...] is undefined behaviour, then that’s a language defect that needs to be addressed regardless of into_raw_parts.

Then take it up with the UCG WG in rust-lang/unsafe-code-guidelines#326 or on Zulip.

However, this is outside the scope of the issue because into_raw_parts() and from_raw_parts will continue to work correctly, regardless of what the stdlib decides to do.

Sure, it's just that someone was asking how to perform the equivalent of Vec::into_raw_parts() in stable Rust, so I gave the solution, and I warned of some pitfalls one might run into while trying to refactor that solution or adapt it to another situation.

@mina86
Contributor

mina86 commented May 9, 2023

I don’t quite see how what you wrote invalidates what I wrote. So long as you don’t modify vector after taking the pointer, you can do pretty much whatever you want with the vector and the pointer remains valid.

This is false regardless of Vec's aliasing guarantees. For instance, accessing the pointer, even just to read it, will invalidate any exclusive references to its contents, even if those references are never written to (as can be seen with Miri on the Playground):

fn main() {
    let mut vec = vec![1];
    let ptr = vec.as_mut_ptr();
    let slice = vec.as_mut_slice();
    println!("{}", unsafe { *ptr });
    let _slice = slice; // UB
}

This is UB because it creates two mutable references to the same object: one through slice and the other through *ptr. The issue isn’t that vec is touched after calling as_mut_ptr.

Forming and reborrowing references is always a potentially dangerous operation under Stacked Borrows, when they are mixed with raw pointers.

Which doesn’t change the fact that after calling vec.as_mut_ptr() you can touch vec in various ways including passing it to core::mem::forget(). Your assessment that you cannot call len or capacity after calling as_mut_ptr is incorrect as far as I can tell and Miri seems to agree.

And to put it bluntly, if the following: [...] is undefined behaviour, then that’s a language defect that needs to be addressed regardless of into_raw_parts.

Then take it up with the UCG WG in rust-lang/unsafe-code-guidelines#326 or on Zulip.

Well, Miri doesn’t complain about the code so I don’t think I need to bring it up since the code appears to be sound.

@LegionMammal978
Contributor

LegionMammal978 commented May 10, 2023

Well, Miri doesn’t complain about the code so I don’t think I need to bring it up since the code appears to be sound.

That's because the current implementation does not, in fact, implement Vec using Box. But this is not a stable behavior; there have been no promises made yet regarding the aliasing properties of Vec. Miri can only detect immediate language-level UB like invalid pointer accesses, not violations of library rules nor dependency on unstable behavior.

In fact, PR #94421 experimented with making this exact change; unfortunately, it cannot be demonstrated directly, since Miri at the time of the PR did not yet support recursively retagging references in struct fields. But an equivalent issue can be demonstrated on current nightly with a Box<[T]> (Rust Playground), which is roughly what a noalias Vec would contain:

fn main() {
    let mut boxed_slice = vec![1].into_boxed_slice();
    let ptr = boxed_slice.as_mut_ptr();
    std::mem::forget(boxed_slice);
    println!("{}", unsafe { *ptr }); // UB
}

error: Undefined Behavior: attempting a read access using <3158> at alloc1501[0x0], but that tag does not exist in the borrow stack for this location
 --> src/main.rs:5:29
  |
5 |     println!("{}", unsafe { *ptr }); // UB
  |                             ^^^^
  |                             |
  |                             attempting a read access using <3158> at alloc1501[0x0], but that tag does not exist in the borrow stack for this location
  |                             this error occurs as part of an access at alloc1501[0x0..0x4]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3158> was created by a SharedReadWrite retag at offsets [0x0..0x4]
 --> src/main.rs:3:15
  |
3 |     let ptr = boxed_slice.as_mut_ptr();
  |               ^^^^^^^^^^^^^^^^^^^^^^^^
help: <3158> was later invalidated at offsets [0x0..0x4] by a Unique retag
 --> src/main.rs:4:22
  |
4 |     std::mem::forget(boxed_slice);
  |                      ^^^^^^^^^^^
  = note: BACKTRACE (of the first span):
  = note: inside `main` at src/main.rs:5:29: 5:33

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

@mina86
Contributor

mina86 commented May 10, 2023

I stand corrected (and baffled at the same time). Thanks.

@cmazakas

The clear solution is for Vec to never be noalias because otherwise it's terrible.

But this is the tracking issue for into_raw_parts.

The simple solution is this, if you genuinely have trouble remembering it's length then capacity, just write your own helper. There's no need to pollute the stdlib API itself because a caller can't remember off-hand that it's len-cap, not cap-len.

Now, let's stabilize the feature already.

@ChrisDenton
Member

The discussion above is an argument for stabilizing into_raw_parts. It demonstrates that the function is useful above and beyond doing it manually. And without such a justification the case for stabilizing this would be weaker.

@cmazakas

The discussion above is an argument for stabilizing into_raw_parts. It demonstrates that the function is useful above and beyond doing it manually. And without such a justification the case for stabilizing this would be weaker.

Whatever it takes to get it stabilized. Now, I need to help the unsafe WG realize they shouldn't ruin Vec like Box.

@tae-soo-kim
Contributor

Should the returned pointer be a NonNull instead? Box::into_raw doesn’t do that because it was stabilized before NonNull was.

No one has ever mentioned NonNull is covariant. There are actually 2 differences between NonNull and *mut: non-null and covariance. I'm not inclined to the current design of NonNull since it loses orthogonality.

That being said, I think it is better to return a non-null pointer in this case. It's just that this should be a RawNonNull which is exactly the same as *mut except being non-null. I don't like the side-effect of making the pointer covariant (as the current NonNull would).

@Jan561

Jan561 commented Feb 6, 2024

Why would covariance be an issue here? The returned pointer is unique and represents ownership, in which case covariance should be fine, maybe even desirable. (with the very limited understanding I have on this topic, so take this with a grain of salt)

And I think this is one of the (two) usecases NonNull was originally designed for.
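
For readers unfamiliar with the variance difference being discussed, a small sketch (the function names are illustrative): `NonNull<T>` is covariant in `T`, so a pointer to a longer-lived type can be used where a shorter lifetime is expected, while the same conversion is rejected for `*mut T`, which is invariant in `T`.

```rust
use std::ptr::NonNull;

/// Compiles because `NonNull<T>` is covariant in `T`:
/// `NonNull<&'static str>` is a subtype of `NonNull<&'a str>`.
pub fn shorten_non_null<'a>(p: NonNull<&'static str>) -> NonNull<&'a str> {
    p
}

// The equivalent function for `*mut T` does not compile, because
// `*mut T` is invariant in `T`:
//
// pub fn shorten_raw<'a>(p: *mut &'static str) -> *mut &'a str {
//     p
// }
```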
